mirror of
https://github.com/ItsDrike/itsdrike.com.git
synced 2024-11-14 15:47:17 +00:00
Compare commits
No commits in common. "532da8ef8678a6ed7bf8fb37c559c6515750708c" and "903123081b1c74f4e097e1f9d30dc02c73b9bebe" have entirely different histories.
532da8ef86...903123081b
2
.github/workflows/build-publish.yml
vendored
@@ -31,7 +31,7 @@ jobs:
  - name: Setup hugo
  uses: peaceiris/actions-hugo@v2
  with:
- hugo-version: "0.125.7"
+ hugo-version: "0.120.1"
  extended: true

  # Will use the build.sh script to build the page using hugo.
@@ -1,7 +1,6 @@
  ---
  title: Managing (multiple) git credentials
  date: 2022-07-27
- lastmod: 2024-06-05
  tags: [programming, git]
  sources:
  - <https://docs.github.com/en/get-started/getting-started-with-git/caching-your-github-credentials-in-git>
@@ -22,12 +21,6 @@ changelog:
  - Add note about disabling commit signing
  - Add alternative command for copying on wayland
  - Fix typos and text wrapping
- 2024-06-05:
- - Improve path matching explanation/note on git credentials helper, detailing the pitfalls it has with multiple
- accounts on the same git hosting platform.
- - Wording improvements
- - Rewrite tackling multiple accounts section
- - Include bash aliases example
  ---

  Many people often find initially setting up their git user a bit unclear, especially when it comes to managing multiple
@@ -120,7 +113,7 @@ your account (for example they may only allow you to pull/push code, but not to

  {{< notice warning >}}
  This method stores your credentials in the project's git config file in `.git/config`. Since this is a simple URL to
- one of the projects remotes, it will just be stored in this config file in **plaintext** without any form of encryption.
+ one of the proejcts remotes, it will just be stored in this config file in **plaintext** without any form of encryption.

  Bear this in mind when giving someone access to the project directory, your credentials will be present in that
  directory!
@@ -150,24 +143,19 @@ Alternatively, we can directly edit the global git configuration:

  Each credential context is defined by a URL. This context will then be used to look up specific configuration. For
  example if we're accessing `https://github.com/ItsDrike/itsdrike.com`, git looks into the config file to see if a
- section matches this context.
-
- Git will consider the two a match, if the context matches on both the protocols (`http`/`https`), and then on the host
- portion (`github.com`/`gitlab.com`/...). It can also optionally check the paths too, if they are present
- (`/ItsDrike/itsdrike.com`)
+ section matches this context. It will consider the two a match, if the context matches on both the protocols
+ (`http`/`https`), and then on the host portion (`github.com`/`gitlab.com`/...). It can also optionally check the paths
+ too, if they are present (`/ItsDrike/itsdrike.com`)

  {{< notice note >}}
- Git matches the paths exactly (if they're included). That means a configured credential like `https://github.com/user1`
- will NOT match an origin of `https://github.com/user1/repo.git`, so you would need a credential entry for each repository.
+ Git matches the hosts directly, without considering if they come from the same domain, so if subdomain differs, it will
+ not register as a match. For example, for context of `https://gitlab.work_company.com/user/repo.git`, it wouldn't match
+ a configuration section for `https://work_company.com`, since `work_company.com != gitlab.work_company.com`.

- This may not be an issue, if you're trying to use this with multiple accounts on different git host platforms (like on
- `github.com` and `gitlab.com`), where you could just leave the credential to match only on the host, and not include
- any path. However, if you're trying to use multiple accounts with the same host, it will not work very well.
-
- Similarly, git matches the hosts directly too, without considering if they come from the same domain, so if subdomain
- differs, it will not register as a match either. This means that for origin like:
- `https://gitlab.work_company.com/user/repo.git`, git wouldn't match a configuration credential section for
- `https://work_company.com`, since `work_company.com != gitlab.work_company.com`.
+ The paths are also matched exactly (if they're included), so for the example context from above, we would not get a
+ match on a config section with `https://gitlab.work_company.com/user`, only on
+ `https://gitlab.work_company.com/user/repo.git` (in addition to the config entry without path
+ `https://gitlab.work_company.com`).
  {{< /notice >}}

  This does sound like a great option for multi-account usage, however the issue with this approach is that these
@@ -179,8 +167,8 @@ The username will be stored in git's global config file in **plaintext**, making
  worried about leaking your **username** (not password) for the git hosting provider.

  If you're using the global configuration, this generally shouldn't be a big concern, since the username won't actually
- be in the project file unlike with the remote-urls. However, if you share a machine with multiple people, you may want
- to consider securing your global configuration file (`~/.config/git/config`) using your file system's permission
+ be in the project file unlike with the remote-urls. However if you share a machine with multiple people, you may want
+ to consider securing your global configuration file (`~/.config/git/config`) using your filesystem's permission
  controls to prevent others from reading it.

  If you're defining contexts in local project's config though, you should be aware that the username will be present in
@@ -218,21 +206,16 @@ git config --global credentials.helper 'store --file=/full/path/to/git-credentia
  Once the helper is configured, you will first still get asked for your username and password, and only after that first
  time you enter them will the get cached into this credentials file.

- {{< notice note >}}
- This has the same matching pitfalls as credential contexts defined in settings, the URL paths are matched exactly, and
- so are URL hosts.
- {{< /notice >}}
-
  {{< notice info >}}
  The credentials file will cache the data in this format:

  ```txt
  https://<USERNAME>:<PASSWORD>@github.com
  https://<USERNAME2>:<PASSWORD>@gitlab.com
  ```

  Which is indeed a **plaintext** format, however the file will be protected with your file system permissions, and
- access should be limited to you (as the user who owns the file). And since this file should live somewhere outside
+ access should be limited to you (as the user who owns the file). And since this file should live somewhere outside of
  the project's directory, the project can be safely shared with others without worrying about leakage.
  {{< /notice >}}

@@ -262,7 +245,7 @@ helpers](https://git-scm.com/docs/gitcredentials#_custom_helpers). These allow u
  management by delegating to 3rd party applications and services.

  A commonly used external credential helper is for example the [Git Credential Manager
- (GCM)](https://github.com/GitCredentialManager/git-credential-manager). GCM can even handle things like 2-factor
+ (GCM)](https://github.com/GitCredentialManager/git-credential-manager). GCM can even handle things like 2 factor
  authentication, or using OAuth2.

  If you want to, you can even write your own custom credential helper to handle your exact needs, in which case I'd
@@ -274,8 +257,8 @@ some examples of a basic custom provider.

  Most modern git servers also provide a way to access their repositories using SSH keys rather than username and
  password over HTTPS. This approach is significantly better, since guessing SSH keys is generally much harder, and they
- can easily be revoked. They also generally aren't anywhere near as powerful as full user passwords, so even if they are
- compromised, the attacker would only have limited access.
+ can easily be revoked. They also generally aren't nowhere near as powerful as full user passwords, so even if they are
+ compromised, the attacker would only have a limited access.

  SSH uses public-private key pair, which means you will need to give out the public key over to the git hosting
  platform, and keep the private part on your machine for authentication. Using the public key, the server will then be
@@ -288,8 +271,7 @@ proxies, making communication with the remote server impossible.
  #### Generating an SSH key

  To generate an SSH key, you can use `ssh-keygen` command line utility. Generating keys should always be done
- independently of the git hosting provider. The git hosting provider shouldn't need to see your private key at any
- point!
+ independently from the git hosting provider, since they don't shouldn't need to see your private key at any point.

  The command for this key generation looks like this:

@@ -300,7 +282,7 @@ ssh-keygen -t ed25519 -C "<COMMENT>"
  - The `-C` flag allows you to specify a comment, which you can use to specify what this key will be used for. If you
  don't need a comment, you can also omit this flag.
  - The `-t` flag specifies the key type. The default type for SSH keys is `rsa`, however I'd suggest using `ed25519`
- which is considered safer and more performant than RSA keys. If you decide to use `rsa`, make sure to use a
+ which is considered safer and more performant than RSA keys. If you will decide to use `rsa`, make sure to use a
  key size of at least 2048 bits, but for better security, but ideally you should try to use a key size of `4096`.

  After running this command, you will be asked to specify a file where this key should be stored. You will probably want
@@ -310,12 +292,12 @@ you can have all of your git ssh keys grouped together and separated from SSH ke
  {{< notice info >}}
  Make sure to add the `~/.ssh` (or `C:\Users\your_username\.ssh` for Windows) prefix to your filename, so the key is
  correctly added to the `.ssh` folder. You should keep your keys in this folder, since it is already protected by the
- file system from reading by other users.
+ filesystem from reading by other users.
  {{< /notice >}}

  Once you select a file name, you will be asked to set a passphrase. You can opt to leave this empty by pressing enter
  without entering anything. Going with a passphrase protected key is safer, however it will also mean you will need to
- type your password each time, which may be annoying. However, there is a way to cache this passphrase with SSH agent,
+ type your password each time, which may be annoying. However there is a way to cache this passphrase with SSH agent,
  which you can read more about in the [GitHub's
  docs](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#adding-your-ssh-key-to-the-ssh-agent).
  Using passphrase is significantly better for your system's security, since it means that even if the private key got
@@ -326,7 +308,7 @@ extension.

  #### Add public key to your hosting provider's account

- Now that you've created a public and private SSH key pair, you will need to let your git hosting provider know about it.
+ Now that you've create a public and private SSH key pair, you will need to let your git hosting provider know about it.
  It is important that you only give the public key (file with `.pub` extension) to your provider, and not your private
  key.

@@ -371,9 +353,9 @@ ssh -Tvvv git@github.com -i ~/.ssh/id_ed25519

  #### SSH Configuration file

- To meaningfully use your key, you'll want to register some specific host name for your key, so you won't need to use
- the `-i` flag. You can do this by editing (or creating) `~/.ssh/config` file (or `C:\Users\your_username\.ssh\config`
- for Windows).
+ To meaningfully use your key, you'll want register some specific host name for your key, so you won't need to use the
+ `-i` flag. You can do this by editing (or creating) `~/.ssh/config` file (or `C:\Users\your_username\.ssh\config` for
+ Windows).

  An example configuration file with multiple git accounts:

@@ -398,12 +380,10 @@ HOST work.github.com
  ```

  When you have multiple accounts with the same `HostName` (same git hosting provider), you will need to specify a unique
- `HOST` name. One way which I like to do is to make it appear as a subdomain, like `work.github.com`. Another common way
- to define these, is to use a dash at the end, like: `github.com-user2`. FINALLY, We've cracked the issue of storing
- multiple credentials even on the same git hosting platform.
+ `Host` name.

- Let's first make sure this configuration works though. To do that, you can run another test command, but this time
- without specifying the key file explicitly, as it should now be getting picked up from the settings:
+ To then make sure this configuration works, you can run another test command, but this time without specifying the
+ key file explicitly, as it should now be getting picked up from the settings:

  ```bash
  ssh -T git@github.com
@@ -417,13 +397,14 @@ ssh -T work.github.com

  #### Using the SSH keys

- Now let's get to actually using these keys in your repositories. Doing this can be pretty straight-forward, as it is
- very similar to the first method of handling credentials which I've talked about, being storing the credentials in the
- remote-url, but this time, instead of using the actual credentials, and therefore making the project directory unsafe
- to share, it will just contain the `HOST` name you've set in your config, without leaking any keys.
+ No let's finally get to actually using these keys in your repositories. Doing this can be pretty straight-forward, as
+ it is very similar to the first method of handling credentials which I've talked about, being storing the credentials
+ in the remote-url. However this time, instead of using the actual credentials, and therefore making the project
+ directory unsafe to share, as it contains your password in plaintext, it will actually only contain the `HOST` name
+ you've set in your config, without leaking any keys.

- The commands to set this up are therefore very similar, except that instead of
- `https://<USERNAME>:<PASSWORD>@github.com/<PATH>`, we now use `git@HOST/<PATH>`:
+ The commands to set this up are very similar, however instead of `https://<USERNAME>:<PASSWORD>@github.com`, we now use
+ `git@HOST`:

  ```bash
  # While clonning:
@@ -441,11 +422,11 @@ to remember the username or the password, instead you just need to know the host

  ## Which method to use for credentials

- Generally, using SSH keys is the safest and probably the best approach, but it can also be a bit annoying since it
- requires you to specify the SSH host for each repository in its remote url. For that reason, the approach that I would
- recommend is using git's file credential helper system to store your credentials instead.
+ Generally, using SSH keys is the safest approach, but it can also be a bit annoying since it requires you to specify
+ the SSH host for each repository in it's remote url. For that reason, the approach that I would recommend is using
+ git's credential helper system to store your credentials instead.

- However, if you will go with this method, make sure that you're using a personal access token instead of the actual
+ However if you will go with this method, make sure that you're using a personal access token instead of the actual
  account's password, to limit the permissions an attacker would gain in case your credentials were leaked.

  If your git hosting platform doesn't provide access tokens, this method becomes a lot more dangerous to use, since if
@@ -454,51 +435,51 @@ account on that git host platform. That's why in that case, you should really co
  it's a bit less convenient, as they can be easily revoked and only allow limited access, just like personal access
  tokens.

- From what we've seen up until for now, the store credential helper method is only good if you only have a single account
- per git hosting platform though; so... what if you have multiple accounts?
+ ## Tackling credentials for multiple accounts

- ### Combine both store credential helper & SSH keys
+ ### Credentials for differing hosts

- The simple solution to this issue is to just use both SSH and store credential helpers. That way, you can just clone
- with regular unmodified URLs, letting the store credential helper figure out which credential to use on a per-platform
- basis. Leaving the alt accounts you have on a single platform to SSH, where your store credential helper only knows
- about your primary account, and you have SSH config entries set for each of your alt accounts.
+ When it comes to managing multiple accounts, this gets a bit more tricky. But if each of your accounts lives on a
+ different domain/host, you can still use credential helpers without any issues, since it can handle multiple
+ credentials for multiple websites out of the box. If you're using the file credential helper, this would result in the
+ `git-credentials` file looking like this:

- Whenever you then want to use an alt account for a repo, instead of cloning with the regular URL, you will clone with
- the SSH url for your alt account.
+ ```txt
+ https://<USERNAME>:<PASSWORD>@github.com
+ https://<USERNAME2>:<PASSWORD2>@gitlab.com
+ ```

- This method is pretty much perfect, and there's not many downsides to it. However, I will also show you some other
- interesting methods, which you might like more if you don't want to mess around with SSH keys, or if your git hosting
- provider doesn't support them. You might also just not like the idea of having to change the remote URL path of your
- repository to this special path with the SSH host, which the other solutions will avoid.
+ With that, whenever you'd try to pull/push with the remote url, git will go through this file in order, searching for
+ the first matching host. So for example when using a remote url belonging to `github.com` domain, the first line would
+ apply, while if your remote url belongs to `gitlab.com`, the second line would apply. This means that if your accounts
+ are from different providers, you can avoid the hassle of doing anything more complicated.

+ However if you have more accounts on a single host, you will need to somehow let git know what to do.
+
  ### Using credential contexts

- Remember the credential contexts we were defining? (The URLs that git could match against to figure out which
- username to use.) Well, we can actually use these, but set the username to be used for the platform in local
- configuration. To do that, you can just run:
+ The good news is that even with same domains, you can actually still use the git credentials as your default method,
+ and use git credential contexts to find a username. With that, even if you're using the same host, git will know to
+ look for a specific username in the credentials file now, which should be sufficient distinction to match any amount of
+ different credentials.
+
+ However the issue with git contexts is that they need to match the path component exactly, so even though you can
+ configure git to use different contexts for different repositories in your global config, you can't configure it to use
+ a certain context for a partial match on path, so you'd need to specify each repository which should use custom
+ credentials into your global git configuration, which is not great.
+
+ Instead, you should use the local git configuration of each project and specify a git context with the username you
+ want to use for that project. That way, you won't need to keep config for every non-default project in your global
+ config, and yet still use the same file credential helper to store all of your credentials in a single place.

  ```bash
- git config --local credential."https://github.com".username <USERNAME>
+ git config --local credential.https://github.com.username <USERNAME>
  ```

- This will mean git will now know what username should be used for the given remote url. With that, our store
- credentials helper can now be a bit smarter, and instead of just picking the first entry in your `git-credentials`
- file, that matches the given remote url, it will also look for a username match. So for example, if you set the
- username in that local config to `user2`, and you had this in your `git-credentials`:
-
- ```
- https://user1:<PASSWORD>@github.com
- https://user2:<PASSWORD2>@github.com
- ```
-
- It would actually pick the 2nd record now, because of the username match. (When no username is configured, the store
- credentials helper will always pick the 1st record.)
-
  {{< notice info >}}
- This will however store the credential context into the local project's git configuration (in `.git/config`), which is
- using **plaintext**, which means you might end up leaking your **username** (not password), if you give someone access
- to this project's directory.
+ Once again, this will store the credential context into the local project's git configuration (in `.git/config`), which
+ is using **plaintext**, which means you might end up leaking your **username** (not password), if you give someone
+ access to this project's directory.

  The actual password will however be completely safe, as it should only be present in the `git-credentials` file, which
  should be located elsewhere, and configured from the global git config. So this only affects you if you want to keep
@@ -506,34 +487,44 @@ your username for that git hosting provider private too. If you do, you will nee
  sharing project files, or use a different method.
  {{< /notice >}}

- This method is fine, but in my opinion, it's a bit clunky, since you need to also specify the remote URL here, and it
- leaks your username on the platform. Because of that, I think the method below is a better option, but this method is
- still good to know about, and might be a better option for you, depending on your preferences.
-
  ### Using different credentials file

- Let's try and hack our way through the problem and do everything while sticking to just the store credentials helper.
- Do you remember how when we first configured the credential helpers, we specified the path to the `git-credentials`
- file it should write the credentials to?
-
- Well, we stored that value to our global config, but of course, local config will override global config, so we could
- just set a different file for the store credential helper, which contains our alt account! Doing that is a simple as
- running this command:
+ The alternative to using credential contexts with your plaintext stored username would be using multiple
+ `git-credentials` files, and simply overriding the credential helper system in the local config, setting a different
+ file for the store credential helper. This could for example look like this:

  ```bash
- git config --local credentials.helper 'store --file=/home/user/.config/git-credentials-alt'
+ git config credentials.helper 'store --file=/home/user/.config/git-credentials-work'
  ```

- Security-wise, this method is pretty good too, since your credentials will be kept outside the project in the referenced git
- credential file, which should be secured by the file system's permissions to prevent reads from other users. When done
- properly, this won't even leak your usernames, just make sure not to include the username as a part of the file name.
- (That is, if you care about not leaking your username)
+ With this approach, you can have your credentials kept in multiple separate credential files, and just mention the path
+ to the file you need for each project.
+
+ Security-wise, this method is better because your username will be kept outside of the project in the referenced git
+ credential file, which should be secured by the file system's permissions to prevent reads from other users. However
+ practicality-wise, it may be a bit more inconvenient to type and even to remember the path to each credential file.
+
+ ### SSH keys instead
+
+ The thing you may have noticed about all of these methods is that you'll generally need to do some extra work for all
+ repositories that require non-default credentials. So even though relying on git's file credential helper is convenient
+ for the default case, extending it to non-default cases will always require doing some extra configuration.
+
+ This extra configuration is inevitable, which is why I'd suggest going with SSH keys instead, which are pretty much
+ equally as annoying, requiring you to do something extra for each non-default project (specifying them in the remote
+ URL). However as I've already explained, they're pretty much the most secure way to handle credentials. So instead of
+ doing some extra work just to configure a less secure method, you might as well do an equal amount of work and
+ use the more secure way with SSH keys.
+
+ The only disadvantage to this method is then the use of non-standard ports, which some networks might end up blocking,
+ making connection to the server [*pretty much*]({{< ref "posts/escaping-isolated-network#port-22-is-blocked" >}})
+ unreachable from those networks.

  ## Make convenience aliases

- Like I've already mentioned, if you work with different accounts a lot, you will certainly want to make convenience
- aliases to hide all the account switching logic away. You can do this in the form of git aliases, or bash aliases,
- by putting this to your `~/.config/git/config`:
+ If you really dislike the idea of all of this repetition, I'd suggest making short-hands for whichever method you
+ ended up picking, in the form of git aliases (you can also use shell aliases though). Git supports defining aliases
+ through it's configuration file, where you can use the `[alias]` section for them.

  ```toml
  [alias]
@@ -542,7 +533,7 @@ work-clone="!sh -c 'git clone git@work.github.com:$1'"
  # Make current repository use the work git credentials file
  make-work="config --local credentials.helper 'store --file=/path/to/work/credentials'"
  # Set the username for credentials to your work account, so it can find it in default git credentials
- use-work-uname="config --local credential.'https://github.com'.username my-work-username"
+ use-work-uname="config --local credential.https://github.com.username my-work-username"
  ```

  To then use these aliases, you can simply execute them as you would any other git command:
@@ -552,24 +543,3 @@ git work-clone ItsDrike/itsdrike.com
  git make-work
  git user-work-uname
  ```
-
- What I like to do is to define a bash function, which will not only set the appropriate credentials, but also a
- different local committer name and email, with the commands shown at the beginning. That could then look like this:
-
- ```bash
- git-work() {
- git config --local user.email "john_doe@work.com"
- git config --local user.name "John Doe"
- git config --local user.signingkey 4F3C14B2C3AE9246
- git config --local credential."https://github.com.username" johndoe_work
- }
-
- git-alt() {
- git config --local user.email "pseudonym@example.com"
- git config --local user.name "pseudonym"
- git config --local user.signingkey 522DC4E2A20A92B8
- git config --local credential."https://github.com.username" jogndoe_2
- }
- ```
-
- While leaving my primary account defined in my global git configuration.
@@ -1,11 +1,7 @@
  ---
  title: Interpreted vs Compiled Languages
  date: 2021-09-09
- lastmod: 2024-06-05
  tags: [programming]
- changelog:
- 2024-06-05:
- - Improve wording
  ---

  You've likely seen or heard someone talking about a language as being interpreted or as being a compiled language, but
@ -37,46 +33,44 @@ Obviously, we wanted to make things easier for ourselves by automating and abstr
|
|||
programming process as we could, so that the programmer could really only focus on the algorithm itself and the actual
|
||||
conversion of the code to a machine code that the CPU can deal with should simply happen in the background.
|
||||
|
||||
But how could we achieve something like this? The simple answer is to write something that will be able to take our
|
||||
But how could we achieve something like this? The simple answer is, to write something that will be able to take our
|
||||
code and convert it into a set of instructions that the CPU will be running. This intermediate piece of code can either
|
||||
be a compiler, or an interpreter.
|
||||
|
||||
After a piece of software like this was first written, suddenly everything became much easier. Initially we were using
|
||||
assembly languages that were very similar to the machine code, but they looked a bit more readable, giving actual names
|
||||
to the individual CPU instructions (opcodes), and allowed defining symbols (constants) and markers for code positions.
|
||||
So, while the programmer still had to think in the terms of what instruction should the CPU get in order to get it to
|
||||
do the required thing, the actual process of writing this code was a lot simpler, as you didn't have to constantly look
|
||||
at a table just to find the number of the opcode you wanted to use, and instead, you just wrote something like `LDA
|
||||
$50` (load the value at the memory address 0x50 into the accumulator register), instead of `0C50`, assuming `0C` was
|
||||
the byte representing `LDA` opcode.
|
||||
|
||||
Since converting this assembly code into a machine code was done automatically. The programmer just took the text of
|
||||
the program in the assembly language, fed it into the compiler and it returned the actual machine code, which could
|
||||
then be understood by the CPU.
|
||||
After a piece of software like this was first written, suddenly everything became much easier, initially we were using
|
||||
assembly languages that were very similar to the machine code, but they looked a bit more readable, while the
|
||||
programmer still had to think in the terms of what instruction should the CPU get in order to get it to do the required
|
||||
thing, the actual process of converting this code into a machine code was done automatically. The programmer just took
|
||||
the text of the program in the assembly language, fed it into the computer and it returned a whole machine code.
|
||||
|
||||
Later on we went deeper and deeper and eventually got to a very famous language called C. This language was incredibly
|
||||
versatile and allowed the programmers to finally start thinking in a bit more natural way, one with named variables,
|
||||
functions, loops, and a bunch of other helpful abstractions. All the while the tedious logic of converting this textual
|
||||
C implementation into something executable was left for the compiler to deal with.
|
||||
versatile and allowed the programmers to finally start thinking in a bit more natural way about how should the program
|
||||
be written and the tedious logic of converting this textual C implementation into something executable was left to the
|
||||
compiler to deal with.
|
||||
|
||||
### Recap
|
||||
|
||||
So we now know that initially, we had neither a compiler nor an interpreter, and we simply wrote things in
|
||||
So we now know that initially, we had neither a compiler nor an interpreter, and we simply wrote things in
|
||||
machine code. But this was very tedious and time-consuming, so we wrote a piece of software (in machine code) that was
|
||||
able to take our more high-level (English-like) text and convert that into this executable machine code.
|
||||
|
||||
## Compiled languages
|
||||
|
||||
Every language that's meant to be compiled has a piece of software called the "compiler".
|
||||
This piece of software is what carries the internal logic of converting these instructions that followed some
|
||||
language-specific syntax rules into the actual machine code, giving us back an executable program.
|
||||
This piece of software is what carries the internal logic of how to convert these instructions that followed some
|
||||
language-specific syntax rules into machine instructions, giving us back the actual machine code.
|
||||
|
||||
However this executable version will be specific to a certain CPU architecture. This is because each architecture has
|
||||
its own set of instructions, with different opcodes, registers, etc. So if someone with a different architecture were
|
||||
to obtain it, they still wouldn't be able to run it, simply because the CPU wouldn't understand those machine code
|
||||
instructions.
|
||||
This means that once we write our code in a compiled language, we can use the compiler to get an executable version of
|
||||
our program which we can then distribute to others.
|
||||
|
||||
Some of the most famous compiled languages are: C, C++, Rust, Zig
|
||||
However this executable version will be specific to a certain CPU architecture and if someone with a different
|
||||
architecture were to obtain it, he wouldn't be able to run it. (At least not without emulation, which is a process of
|
||||
simulating a different CPU architecture and running the corresponding instruction that the simulated CPU gets with
|
||||
equivalent instructions on the other architecture, even though this is a possibility, the process of emulation causes
|
||||
significant slowdowns, and because we only got the executable machine code rather than the actual source-code, we can't
|
||||
re-compile the program our-selves so that it would run natively for our architecture)
|
||||
|
||||
Some of the most famous compiled languages are: C, C++, Rust
|
||||
|
||||
## Interpreted Languages
|
||||
|
||||
|
@ -85,102 +79,53 @@ a major difference between these 2 implementations.
|
|||
|
||||
With interpreted languages, rather than feeding the whole source code into a compiler and getting a machine code out of
|
||||
it that we can run directly, we instead feed the code into an interpreter, which is a piece of software that's already
|
||||
compiled (or is also interpreted and another interpreter handles it - _interpreterception_) and this interpreter scans
|
||||
the code and goes line by line and interprets each function/instruction. We can think of it as a huge switch statement
|
||||
with all of the possible instructions of the interpreted language defined in it. Once we hit some instruction, the code
|
||||
from inside of that switch statement's case is executed.
|
||||
compiled (or is also interpreted and another interpreter handles it) and this interpreter scans the code and goes line by
|
||||
line and interprets each function/instruction that's then run from within the interpreter. We can think of it as a huge
|
||||
switch statement with all of the possible instructions in an interpreted language defined in it. Once we hit some
|
||||
instruction, the code from inside of that switch statement is executed.
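
The "huge switch statement" picture can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the toy language and its instruction names (`PUSH`, `ADD`, `MUL`, `PRINT`) are invented for this example, and real interpreters are far more involved.

```python
def interpret(source):
    """Run a toy stack language line by line and return everything it 'printed'."""
    stack = []
    output = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name, *args = parts
        # This if/elif chain plays the role of the big switch statement:
        # each branch holds the code that runs for one instruction.
        if name == "PUSH":
            stack.append(int(args[0]))
        elif name == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif name == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif name == "PRINT":
            output.append(stack[-1])
        else:
            raise ValueError(f"unknown instruction: {name}")
    return output


program = """
PUSH 10
PUSH 120
MUL
PRINT
"""
result = interpret(program)
print(result)  # [1200]
```

Notice that nothing here ever becomes machine code for our toy program. The only thing the CPU runs is the interpreter itself.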
|
||||
|
||||
This means that with an interpreted language, we don't have any final result that is an executable file, which could be
|
||||
distributed on its own, but rather we simply ship the code itself. To run it, the client is then expected to
|
||||
install the interpreter program, compiled for their machine, run it, and feed it the code we shipped.
|
||||
This means that with an interpreted language, we don't have any final result that is an executable file that can be
|
||||
distributed alone, but rather we simply ship the code itself. However this brings its own problems such as the fact
|
||||
that each machine that would want to run such code has to have the interpreter installed, in order to "interpret" the
|
||||
instructions written in that language. We also sometimes call these "scripting languages".
|
||||
|
||||
Some of the most famous interpreted languages are: PHP, JavaScript
|
||||
|
||||
{{< notice tip >}}
|
||||
Remember how I mentioned that when a program (written in a compiled language) is compiled, it will only be possible to
|
||||
run it on the architecture it was compiled for? Well, that's not necessarily entirely correct.
|
||||
|
||||
It is actually possible to run a program compiled for a different CPU architecture, by using **emulation**.
|
||||
|
||||
Emulation is a process of literally simulating a different CPU. It is a program which takes in the machine instructions
|
||||
as its input, and processes those instructions as if it was a CPU, setting the appropriate internal registers (often
|
||||
represented as variables), keeping track of memory, and a whole bunch of other things. This program is then compiled
|
||||
for your CPU architecture, so that it can run on your machine. This is what's called an **Emulator**.
|
||||
|
||||
With an emulator, we can simply feed it the compiled program for the CPU it emulates, and it will do exactly what a
|
||||
real CPU would do, running this program.
|
||||
|
||||
That said, emulators are usually very slow, as they're programs which run on a real CPU, having to keep track of the
|
||||
registers, memory, and a bunch of other things inside of itself, rather than inside the actual physical CPU we're
|
||||
running on, as our CPU might not even have such a register/opcode, so it needs to execute a bunch of native
|
||||
instructions to execute just the single foreign instruction.
|
||||
|
||||
Notice that an emulator is actually an interpreter, making compiled machine code for another CPU its interpreted
|
||||
language!
|
||||
{{< /notice >}}
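
The emulator described in the notice above can be sketched as a small fetch-decode-execute loop. Treat this as a toy under stated assumptions: the 8-bit machine, its single accumulator register and the `ADD` opcode `01` are all hypothetical, and `0C` for `LDA` is just the same assumed value used earlier in this post.

```python
def emulate(program, memory):
    """Run toy machine code and return the final accumulator value."""
    accumulator = 0  # the emulated register is just a variable
    pc = 0           # program counter: index of the next instruction byte
    while pc < len(program):
        opcode = program[pc]
        if opcode == 0x0C:    # LDA addr: load a byte from memory
            accumulator = memory[program[pc + 1]]
            pc += 2
        elif opcode == 0x01:  # hypothetical ADD addr: add a byte from memory
            accumulator = (accumulator + memory[program[pc + 1]]) & 0xFF
            pc += 2
        else:
            raise ValueError(f"unknown opcode: {opcode:#04x}")
    return accumulator


memory = bytearray(256)
memory[0x50] = 7
memory[0x51] = 5

# LDA $50, then ADD $51
result = emulate(bytes.fromhex("0C500151"), memory)
print(result)  # 12
```

Each foreign instruction costs several Python operations (a lookup, a comparison, an assignment), which is a small-scale version of why emulation tends to be slow.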
|
||||
|
||||
## Compilation for operating systems
|
||||
|
||||
So far, we only talked about compiled languages that output machine code, specific to some CPU architecture. However in
|
||||
the vast majority of cases, that's not actually what the compiler will output anymore (at least not entirely).
|
||||
|
||||
Nowadays, we usually don't make programs that run on their own. Instead, we're working under a specific operating
|
||||
system, which then potentially handles a whole bunch of programs running inside of it. All of these operating systems
|
||||
contain something called a "kernel", which is the core part of that system, which contains a bunch of so called
|
||||
"syscalls".
|
||||
|
||||
Syscalls are basically small functions that the kernel exposes for us. These are things like opening and reading a file,
|
||||
creating a network socket, etc. These syscalls are incredibly useful, as the kernel contains the logic (drivers) for a
|
||||
whole bunch of different hardware devices (such as network cards, audio speakers/microphones, screens, keyboards, ...),
|
||||
and the syscalls it exposes are an abstraction, that gives us the same kind of interface for each device of a certain
|
||||
type (i.e. every speaker will be able to output a tone at some frequency), which we can utilize, without having to care
|
||||
about exactly how that specific device works (different speakers might need different voltages sent to them to produce
|
||||
the requested frequency).
|
||||
|
||||
For this reason, compilers targeting an OS will take advantage of this, and instead of outputting pure machine code,
|
||||
they output an executable file, in an OS-specific format (such as an .exe on Windows, or an ELF file on Linux). The
|
||||
instructions in this file will then also contain a special "SYSCALL" instruction, which the kernel will respond to and
|
||||
run the appropriate function.
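
As a rough illustration of what relying on the kernel looks like from a program's point of view, here is a short Python sketch. It uses the low-level functions from Python's `os` module, which hand the request over to the operating system rather than doing the work themselves. Exactly which kernel calls end up being made depends on the platform, so take the comments as a simplification.

```python
import os

# Ask the kernel for a pipe; we get back two file descriptors (plain integers).
read_fd, write_fd = os.pipe()

# Writing and reading go through the kernel, which owns the pipe's buffer.
written = os.write(write_fd, b"hello kernel")
received = os.read(read_fd, written)

# Tell the kernel we are done with both ends.
os.close(read_fd)
os.close(write_fd)

print(written, received)
```

The program never touches any hardware or kernel memory directly. It only asks, and the kernel decides how to carry the request out.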
|
||||
|
||||
This however makes the outputted executable not only CPU architecture dependent, but also OS dependent, making it even
|
||||
less portable across various platforms.
|
||||
|
||||
## Core differences
|
||||
|
||||
Compiled languages will almost always have a speed benefit to them, because they don't need an additional program to
|
||||
interpret the instructions within that language when being run. Instead, the compiler is run by the programmer only
|
||||
once, producing an executable that can run on its own.
|
||||
As mentioned, with a compiled language, the source code is private and we can simply only distribute the compiled
|
||||
version, whereas with an interpreted language this is completely impossible since we need the source code because
|
||||
that's what's actually being ran by the interpreter, instruction after instruction. The best we can do if we really
|
||||
wanted to hide the source-code with an interpreted language is to obfuscate it.
|
||||
|
||||
Compilers often also perform certain optimizations, for example, if they find code that would always result in the same
|
||||
thing, like say: `a = 10 * 120`, we could do this calculation in the compiler, and only store the result `1200`,
|
||||
into the final program, making the run-time faster.
|
||||
Compiled languages will also have a speed benefit to them, because they don't need additional program to interpret the
|
||||
instruction within that language, whereas an interpreter needs to go through an additional step of identifying the instructions and
|
||||
running the code for them. Compilers often also perform certain optimizations, for example with code that would always
|
||||
result in a same thing, something like `a = 10 * 120`, we could compile it and only store the result `1200`, running
|
||||
the actual equation at compile time, making the running time faster.
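
This optimization is usually called constant folding, and a toy version of it fits in a few lines. The sketch below uses Python's `ast` module (and `ast.unparse`, so it assumes Python 3.9+) to rewrite the source before it would be run. The `ConstantFolder` class and `fold` helper are names made up for this example, and it only handles `+`, `-` and `*` on plain numbers.

```python
import ast
import operator

# Only a few operators are supported in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _OPS.get(type(node.op))
        left, right = node.left, node.right
        if (
            op is not None
            and isinstance(left, ast.Constant)
            and isinstance(right, ast.Constant)
            and isinstance(left.value, (int, float))
            and isinstance(right.value, (int, float))
        ):
            # Both sides are known ahead of time: compute the result now.
            folded = ast.Constant(value=op(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


folded = fold("a = 10 * 120")
print(folded)  # a = 1200
```

Anything involving a variable is left alone, so `fold("b = x + 2 * 3")` only simplifies the `2 * 3` part.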
|
||||
|
||||
Yet another advantage of compiled languages is that the original source code can be kept private, since we can simply
|
||||
only distribute the pre-compiled binaries. At most, people can look at the resulting machine code, which is however
|
||||
very hard to understand. In comparison, an interpreted language needs the interpreter to read the code to run it, which
|
||||
means when distributing, we would have to provide the full source-code. The best we can do if we really wanted to hide
|
||||
what the code is doing is to obfuscate it.
|
||||
So far it would look like compiled languages are a lot better than interpreted, but they do have many disadvantages to
|
||||
them as well. One of which I've already mentioned, that is not being cross-platform. Once a program is compiled, it will
|
||||
only ever run on the platform it was compiled for. If the compilation was happening directly to a machine code, this
|
||||
would mean it would be architecture-specific.
|
||||
|
||||
So far it would look like compiled languages are a lot better than interpreted, but they do have a significant
|
||||
disadvantage to them as well. One of which that I've already mentioned is not being cross-platform. Once a program is
|
||||
compiled, it will only be runnable on the platform it was compiled for. That is, not only on the same CPU architecture,
|
||||
but also only on the same operating system, meaning we'll need to be compiling for every OS on every architecture.
|
||||
But we usually don't do this and rather compile for something kernel-specific, because we are running under some
|
||||
specific operating system that uses some kernel. The kernel is basically acting as a big interpreter for every single
|
||||
program. We do this because this means that we can implement some security measures that for example disallow untrusted
|
||||
programs to read or write to a memory location that doesn't belong to that program.
|
||||
|
||||
The process of compiling for all of these various platforms might not be easy, as cross-compiling (the process of
|
||||
compiling a program for a different CPU architecture than the one you compile on) is often hard to set up, or even
|
||||
impossible because the tooling simply isn't available on your platform. So you may need to actually get a machine
|
||||
running on the platform you want to compile for, and do so directly on it, which is very tedious.
|
||||
This alone is a pretty big disadvantage, because we will need to compile our program for every operating system it is
|
||||
expected to be run on. In the case of an interpreted language, all we need to do is have the actual interpreter to be
|
||||
executable on all platforms, but the individual programs made with that language can then be run on any platform, as
|
||||
long as it has the interpreter.
|
||||
|
||||
However with an interpreted language, the same code will run on any platform, as long as the interpreter itself is
|
||||
available (compiled) for that platform. This means rather than having to distribute dozens of versions for every single
|
||||
platform, it would be enough to ship out the source code itself, and it will run (almost) anywhere.
|
||||
From this example, we can also see another reason why we may want to use an interpreter, that is the kernel itself,
|
||||
with it we can implement these security measures and somewhat restrict parts of what can be done, this is very crucial
|
||||
to having a secure operating system.
|
||||
|
||||
Interpreted languages are also usually a bit easier to write, as they can afford to be a bit more dynamic. For example,
|
||||
in C, we need to know exactly how big a number can get, and choose the appropriate number type (int, long, long long,
|
||||
short, not to mention all of these can be signed/unsigned), so that the compiler can work with this information and do
|
||||
some optimizations based on it, however in an interpreted language, a number can often grow dynamically, sort of like a
|
||||
vector, taking up more memory as needed. (It would be possible to achieve the same in a compiled language, but it would
|
||||
be at an expense of a bunch of optimizations that the compiler wouldn't be able to do anymore, so it's usually not done).
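
Python is a handy way to see this difference, since its integers grow as needed. The snippet below is a small illustration: the masking with `0xFFFFFFFF` only imitates what a fixed 32-bit unsigned integer would do, assuming the common case where C's `unsigned int` is 32 bits wide.

```python
# Python integers simply take more memory when the value gets bigger.
big = 2 ** 64
bits_needed = big.bit_length()

# A fixed-width 32-bit unsigned integer wraps around instead.
# Masking with 0xFFFFFFFF imitates that behaviour here.
wrapped = (0xFFFFFFFF + 1) & 0xFFFFFFFF

print(bits_needed)  # 65
print(wrapped)      # 0
```

The convenience has a price: every arithmetic operation has to check how big the number currently is, which is exactly the kind of work a compiler can skip when the size is fixed up-front.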
|
||||
Another advantage of interpreted languages is the simple fact that they don't need to be compiled. It's one less step
|
||||
in the process of distributing the application and it also means that it's much easier to write automated tests for,
|
||||
and for debugging in general.
|
||||
|
||||
## Hybrid languages
|
||||
|
||||
|
@ -193,41 +138,9 @@ done up-front, that's a bit inflexible, or the interpreted model, where all work
|
|||
bit slower, we kind of combine things and do both.
|
||||
|
||||
Up-front, we compile it partially, to what's known as byte-code, or intermediate language. This takes things as close
|
||||
to being compiled as we can, while still being portable across many platforms. We can then make some optimizations to
|
||||
this byte-code, just like a regular compiler might, though usually this won't be anywhere near as powerful, because the
|
||||
byte-code is still pretty abstract.
|
||||
|
||||
Once we get our byte-code (optimized, or not), there are 2 options of what happens next:
|
||||
|
||||
### Byte code interpreter
|
||||
|
||||
The first option is that the language has an interpreter program, which takes in this byte-code, and runs it from that.
|
||||
If this is the case, a program in such language could be distributed as this byte-code, instead of as pure source-code,
|
||||
as a way to keep the source-code private. While this byte-code will be easier to understand than pure machine code if
|
||||
someone were to attempt to reverse-engineer it, it is still a better option than having to ship the real source-code.
|
||||
|
||||
This therefore combines the advantages of an interpreted language, of being able to run anywhere, with those of a
|
||||
compiled language, of not having to ship the plaintext source-code, and of doing some optimizations, to minimize the
|
||||
run-time of the code.
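
With CPython (the reference Python implementation) we can actually look at this byte-code. The following is a small sketch using the built-in `compile` function and the standard `dis` module. The exact instruction names differ between Python versions, so no specific listing is shown here.

```python
import dis

source = "x = 1 + 2"

# Step 1: compile the source text into a code object holding byte-code.
code = compile(source, "<example>", "exec")
raw = code.co_code  # the byte-code itself, as a bytes object

# Step 2: hand the byte-code to the interpreter to run it.
namespace = {}
exec(code, namespace)

# Human-readable names of the byte-code instructions (version dependent).
instructions = [instruction.opname for instruction in dis.get_instructions(code)]

print(namespace["x"])  # 3
print(instructions)
```

CPython caches this compiled form in `.pyc` files inside `__pycache__` directories, so the compile step doesn't have to be repeated on every import.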
|
||||
|
||||
Examples of languages like these are: Python, ...
|
||||
to being compiled as we can, while still being portable across many platforms, and we then distribute this byte-code
|
||||
rather than the full source-code and each person who runs it does the last step of taking it to machine code by running
|
||||
this byte-code with an interpreter. When the runtime instead compiles the byte-code to machine code as it runs, this is known as Just In Time (JIT) compilation.
|
||||
|
||||
Most languages tend to only be one or the other, but there are a fair few that follow this hybrid implementation, most
|
||||
notably, these are: Java, C#, Python, VB.NET
|
||||
|
||||
### Byte code in compiled languages
|
||||
|
||||
As a second option,
|
||||
|
||||
The approach of generating byte-code isn't unique to hybrid languages though. Even pure compiled languages actually
|
||||
often generate byte-code first, and then let the compiler compile this byte-code, rather than going directly from
|
||||
source code to machine code.
|
||||
|
||||
This is actually very beneficial, because it means multiple vastly different languages, like C/C++ and Rust can end up
|
||||
being first compiled into the same kind of byte-code, which is then fed into yet another compiler, a great example of
|
||||
this is LLVM, which then finally compiles it into the machine code. But many languages have their own JIT.
|
||||
|
||||
The reason languages often like to use this approach is that they can rely on a well established compiler, like LLVM,
|
||||
to do a bunch of complex optimizations of their code, or compilation for different architectures, without having to
|
||||
write out their own logic to do so. Instead, all they have to write is a much smaller compiler that turns their
|
||||
language into LLVM compatible byte-code, and let it deal with optimizing the rest.
|
||||
|
|
|
@ -10,7 +10,7 @@
|
|||
<div class="item-list-group">
|
||||
<ul class="item-list-items">
|
||||
{{ range .Pages.ByTitle }}
|
||||
<li class="item-list-item" data-id="{{ with .File}}{{ .UniqueID }}{{ end }}">
|
||||
<li class="item-list-item" data-id="{{ with .File}}{{ .File.UniqueID }}{{ end }}">
|
||||
{{ partial "list_item.html" (dict "context" . "dateformat" "Jan 02, 2006") }}
|
||||
</li>
|
||||
{{ end }}
|
||||
|
|
|
@ -11,7 +11,7 @@
|
|||
|
||||
{{ if .Params.tags }}
|
||||
<span class="pr-2" title="Tags">
|
||||
<a href="{{ ref . "/tags" }}" class="no-color-change mr-1">
|
||||
<a href="{{ ref . " /tags" }}" class="no-color-change mr-1">
|
||||
<i class="fas fa-tags small content-detail"></i>
|
||||
</a>
|
||||
{{ range $i, $e := .Params.tags }}
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
<nav class="navbar navbar-expand-sm navbar-dark">
|
||||
<!-- Navbar home -->
|
||||
<a class="navbar-brand" href="{{ ref . "/" }}">
|
||||
<a class="navbar-brand" href="{{ ref . " /" }}">
|
||||
<code>/home/{{ lower .Site.Title }}</code>
|
||||
</a>
|
||||
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav"
|
||||
|
|
|
@ -25,7 +25,7 @@
|
|||
<!-- List all posts in this year group -->
|
||||
<ul class="item-list-items">
|
||||
{{ range .Pages }}
|
||||
<li class="item-list-item" data-id="{{ with .File}}{{ .UniqueID }}{{ end }}">
|
||||
<li class="item-list-item" data-id="{{ with .File}}{{ .File.UniqueID }}{{ end }}">
|
||||
{{ partial "list_item.html" (dict "context" . "dateformat" "Jan 02") }}
|
||||
</li>
|
||||
{{ end }}
|
||||
|
|