Compare commits


5 commits

7 changed files with 285 additions and 168 deletions

Changed file: GitHub Actions workflow (Hugo build/deploy)

@@ -31,7 +31,7 @@ jobs:
       - name: Setup hugo
         uses: peaceiris/actions-hugo@v2
         with:
-          hugo-version: "0.120.1"
+          hugo-version: "0.125.7"
           extended: true
       # Will use the build.sh script to build the page using hugo.

Changed file: blog post "Managing (multiple) git credentials"

@@ -1,6 +1,7 @@
 ---
 title: Managing (multiple) git credentials
 date: 2022-07-27
+lastmod: 2024-06-05
 tags: [programming, git]
 sources:
   - <https://docs.github.com/en/get-started/getting-started-with-git/caching-your-github-credentials-in-git>
@@ -21,6 +22,12 @@ changelog:
     - Add note about disabling commit signing
     - Add alternative command for copying on wayland
     - Fix typos and text wrapping
+  2024-06-05:
+    - Improve path matching explanation/note on git credentials helper, detailing the pitfalls it has with multiple
+      accounts on the same git hosting platform.
+    - Wording improvements
+    - Rewrite tackling multiple accounts section
+    - Include bash aliases example
 ---

 Many people often find initially setting up their git user a bit unclear, especially when it comes to managing multiple
@@ -113,7 +120,7 @@ your account (for example they may only allow you to pull/push code, but not to
 {{< notice warning >}}
 This method stores your credentials in the project's git config file in `.git/config`. Since this is a simple URL to
-one of the proejcts remotes, it will just be stored in this config file in **plaintext** without any form of encryption.
+one of the project's remotes, it will just be stored in this config file in **plaintext** without any form of encryption.
 Bear this in mind when giving someone access to the project directory, your credentials will be present in that
 directory!
@@ -143,19 +150,24 @@ Alternatively, we can directly edit the global git configuration:
 Each credential context is defined by a URL. This context will then be used to look up specific configuration. For
 example if we're accessing `https://github.com/ItsDrike/itsdrike.com`, git looks into the config file to see if a
-section matches this context. It will consider the two a match, if the context matches on both the protocols
-(`http`/`https`), and then on the host portion (`github.com`/`gitlab.com`/...). It can also optionally check the paths
-too, if they are present (`/ItsDrike/itsdrike.com`)
+section matches this context.
+
+Git will consider the two a match, if the context matches on both the protocols (`http`/`https`), and then on the host
+portion (`github.com`/`gitlab.com`/...). It can also optionally check the paths too, if they are present
+(`/ItsDrike/itsdrike.com`).

 {{< notice note >}}
-Git matches the hosts directly, without considering if they come from the same domain, so if subdomain differs, it will
-not register as a match. For example, for context of `https://gitlab.work_company.com/user/repo.git`, it wouldn't match
-a configuration section for `https://work_company.com`, since `work_company.com != gitlab.work_company.com`.
-
-The paths are also matched exactly (if they're included), so for the example context from above, we would not get a
-match on a config section with `https://gitlab.work_company.com/user`, only on
-`https://gitlab.work_company.com/user/repo.git` (in addition to the config entry without path
-`https://gitlab.work_company.com`).
+Git matches the paths exactly (if they're included). That means a configured credential like `https://github.com/user1`
+will NOT match an origin of `https://github.com/user1/repo.git`, so you would need a credential entry for each
+repository.
+
+This may not be an issue if you're trying to use this with multiple accounts on different git host platforms (like on
+`github.com` and `gitlab.com`), where you could just leave the credential to match only on the host, and not include
+any path. However, if you're trying to use multiple accounts with the same host, it will not work very well.
+
+Similarly, git matches the hosts directly too, without considering whether they come from the same domain, so if the
+subdomain differs, it will not register as a match either. This means that for an origin like
+`https://gitlab.work_company.com/user/repo.git`, git wouldn't match a configuration credential section for
+`https://work_company.com`, since `work_company.com != gitlab.work_company.com`.
 {{< /notice >}}
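To make the context syntax concrete, here is a minimal sketch (with a placeholder username) of attaching a username to a host-wide context from the command line; the same `credential.<URL>.username` shape shows up again in the multi-account sections below:

```bash
# Attach a username to the "https://github.com" credential context
# (host-only match, no path component); "user1" is a placeholder
git config --global credential."https://github.com".username user1
```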
 This does sound like a great option for multi-account usage, however the issue with this approach is that these
@@ -167,7 +179,7 @@ The username will be stored in git's global config file in **plaintext**, making
 worried about leaking your **username** (not password) for the git hosting provider.

 If you're using the global configuration, this generally shouldn't be a big concern, since the username won't actually
-be in the project file unlike with the remote-urls. However if you share a machine with multiple people, you may want
+be in the project file unlike with the remote-urls. However, if you share a machine with multiple people, you may want
 to consider securing your global configuration file (`~/.config/git/config`) using your file system's permission
 controls to prevent others from reading it.
@@ -206,16 +218,21 @@ git config --global credentials.helper 'store --file=/full/path/to/git-credentia
 Once the helper is configured, you will first still get asked for your username and password, and only after that first
 time you enter them will they get cached into this credentials file.

-{{< notice info >}}
+{{< notice note >}}
+This has the same matching pitfalls as credential contexts defined in settings: the URL paths are matched exactly, and
+so are URL hosts.
+{{< /notice >}}
+
+{{< notice info >}}
 The credentials file will cache the data in this format:

 ```txt
 https://<USERNAME>:<PASSWORD>@github.com
+https://<USERNAME2>:<PASSWORD2>@gitlab.com
 ```

 Which is indeed a **plaintext** format, however the file will be protected with your file system permissions, and
-access should be limited to you (as the user who owns the file). And since this file should live somewhere outside of
+access should be limited to you (as the user who owns the file). And since this file should live somewhere outside
 the project's directory, the project can be safely shared with others without worrying about leakage.
 {{< /notice >}}
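If you ever need to check which stored entry git would actually pick for a given context, the `git credential fill` plumbing command works as a quick sanity check; note that it falls back to prompting when no helper has a matching credential cached:

```bash
# Ask the configured helpers which credential matches this context;
# prints username=/password= lines (or prompts if nothing is cached)
printf 'protocol=https\nhost=github.com\n' | git credential fill
```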
@@ -245,7 +262,7 @@ helpers](https://git-scm.com/docs/gitcredentials#_custom_helpers). These allow u
 management by delegating to 3rd party applications and services.

 A commonly used external credential helper is for example the [Git Credential Manager
-(GCM)](https://github.com/GitCredentialManager/git-credential-manager). GCM can even handle things like 2 factor
+(GCM)](https://github.com/GitCredentialManager/git-credential-manager). GCM can even handle things like 2-factor
 authentication, or using OAuth2.

 If you want to, you can even write your own custom credential helper to handle your exact needs, in which case I'd
@@ -257,8 +274,8 @@ some examples of a basic custom provider.
 Most modern git servers also provide a way to access their repositories using SSH keys rather than username and
 password over HTTPS. This approach is significantly better, since guessing SSH keys is generally much harder, and they
-can easily be revoked. They also generally aren't nowhere near as powerful as full user passwords, so even if they are
-compromised, the attacker would only have a limited access.
+can easily be revoked. They also generally aren't anywhere near as powerful as full user passwords, so even if they are
+compromised, the attacker would only have limited access.

 SSH uses a public-private key pair, which means you will need to give out the public key over to the git hosting
 platform, and keep the private part on your machine for authentication. Using the public key, the server will then be
@@ -271,7 +288,8 @@ proxies, making communication with the remote server impossible.
 #### Generating an SSH key

 To generate an SSH key, you can use the `ssh-keygen` command line utility. Generating keys should always be done
-independently from the git hosting provider, since they don't shouldn't need to see your private key at any point.
+independently of the git hosting provider. The git hosting provider shouldn't need to see your private key at any
+point!

 The command for this key generation looks like this:
@@ -282,7 +300,7 @@ ssh-keygen -t ed25519 -C "<COMMENT>"
 - The `-C` flag allows you to specify a comment, which you can use to specify what this key will be used for. If you
   don't need a comment, you can also omit this flag.
 - The `-t` flag specifies the key type. The default type for SSH keys is `rsa`, however I'd suggest using `ed25519`
-  which is considered safer and more performant than RSA keys. If you will decide to use `rsa`, make sure to use a
+  which is considered safer and more performant than RSA keys. If you decide to use `rsa`, make sure to use a
   key size of at least 2048 bits, but for better security, ideally aim for a key size of `4096`.

 After running this command, you will be asked to specify a file where this key should be stored. You will probably want
@@ -297,7 +315,7 @@ filesystem from reading by other users.
 Once you select a file name, you will be asked to set a passphrase. You can opt to leave this empty by pressing enter
 without entering anything. Going with a passphrase protected key is safer, however it will also mean you will need to
-type your password each time, which may be annoying. However there is a way to cache this passphrase with SSH agent,
+type your password each time, which may be annoying. However, there is a way to cache this passphrase with SSH agent,
 which you can read more about in [GitHub's
 docs](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#adding-your-ssh-key-to-the-ssh-agent).

 Using a passphrase is significantly better for your system's security, since it means that even if the private key got
@@ -308,7 +326,7 @@ extension.
 #### Add public key to your hosting provider's account

-Now that you've create a public and private SSH key pair, you will need to let your git hosting provider know about it.
+Now that you've created a public and private SSH key pair, you will need to let your git hosting provider know about it.
 It is important that you only give the public key (file with `.pub` extension) to your provider, and not your private
 key.
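The public key is a single line of text, so printing it and copy-pasting it into the provider's account settings is usually all it takes (the file name below assumes the `ed25519` default from earlier):

```bash
# Print the public half of the key pair; this is the part the provider gets
cat ~/.ssh/id_ed25519.pub
```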
@@ -353,9 +371,9 @@ ssh -Tvvv git@github.com -i ~/.ssh/id_ed25519
 #### SSH Configuration file

-To meaningfully use your key, you'll want register some specific host name for your key, so you won't need to use the
-`-i` flag. You can do this by editing (or creating) `~/.ssh/config` file (or `C:\Users\your_username\.ssh\config` for
-Windows).
+To meaningfully use your key, you'll want to register some specific host name for your key, so you won't need to use
+the `-i` flag. You can do this by editing (or creating) the `~/.ssh/config` file (or `C:\Users\your_username\.ssh\config`
+for Windows).

 An example configuration file with multiple git accounts:
@@ -380,10 +398,12 @@ HOST work.github.com
 ```

 When you have multiple accounts with the same `HostName` (same git hosting provider), you will need to specify a unique
-`Host` name.
+`HOST` name. One way I like to do this is to make it appear as a subdomain, like `work.github.com`. Another common way
+to define these is to use a dash at the end, like `github.com-user2`. Finally, we've cracked the issue of storing
+multiple credentials even on the same git hosting platform.

-To then make sure this configuration works, you can run another test command, but this time without specifying the
-key file explicitly, as it should now be getting picked up from the settings:
+Let's first make sure this configuration works though. To do that, you can run another test command, but this time
+without specifying the key file explicitly, as it should now be getting picked up from the settings:

 ```bash
 ssh -T git@github.com
@@ -397,14 +417,13 @@ ssh -T work.github.com
 #### Using the SSH keys

-No let's finally get to actually using these keys in your repositories. Doing this can be pretty straight-forward, as
-it is very similar to the first method of handling credentials which I've talked about, being storing the credentials
-in the remote-url. However this time, instead of using the actual credentials, and therefore making the project
-directory unsafe to share, as it contains your password in plaintext, it will actually only contain the `HOST` name
-you've set in your config, without leaking any keys.
+Now let's get to actually using these keys in your repositories. Doing this can be pretty straight-forward, as it is
+very similar to the first method of handling credentials which I've talked about, being storing the credentials in the
+remote-url, but this time, instead of using the actual credentials, and therefore making the project directory unsafe
+to share, it will just contain the `HOST` name you've set in your config, without leaking any keys.

-The commands to set this up are very similar, however instead of `https://<USERNAME>:<PASSWORD>@github.com`, we now use
-`git@HOST`:
+The commands to set this up are therefore very similar, except that instead of
+`https://<USERNAME>:<PASSWORD>@github.com/<PATH>`, we now use `git@HOST:<PATH>`:

 ```bash
 # While cloning:
@@ -422,11 +441,11 @@ to remember the username or the password, instead you just need to know the host
 ## Which method to use for credentials

-Generally, using SSH keys is the safest approach, but it can also be a bit annoying since it requires you to specify
-the SSH host for each repository in it's remote url. For that reason, the approach that I would recommend is using
-git's credential helper system to store your credentials instead.
+Generally, using SSH keys is the safest and probably the best approach, but it can also be a bit annoying since it
+requires you to specify the SSH host for each repository in its remote url. For that reason, the approach that I would
+recommend is using git's file credential helper system to store your credentials instead.

-However if you will go with this method, make sure that you're using a personal access token instead of the actual
+However, if you go with this method, make sure that you're using a personal access token instead of the actual
 account's password, to limit the permissions an attacker would gain in case your credentials were leaked.

 If your git hosting platform doesn't provide access tokens, this method becomes a lot more dangerous to use, since if
@@ -435,51 +454,51 @@ account on that git host platform. That's why in that case, you should really co
 it's a bit less convenient, as they can be easily revoked and only allow limited access, just like personal access
 tokens.

-## Tackling credentials for multiple accounts
-
-### Credentials for differing hosts
-
-When it comes to managing multiple accounts, this gets a bit more tricky. But if each of your accounts lives on a
-different domain/host, you can still use credential helpers without any issues, since it can handle multiple
-credentials for multiple websites out of the box. If you're using the file credential helper, this would result in the
-`git-credentials` file looking like this:
-
-```txt
-https://<USERNAME>:<PASSWORD>@github.com
-https://<USERNAME2>:<PASSWORD2>@gitlab.com
-```
-
-With that, whenever you'd try to pull/push with the remote url, git will go through this file in order, searching for
-the first matching host. So for example when using a remote url belonging to `github.com` domain, the first line would
-apply, while if your remote url belongs to `gitlab.com`, the second line would apply. This means that if your accounts
-are from different providers, you can avoid the hassle of doing anything more complicated.
-
-However if you have more accounts on a single host, you will need to somehow let git know what to do.
+From what we've seen up until now, the store credential helper method is only good if you have a single account
+per git hosting platform; so... what if you have multiple accounts?
+
+### Combine both store credential helper & SSH keys
+
+The simple solution to this issue is to just use both SSH and store credential helpers. That way, you can just clone
+with regular unmodified URLs, letting the store credential helper figure out which credential to use on a per-platform
+basis. Leave the alt accounts you have on a single platform to SSH: your store credential helper only knows
+about your primary account, and you have SSH config entries set for each of your alt accounts.
+
+Whenever you then want to use an alt account for a repo, instead of cloning with the regular URL, you will clone with
+the SSH url for your alt account.
+
+This method is pretty much perfect, and there aren't many downsides to it. However, I will also show you some other
+interesting methods, which you might like more if you don't want to mess around with SSH keys, or if your git hosting
+provider doesn't support them. You might also just not like the idea of having to change the remote URL path of your
+repository to this special path with the SSH host, which the other solutions will avoid.
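As a quick sketch of day-to-day usage under this combined setup (the repository paths here are placeholders, and `work.github.com` is the `HOST` entry from the SSH config example shown earlier):

```bash
# Primary account: clone with the regular URL; the store credential
# helper picks the right credential for the platform
git clone https://github.com/user/repo.git

# Alt (work) account on the same platform: clone via the SSH HOST entry
git clone git@work.github.com:user/repo.git
```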
 ### Using credential contexts

-The good news is that even with same domains, you can actually still use the git credentials as your default method,
-and use git credential contexts to find a username. With that, even if you're using the same host, git will know to
-look for a specific username in the credentials file now, which should be sufficient distinction to match any amount of
-different credentials.
-
-However the issue with git contexts is that they need to match the path component exactly, so even though you can
-configure git to use different contexts for different repositories in your global config, you can't configure it to use
-a certain context for a partial match on path, so you'd need to specify each repository which should use custom
-credentials into your global git configuration, which is not great.
-
-Instead, you should use the local git configuration of each project and specify a git context with the username you
-want to use for that project. That way, you won't need to keep config for every non-default project in your global
-config, and yet still use the same file credential helper to store all of your credentials in a single place.
+Remember the credential contexts we were defining? (The URLs that git could match against to figure out which
+username to use.) Well, we can actually use these, but set the username to be used for the platform in the local
+configuration. To do that, you can just run:

 ```bash
-git config --local credential.https://github.com.username <USERNAME>
+git config --local credential."https://github.com".username <USERNAME>
 ```
+
+This will mean git will now know what username should be used for the given remote url. With that, our store
+credentials helper can now be a bit smarter, and instead of just picking the first entry in your `git-credentials`
+file that matches the given remote url, it will also look for a username match. So for example, if you set the
+username in that local config to `user2`, and you had this in your `git-credentials`:
+
+```
+https://user1:<PASSWORD>@github.com
+https://user2:<PASSWORD2>@github.com
+```
+
+It would actually pick the 2nd record now, because of the username match. (When no username is configured, the store
+credentials helper will always pick the 1st record.)
 {{< notice info >}}
-Once again, this will store the credential context into the local project's git configuration (in `.git/config`), which
-is using **plaintext**, which means you might end up leaking your **username** (not password), if you give someone
-access to this project's directory.
+This will however store the credential context into the local project's git configuration (in `.git/config`), which is
+using **plaintext**, which means you might end up leaking your **username** (not password), if you give someone access
+to this project's directory.

 The actual password will however be completely safe, as it should only be present in the `git-credentials` file, which
 should be located elsewhere, and configured from the global git config. So this only affects you if you want to keep
@@ -487,44 +506,34 @@ your username for that git hosting provider private too. If you do, you will nee
 sharing project files, or use a different method.
 {{< /notice >}}

+This method is fine, but in my opinion, it's a bit clunky, since you need to also specify the remote URL here, and it
+leaks your username on the platform. Because of that, I think the method below is a better option, but this one is
+still good to know about, and might suit you better, depending on your preferences.
+
 ### Using different credentials file

-The alternative to using credential contexts with your plaintext stored username would be using multiple
-`git-credentials` files, and simply overriding the credential helper system in the local config, setting a different
-file for the store credential helper. This could for example look like this:
+Let's try and hack our way through the problem and do everything while sticking to just the store credentials helper.
+Do you remember how when we first configured the credential helpers, we specified the path to the `git-credentials`
+file it should write the credentials to?
+
+Well, we stored that value to our global config, but of course, local config will override global config, so we could
+just set a different file for the store credential helper, which contains our alt account! Doing that is as simple as
+running this command:

 ```bash
-git config credentials.helper 'store --file=/home/user/.config/git-credentials-work'
+git config --local credential.helper 'store --file=/home/user/.config/git-credentials-alt'
 ```

-With this approach, you can have your credentials kept in multiple separate credential files, and just mention the path
-to the file you need for each project.
-
-Security-wise, this method is better because your username will be kept outside of the project in the referenced git
-credential file, which should be secured by the file system's permissions to prevent reads from other users. However
-practicality-wise, it may be a bit more inconvenient to type and even to remember the path to each credential file.
-
-### SSH keys instead
-
-The thing you may have noticed about all of these methods is that you'll generally need to do some extra work for all
-repositories that require non-default credentials. So even though relying on git's file credential helper is convenient
-for the default case, extending it to non-default cases will always require doing some extra configuration.
-
-This extra configuration is inevitable, which is why I'd suggest going with SSH keys instead, which are pretty much
-equally as annoying, requiring you to do something extra for each non-default project (specifying them in the remote
-URL). However as I've already explained, they're pretty much the most secure way to handle credentials. So instead of
-doing some extra work just to configure a less secure method, you might as well do an equal amount of work and
-use the more secure way with SSH keys.
-
-The only disadvantage to this method is then the use of non-standard ports, which some networks might end up blocking,
-making connection to the server [*pretty much*]({{< ref "posts/escaping-isolated-network#port-22-is-blocked" >}})
-unreachable from those networks.
+Security-wise, this method is pretty good too, since your credentials will be kept outside the project in the
+referenced git credential file, which should be secured by the file system's permissions to prevent reads from other
+users. When done properly, this won't even leak your usernames, just make sure not to include the username as a part
+of the file name. (That is, if you care about not leaking your username.)
 ## Make convenience aliases

-If you really dislike the idea of all of this repetition, I'd suggest making short-hands for whichever method you
-ended up picking, in the form of git aliases (you can also use shell aliases though). Git supports defining aliases
-through it's configuration file, where you can use the `[alias]` section for them.
+Like I've already mentioned, if you work with different accounts a lot, you will certainly want to make convenience
+aliases to hide all the account switching logic away. You can do this in the form of git aliases, or bash aliases,
+by putting this in your `~/.config/git/config`:

 ```toml
 [alias]
@@ -533,7 +542,7 @@ work-clone="!sh -c 'git clone git@work.github.com:$1'"
 # Make current repository use the work git credentials file
 make-work="config --local credential.helper 'store --file=/path/to/work/credentials'"
 # Set the username for credentials to your work account, so it can find it in default git credentials
-use-work-uname="config --local credential.https://github.com.username my-work-username"
+use-work-uname="config --local credential.'https://github.com'.username my-work-username"
 ```

 To then use these aliases, you can simply execute them as you would any other git command:
@@ -543,3 +552,24 @@ git work-clone ItsDrike/itsdrike.com
 git make-work
 git use-work-uname
 ```
+
+What I like to do is to define a bash function, which will not only set the appropriate credentials, but also a
+different local committer name and email, with the commands shown at the beginning. That could then look like this:
+
+```bash
+git-work() {
+  git config --local user.email "john_doe@work.com"
+  git config --local user.name "John Doe"
+  git config --local user.signingkey 4F3C14B2C3AE9246
+  git config --local credential."https://github.com".username johndoe_work
+}
+
+git-alt() {
+  git config --local user.email "pseudonym@example.com"
+  git config --local user.name "pseudonym"
+  git config --local user.signingkey 522DC4E2A20A92B8
+  git config --local credential."https://github.com".username johndoe_2
+}
+```
+
+While leaving my primary account defined in my global git configuration.
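For completeness, a sketch of what that primary-account global side might look like, reusing the same commands from the beginning of the post (all names here are placeholders):

```bash
# Primary account, configured once globally; the local per-repo functions
# like git-work/git-alt above override these where needed
git config --global user.name "John Doe"
git config --global user.email "john_doe@example.com"
git config --global credential."https://github.com".username johndoe
```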

Changed file: blog post "Interpreted vs Compiled Languages"

@@ -1,7 +1,11 @@
 ---
 title: Interpreted vs Compiled Languages
 date: 2021-09-09
+lastmod: 2024-06-05
 tags: [programming]
+changelog:
+  2024-06-05:
+    - Improve wording
 ---

 You've likely seen or heard someone talking about a language as being interpreted or as being a compiled language, but
@@ -33,44 +37,46 @@ Obviously, we wanted to make things easier for ourselves by automating and abstr
 programming process as we could, so that the programmer could really only focus on the algorithm itself and the actual
 conversion of the code to a machine code that the CPU can deal with should simply happen in the background.

-But how could we achieve something like this? The simple answer is, to write something that will be able to take our
+But how could we achieve something like this? The simple answer is to write something that will be able to take our
 code and convert it into a set of instructions that the CPU will be running. This intermediate piece of code can either
 be a compiler, or an interpreter.

-After a piece of software like this was first written, suddenly everything became much easier, initially we were using
-assembly languages that were very similar to the machine code, but they looked a bit more readable, while the
-programmer still had to think in the terms of what instruction should the CPU get in order to get it to do the required
-thing, the actual process of converting this code into a machine code was done automatically. The programmer just took
-the text of the program in the assembly language, fed it into the computer and it returned a whole machine code.
+After a piece of software like this was first written, suddenly everything became much easier. Initially we were using
+assembly languages that were very similar to the machine code, but they looked a bit more readable, giving actual names
+to the individual CPU instructions (opcodes), and allowed defining symbols (constants) and markers for code positions.
+So, while the programmer still had to think in terms of what instruction should the CPU get in order to get it to
+do the required thing, the actual process of writing this code was a lot simpler, as you didn't have to constantly look
+at a table just to find the number of the opcode you wanted to use, and instead, you just wrote something like `LDA
+$50` (load the value at the memory address 0x50 into the accumulator register), instead of `0C50`, assuming `0C` was
+the byte representing the `LDA` opcode.
+
+Converting this assembly code into machine code was done automatically. The programmer just took the text of
+the program in the assembly language, fed it into the compiler and it returned the actual machine code, which could
+then be understood by the CPU.

 Later on we went deeper and deeper and eventually got to a very famous language called C. This language was incredibly
-versatile and allowed the programmers to finally start thinking in a bit more natural way about how should the program
-be written and the tedious logic of converting this textual C implementation into something executable was left to the
-compiler to deal with.
+versatile and allowed the programmers to finally start thinking in a bit more natural way, one with named variables,
+functions, loops, and a bunch of other helpful abstractions. All the while, the tedious logic of converting this textual
+C implementation into something executable was left for the compiler to deal with.

 ### Recap

-So we now now that initially, we didn't have neither the compiler nor an interpreted and we simply wrote things in
-machine code. But this was very tedious and time taking, so we wrote a piece of software (in machine code) that was
+So we now know that initially, we had neither a compiler nor an interpreter, and we simply wrote things in
+machine code. But this was very tedious and time-consuming, so we wrote a piece of software (in machine code) that was
 able to take our more high-level (English-like) text and convert that into this executable machine code.
 ## Compiled languages

 All code that's written with a language that's supposed to be compiled has a piece of software called the "compiler".
-This piece of software is what carries the internal logic of how to convert these instructions that followed some
-language-specific syntax rules into the actual machine code, giving that us back the actual machine code.
-
-This means that once we write our code in a compiled language, we can use the compiler to get an executable version of
-our program which we can then distribute to others.
+This piece of software is what carries the internal logic of converting these instructions that followed some
+language-specific syntax rules into the actual machine code, giving us back an executable program.

-However this executable version will be specific to certain CPU architecture and if someone with a different
-architecture were to obtain it, he wouldn't be able to run it. (At least not without emulation, which is a process of
-simulating a different CPU architecture and running the corresponding instruction that the simulated CPU gets with an
-equivalent instructions on the other architecture, even though this is a possibility, the process of emulation causes
-significant slowdowns, and because we only got the executable machine code rather than the actual source-code, we can't
-re-compile the program our-selves so that it would run natively for our architecture)
+However, this executable version will be specific to a certain CPU architecture. This is because each architecture has
+its own set of instructions, with different opcodes, registers, etc. So if someone with a different architecture were
+to obtain it, they still wouldn't be able to run it, simply because the CPU wouldn't understand those machine code
+instructions.

-Some of the most famous compiled languages are: C, C++, Rust
+Some of the most famous compiled languages are: C, C++, Rust, Zig
 ## Interpreted Languages

@@ -79,53 +85,102 @@ a major difference between these 2 implementations.
 With interpreted languages, rather than feeding the whole source code into a compiler and getting a machine code out of
 it that we can run directly, we instead feed the code into an interpreter, which is a piece of software that's already
-compiled (or is also interpreted and other interpreter handles it) and this interpreter scans the code and goes line by
-line and interprets each function/instruction that's than ran from within the interpreter. We can think of it as a huge
-switch statement with all of the possible instructions in an interpreted language defined in it. Once we hit some
-instruction, the code from inside of that switch statement is executed.
+compiled (or is also interpreted and another interpreter handles it - _interpreterception_) and this interpreter scans
+the code and goes line by line and interprets each function/instruction. We can think of it as a huge switch statement
+with all of the possible instructions of the interpreted language defined in it. Once we hit some instruction, the code
+from inside of that switch statement's case is executed.

-This means that with an interpreted language, we don't have any final result that is an executable file that can be
-distributed alone, but rather we simply ship the code itself. However this brings it's own problems such as the fact
-that each machine that would want to run such code have to have the interpreter installed, in order to "interpret" the
-instructions written in that language. We also sometimes call these "scripting languages"
+This means that with an interpreted language, we don't have any final result that is an executable file, which could be
+distributed on its own, but rather we simply ship the code itself. To run it, the client is then expected to
+install the interpreter program, compiled for their machine, run it, and feed it the code we shipped.

 Some of the most famous interpreted languages are: PHP, JavaScript
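To make the "huge switch statement" picture concrete, here is a toy sketch in bash: a made-up two-instruction mini-language, dispatched with a `case` statement, much like a (drastically) simplified interpreter loop:

```bash
#!/usr/bin/env bash
# Toy interpreter for a made-up mini-language: one instruction per line
acc=0
while read -r op arg; do
  case "$op" in                  # the "huge switch statement"
    ADD) acc=$((acc + arg)) ;;   # ADD <n>: add a number to the accumulator
    PRINT) echo "$acc" ;;        # PRINT: output the accumulator
    *) echo "unknown instruction: $op" >&2; exit 1 ;;
  esac
done
```

Feeding it `printf 'ADD 2\nADD 3\nPRINT\n'` on stdin prints `5`.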
+{{< notice tip >}}
+Remember how I mentioned that when a program (written in a compiled language) is compiled, it will only be possible to
+run it on the architecture it was compiled for? Well, that's not necessarily entirely correct.
+
+It is actually possible to run a program compiled for a different CPU architecture, by using **emulation**.
+
+Emulation is a process of literally simulating a different CPU. It is a program which takes in the machine instructions
+as its input, and processes those instructions as if it was a CPU, setting the appropriate internal registers (often
+represented as variables), keeping track of memory, and a whole bunch of other things. This program is then compiled
+for your CPU architecture, so that it can run on your machine. This is what's called an **Emulator**.
+
+With an emulator, we can simply feed it the compiled program for the CPU it emulates, and it will do exactly what a
+real CPU would do, running this program.
+
+That said, emulators are usually very slow, as they're programs which run on a real CPU, having to keep track of the
+registers, memory, and a bunch of other things inside of themselves, rather than inside the actual physical CPU we're
+running on. Our CPU might not even have such a register/opcode, so it may need to execute a bunch of native
+instructions to carry out just a single foreign instruction.
+
+Notice that an emulator is actually an interpreter, making compiled machine code for another CPU its interpreted
+language!
+{{< /notice >}}
+## Compilation for operating systems
+
+So far, we only talked about compiled languages that output machine code, specific to some CPU architecture. However,
+in the vast majority of cases, that's not actually what the compiler will output anymore (at least not entirely).
+
+Nowadays, we usually don't make programs that run on their own. Instead, we're working under a specific operating
+system, which then potentially handles a whole bunch of programs running inside of it. All of these operating systems
+contain something called a "kernel", which is the core part of that system, which contains a bunch of so called
+"syscalls".
+
+Syscalls are basically small functions that the kernel exposes for us. These are things like opening and reading a file,
+creating a network socket, etc. These syscalls are incredibly useful, as the kernel contains the logic (drivers) for a
+whole bunch of different hardware devices (such as network cards, audio speakers/microphones, screens, keyboards, ...),
+and the syscalls it exposes are an abstraction, that gives us the same kind of interface for each device of a certain
+type (i.e. every speaker will be able to output a tone at some frequency), which we can utilize, without having to care
+about exactly how that specific device works (different speakers might need different voltages sent to them to produce
+the requested frequency).
+
+For this reason, programs running under an OS will take advantage of this, and instead of outputting pure machine code,
+they output an executable file, in an OS-specific format (such as an .exe on Windows, or an ELF file on Linux). The
+instructions in this file will then also contain special "SYSCALL" instructions, which the kernel will respond to by
+running the appropriate function.
+
+This however makes the outputted executable not only CPU architecture dependent, but also OS dependent, making it even
+less portable across various platforms.
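On a Linux machine you can observe both halves of this quite easily (assuming a C compiler and `strace` are installed):

```bash
# Build a minimal program and inspect its OS-specific executable format
echo 'int main(void) { return 0; }' > hello.c
cc hello.c -o hello
file hello     # reports something like: ELF 64-bit LSB executable, x86-64 ...

# Summarize the syscalls the program makes while running
strace -c ./hello
```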
 ## Core differences

-As mentioned, with a compiled language, the source code is private and we can simply only distribute the compiled
-version, whereas with an interpreted language this is completely impossible since we need the source code because
-that's what's actually being ran by the interpreter, instruction after instruction. The best we can do if we really
-wanted to hide the source-code with an interpreted language is to obfuscate it.
-
-Compiled languages will also have a speed benefit to them, because they don't need additional program to interpret the
-instruction within that language, but rather needs to go through an additional step of identifying the instructions and
-running the code for them. Compilers often also perform certain optimizations, for example with code that would always
-result in a same thing, something like `a = 10 * 120`, we could compile it and only store the result `1200`, running
-the actual equation at compile time, making the running time faster.
-
-So far it would look like compiled languages are a lot better than interpreted, but they do have many disadvantages to
-them as well. One of which I've already mention, that is not being cross-platform. Once a program is compiled, it will
-only ever run on the platform it was compiled for. If the compilation was happening directly to a machine code, this
-would mean it would be architecture-specific.
-
-But we usually don't do this and rather compile for something kernel-specific, because we are running under some
-specific operating system that uses some kernel. The kernel is basically acting as a big interpreter for every single
-program. We do this because this means that we can implement some security measures that for example disallow untrusted
-programs to read or write to a memory location that doesn't belong to that program.
-
-This alone is a pretty big disadvantage, because we will need to compile our program for every operating system it is
-expected to be ran on. In the case of an interpreted language, all we need to do is have the actual interpreter to be
-executable on all platforms, but the individual programs made with that language can then be ran on any platform, as
-long as it has the interpreter.
-
-From this example, we can also see another reason why we may want to use an interpreter, that is the kernel itself,
-with it we can implement these security measures and somewhat restrict parts of what can be done, this is very crucial
-to having a secure operating system.
-
-Another advantage of interpreted languages is the simple fact that they don't need to be compiled, it's one less step
-in the process of distributing the application and it also means that it's much easier to write automated tests for,
-and for debugging in general.
+Compiled languages will almost always have a speed benefit to them, because they don't need an additional program to
+interpret the instructions within that language when being run. Instead, the compiler is run by the programmer only
+once, producing an executable that can run on its own.
+
+Compilers often also perform certain optimizations, for example, if they find code that would always result in the same
+thing, like say `a = 10 * 120`, we could do this calculation in the compiler, and only store the result `1200`
+into the final program, making the run-time faster.
+
+Yet another advantage of compiled languages is that the original source code can be kept private, since we can simply
+only distribute the pre-compiled binaries. At most, people can look at the resulting machine code, which is however
+very hard to understand. In comparison, an interpreted language needs the interpreter to read the code to run it, which
+means when distributing, we would have to provide the full source-code. The best we can do if we really wanted to hide
+what the code is doing is to obfuscate it.
+
+So far it would look like compiled languages are a lot better than interpreted, but they do have a significant
+disadvantage to them as well. One, which I've already mentioned, is not being cross-platform. Once a program is
+compiled, it will only be runnable on the platform it was compiled for. That is, not only on the same CPU architecture,
+but also only on the same operating system, meaning we'll need to be compiling for every OS on every architecture.
+
+The process of compiling for all of these various platforms might not be easy, as cross-compiling (the process of
+compiling a program for a different CPU architecture than the one you compile on) is often hard to set up, or even
+impossible because the tooling simply isn't available on your platform. So you may need to actually get a machine
+running on the platform you want to compile for, and do so directly on it, which is very tedious.
+
+However with an interpreted language, the same code will run on any platform, as long as the interpreter itself is
+available (compiled) for that platform. This means rather than having to distribute dozens of versions for every single
+platform, it would be enough to ship out the source code itself, and it will run (almost) anywhere.
+
+Interpreted languages are also usually a bit easier to write, as they can afford to be a bit more dynamic. For example,
+in C, we need to know exactly how big a number can get, and choose the appropriate number type (int, long, long long,
+short, not to mention all of these can be signed/unsigned), so that the compiler can work with this information and do
+some optimizations based on it. However, in an interpreted language, a number can often grow dynamically, sort of like a
+vector, taking up more memory as needed. (It would be possible to achieve the same in a compiled language, but it would
+be at the expense of a bunch of optimizations that the compiler wouldn't be able to do anymore, so it's usually not done.)
 ## Hybrid languages

@@ -138,9 +193,41 @@ done up-front, that's a bit inflexible, or the interpreted model, where all work
 bit slower, we kind of combine things and do both.

 Up-front, we compile it partially, to what's known as byte-code, or intermediate language. This takes things as close
-to being compiled as we can, while still being portable across many platforms, and we then distribute this byte-code
-rather than the full source-code and each person who runs it does the last step of taking it to machine code by running
-this byte-code with an interpreter. This is also known as Just In Time (JIT) compilation.
+to being compiled as we can, while still being portable across many platforms. We can then make some optimizations to
+this byte-code, just like a regular compiler might, though usually this won't be anywhere near as powerful, because the
+byte-code is still pretty abstract.
+
+Once we get our byte-code (optimized, or not), there are 2 options of what happens next:
+
+### Byte code interpreter
+
+The first option is that the language has an interpreter program, which takes in this byte-code, and runs it from that.
+If this is the case, a program in such a language could be distributed as this byte-code, instead of as pure source-code,
+as a way to keep the source-code private. While this byte-code will be easier to understand than pure machine code if
+someone were to attempt to reverse-engineer it, it is still a better option than having to ship the real source-code.
+
+This therefore combines the advantages of an interpreted language, of being able to run anywhere, with those of a
+compiled language, of not having to ship the plaintext source-code, and of doing some optimizations, to minimize the
+run-time of the code.
+
+Examples of languages like these are: Python, ...

 Most languages tend to only be one or the other, but there are a fair few that follow this hybrid implementation, most
 notably, these are: Java, C#, Python, VB.NET
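Python makes this easy to poke at, since the standard library ships both a byte-code compiler and a disassembler; you can even spot the constant-folding optimization from earlier (`10 * 120` stored as `1200`) in its output:

```bash
echo 'a = 10 * 120' > example.py
# Compile to byte-code without running it (drops a .pyc into __pycache__/)
python -m py_compile example.py
# Disassemble: the multiplication shows up as an already-folded LOAD_CONST 1200
python -m dis example.py
```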
+### Byte code in compiled languages
+
+As a second option, the byte-code can instead be fed to a full compiler, which takes it the rest of the way down to
+machine code. The approach of generating byte-code isn't unique to hybrid languages though. Even pure compiled languages
+actually often generate byte-code first, and then let the compiler compile this byte-code, rather than going directly
+from source code to machine code.
+
+This is actually very beneficial, because it means multiple vastly different languages, like C/C++ and Rust, can end up
+being first compiled into the same kind of byte-code, which is then fed into yet another compiler; a great example of
+this is LLVM, which then finally compiles it into the machine code. But many languages have their own JIT.
+
+The reason languages often like to use this approach is that they can rely on a well established compiler, like LLVM,
+to do a bunch of complex optimizations of their code, or compilation for different architectures, without having to
+write out their own logic to do so. Instead, all they have to write is a much smaller compiler that turns their
+language into LLVM compatible byte-code, and let it deal with optimizing the rest.
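With clang you can watch this split directly, stopping after the LLVM byte-code (IR) stage and then letting LLVM take it the rest of the way:

```bash
# Stop after generating LLVM IR ("byte-code") in its textual form
clang -S -emit-llvm hello.c -o hello.ll
# Feed the IR back in and let LLVM finish compiling it to machine code
clang hello.ll -o hello
```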

Changed file: Hugo list template (items by title)

@@ -10,7 +10,7 @@
 <div class="item-list-group">
   <ul class="item-list-items">
     {{ range .Pages.ByTitle }}
-      <li class="item-list-item" data-id="{{ with .File}}{{ .File.UniqueID }}{{ end }}">
+      <li class="item-list-item" data-id="{{ with .File}}{{ .UniqueID }}{{ end }}">
         {{ partial "list_item.html" (dict "context" . "dateformat" "Jan 02, 2006") }}
       </li>
     {{ end }}

Changed file: Hugo list template (posts grouped by year)

@@ -25,7 +25,7 @@
 <!-- List all posts in this year group -->
 <ul class="item-list-items">
   {{ range .Pages }}
-    <li class="item-list-item" data-id="{{ with .File}}{{ .File.UniqueID }}{{ end }}">
+    <li class="item-list-item" data-id="{{ with .File}}{{ .UniqueID }}{{ end }}">
       {{ partial "list_item.html" (dict "context" . "dateformat" "Jan 02") }}
     </li>
   {{ end }}