diff --git a/content/posts/concurrency-and-parallelism.md b/content/posts/concurrency-and-parallelism.md
index aa17e0c..916a4ee 100644
--- a/content/posts/concurrency-and-parallelism.md
+++ b/content/posts/concurrency-and-parallelism.md
@@ -89,7 +89,7 @@ Consider this code:
 ```

 In the example here, we can see that python keeps a reference count for the empty list object, and in this case, it was
-3. The list object was referenced by a, b and the argument passed to `sys.getrefcount`. If we didn't have locks,
+\3. The list object was referenced by a, b and the argument passed to `sys.getrefcount`. If we didn't have locks,
 threads could attempt to increase the reference count at once, this is a problem because what would actually happen
 would go something like this:

@@ -102,7 +102,7 @@ would go something like this:

 [Treat sections of 2 lines as things happening concurrently]

-You can see that because threads 1 and 2 both read the reference amount from memory at the same time, they read the
+You can see that because threads 1 and 2 both read the reference amount from memory at the same time, they read the
 same number, then they've increased it and stored it back without ever knowing that some other thread is also in the
 process of increasing the reference count but it read the same amount from memory as this process, so even though the
 first thread stored the updated amount, the 2nd thread also stored the updated amount, except they were the same
@@ -167,6 +167,7 @@ become incorrect in a way that's hard to see during code reviews.
 ## Debugging multi-threaded code

 As an example, this is a multi-threaded code that will pass all tests and yet it is full of bugs:
+
 ```python
 import threading
@@ -183,6 +184,7 @@ for _ in range(5):
     threading.Thread(target=foo).start()
 print("Finished")
 ```
+
 When you run this code, you will most likely get a result that you would expect, but it is possible that you could also
 get a complete mess, it's just not very likely because the code runs very quickly. This means you can write code
 multi-threaded code that will pass all tests and still fail in production, which is very dangerous.
@@ -192,6 +194,7 @@ behind every instruction to ensure that it is safe if a switch happens during th
 it is advised to run the code multiple times because there is a chance of getting the correct result even with this
 method since it always is one of the possibilities, this is why multi-threaded code can introduce a lot of problems.
 This would be the code with this "fuzzing" method applied:
+
 ```python
 import threading
 import time
@@ -219,6 +222,7 @@ for _ in range(5):
     threading.Thread(target=foo).start()
 print("Finished")
 ```
+
 You may also notice that I didn't just add `fuzz()` call to every line, I've also split the line that incremented
 counter into 2 lines, one that reads the counter and another one that actually increments it, this is because internally,
 that's what would be happening it would just be hidden away, so to add a delay between these instructions
@@ -226,6 +230,7 @@ I had to actually split the code like this. This makes it almost impossible to t
 problem.

 It is possible to fix this code with the use of locks, which would look like this:
+
 ```python
 import threading

@@ -257,13 +262,15 @@ for t in worker_threads:
 with printer_lock:
     print("Finished")
 ```
+
 As we can see, this code is a lot more complex than the previous one, it's not terrible, but you can probably
 imagine that with a bigger codebase, this wouldn't be fun to manage.
-Not to mention that there is a core issue with this code. Even though the code will work and doesn't actually have any
+Not to mention that there is a core issue with this code. Even though the code will work and doesn't actually have any
 bugs, it is still wrong. Why? When we use enough locks in our multi-threaded code, we may end up making it full
 sequential, which is what happened here. Our code is running synchronously, with huge amount of overhead from
 the locks that didn't need to be there and the actual code that would've been sufficient looks like this:
+
 ```python
 counter = 0
 print("Starting")
@@ -273,6 +280,7 @@ for _ in range(5)
     print("----------------------")
 print("Finished")
 ```
+
 While in this particular case, it may be pretty obvious that there was no need to use threading at all, there are a lot
 of cases in which it isn't as clear and I have seen some projects with code that could've been sequential but they were
 already using threading for something else and so they made use of locks and added some other functionality, which made
diff --git a/content/posts/escaping-isolated-network.md b/content/posts/escaping-isolated-network.md
index 3be3b9a..dda7617 100644
--- a/content/posts/escaping-isolated-network.md
+++ b/content/posts/escaping-isolated-network.md
@@ -58,18 +58,22 @@ youtube-dl, download the video and then stream it from our machine instead of fr
 download the file from our server that has now downloaded this video, however that's way too crude.

 There is a much nicer method that we can use, and it is still utilizing pure SSH:
+
 ```sh
 ssh -f -N -D 1080 user@server
 ```
+
 This command will start SSH in background (`-f`), it won't run any actual commands (`-N`) and it will be bound to the
 port 1080 on our machine (`-D`). This means that we can utilize this port as a SOCK and make our server act as SOCKS5 proxy.
 This kind of proxy will even be supported by most web browsers, allowing you to simply specify the address (in our case
 `127.0.0.1:1080`) and have all traffic go through this external server.

 To test that this connection really does work, we could use the `curl` command like this:
+
 ```sh
 curl --max-time 3 -x socks5h://127.0.0.1:1080 https://itsdrike.com
 ```
+
 If we see the HTML code as the output, it means that we've obtained the content of the specified website through our
 socks5 proxy, that we've established through simple SSH.

@@ -93,9 +97,11 @@ around SSH and it will simply utilize SSH in the background, which is also why w
 server side for this to work properly, as long as we simply have the SSH server running, `sshuttle` will work fine.

 We can use sshuttle with a command like this:
+
 ```sh
 sudo sshuttle -r user@machine 172.67.161.205/24 -vv
 ```
+
 Which will forward all traffic destined for the particular address block (the IP/number is called the CIDR notation,
 it essentially specifies which IPs should be affected depending on the number after /, you can read more about it on
 [wikipedia](https://wikiless.org/wiki/Classless_Inter-Domain_Routing?lang=en)). In this case, I've specified the IP of
@@ -112,12 +118,14 @@ you need to think about this ahead of time.

 You could also simply redirect the port 22 to something else using iptables instead of having to mess with the SSH
 config. You would do that with this command:
+
 ```sh
 sudo iptables -t nat -I PREROUTING -p tcp --dport 1234 -j REDIRECT --to-ports 22
 ```

 This command will make port `1234` act as the SSH port, and you could then access the server by specifying this port
 instead of the default port in the ssh command:
+
 ```
 ssh -f -N -D 1080 user@server -p 1234
 ```
@@ -213,9 +221,11 @@ Turns out that even with a security measure as strict as only allowing access to
 somewhat make our way to our server, by essentially telling it to map all exiting traffic from port 443 to port 22.
 To do this, we would use a command like this:
+
 ```sh
 ssh -o "ProxyCommand nc -X connect -x proxy_server:3128 our_server_IP 443" user@our_server_IP
 ```
+
 Here we're essentially sending a proxy command to the web proxy server (listening on port 3128) to through the port
 443 to our_server_IP and make requests to the SSH's default port (22) on our_server_IP. Making the actual proxy server
 access our server on port 22.
@@ -235,8 +245,10 @@ really be possible.

 To explain how easy it is to discover something like this, basically all that's needed is to run a single command
 on that web proxy:
+
 ```sh
 iptables -t nat -L
 ```
+
 And look for the output policy destinations. Even though many network admins won't do this, you shouldn't ever risk
 doing something silly like this, because if you will get discovered, you could get into some serious trouble
diff --git a/content/posts/git-credentials.md b/content/posts/git-credentials.md
index 465a9dc..7efd1de 100644
--- a/content/posts/git-credentials.md
+++ b/content/posts/git-credentials.md
@@ -3,30 +3,30 @@ title: Managing (multiple) git credentials
 date: 2022-07-27
 tags: [programming, git]
 sources:
- -
- -
- -
- -
- -
- -
- -
- -
- -
- -
- -
- -
- -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
 changelog:
- 2023-01-30:
- - Add note about disabling commit signing
- - Add alternative command for copying on wayland
- - Fix typos and text wrapping
+ 2023-01-30:
+ - Add note about disabling commit signing
+ - Add alternative command for copying on wayland
+ - Fix typos and text wrapping
 ---

 Many people often find initially setting up their git user a bit unclear, especially when it comes to managing multiple
 git users on a single machine. But even managing credentials for just a single user can be quite complicated without
 looking into it a bit deeper. Git provides a lot of different options for credential storage, and picking one can be
-hard without knowing the pros and cons of that option.
+hard without knowing the pros and cons of that option.

 Even if you already have your git set up, I'd still recommend at least looking at the possible options git has for
 credential storage, find the method you're using and make sure it's actually secure enough for your purposes. But
@@ -78,10 +78,9 @@ configured account, you can disable it with:
 ```bash
 git config --local commit.gpgsign false
 ```
+
 {{< /notice >}}

-
-
 ## Git credentials

 User configuration is one thing, but there's another important part of account configuration to consider, that is
@@ -98,7 +97,7 @@ first take a look at the most straight-forward method, which is to store them in
 # While clonning:
 git clone https://:@github.com/path/to/repo.git
 # After initialized repo without any added remote:
-git remote add origin
+git remote add origin
 # On an already clonned repository without the credentials:
 git remote set-url origin https://:@github.com/path/to/repo.git
 ```
@@ -170,7 +169,7 @@ worried about leaking your **username** (not password) for the git hosting provi
 If you're using the global configuration, this generally shouldn't be a big concern, since the username won't actually
 be in the project file unlike with the remote-urls. However if you share a machine with multiple people, you may want
 to consider securing your global configuration file (`~/.config/git/config`) using your filesystem's permission
-controls to prevent others from reading it.
+controls to prevent others from reading it.

 If you're defining contexts in local project's config though, you should be aware that the username will be present in
 `.git/config`, and sharing this project with others may leak it.
@@ -239,7 +238,6 @@ The cache credential helper will never write your credential data to disk, altho
 Unix sockets. These sockets are protected using file permissions that are limited to the user who stored them though,
 so even in multi-user machine, generally speaking, they are secure.
-
 #### Custom credential helpers

 Apart from these default options, you can also use [custom
@@ -344,7 +342,7 @@ recognized. To run this test, you can simply issue this command (should work on
 ssh -T git@github.com -i ~/.ssh/id_ed25519
 ```

-Running this command should produce a welcome message informing you that the connection works.
+Running this command should produce a welcome message informing you that the connection works.

 If you are unsuccessful, you can run the command in verbose mode in order to get more details on why your connection
 was not established.
@@ -426,10 +424,10 @@ to remember the username or the password, instead you just need to know the host

 Generally, using SSH keys is the safest approach, but it can also be a bit annoying since it requires you to specify
 the SSH host for each repository in it's remote url. For that reason, the approach that I would recommend is using
-git's credential helper system to store your credentials instead.
+git's credential helper system to store your credentials instead.

 However if you will go with this method, make sure that you're using a personal access token instead of the actual
-account's password, to limit the permissions an attacker would gain in case your credentials were leaked.
+account's password, to limit the permissions an attacker would gain in case your credentials were leaked.

 If your git hosting platform doesn't provide access tokens, this method becomes a lot more dangerous to use, since if
 an attacker would somehow obtain the credentials file from your system, they would be able to gain full access to your
@@ -500,11 +498,11 @@ git config credentials.helper 'store --file=/home/user/.config/git-credentials-w
 ```

 With this approach, you can have your credentials kept in multiple separate credential files, and just mention the path
-to the file you need for each project.
+to the file you need for each project.
 Security-wise, this method is better because your username will be kept outside of the project in the referenced git
 credential file, which should be secured by the file system's permissions to prevent reads from other users. However
-practicality-wise, it may be a bit more inconvenient to type and even to remember the path to each credential file.
+practicality-wise, it may be a bit more inconvenient to type and even to remember the path to each credential file.

 ### SSH keys instead

diff --git a/content/posts/interpreted-vs-compiled.md b/content/posts/interpreted-vs-compiled.md
index 4543ba2..ba31594 100644
--- a/content/posts/interpreted-vs-compiled.md
+++ b/content/posts/interpreted-vs-compiled.md
@@ -144,4 +144,3 @@ this byte-code with an interpreter. This is also known as Just In Time (JIT) com

 Most languages tend to only be one or the other, but there are a fair few that follow this hybrid implementation, most
 notably, these are: Java, C#, Python, VB.NET
-
diff --git a/content/posts/json-vs-databases.md b/content/posts/json-vs-databases.md
index 40f70a2..d0a9374 100644
--- a/content/posts/json-vs-databases.md
+++ b/content/posts/json-vs-databases.md
@@ -36,10 +36,10 @@ systems is, as the name would imply, manage the database. It controls how the da
 if there should be some internal compression of this database or things like that.

 Even though there are a lot of choices for DBMS, no matter which one we end up using, on the surface, they will be
-doing exactly the same thing. Storing tables of data. Each item in the database usually has some *primary key*, which
+doing exactly the same thing. Storing tables of data. Each item in the database usually has some _primary key_, which
 is a unique identifier for given column of data. We can also have composite primary keys, where there would be multiple
 slots which are unique when combined, but don't necessarily need to be unique on their own.
 We can also use what's
-called a *foreign key*, which is basically the primary key of something in another database table, separate from the
+called a _foreign key_, which is basically the primary key of something in another database table, separate from the
 current one, to avoid data repetition. This would be an example of the data tables that a database could hold:

 Table of students:
@@ -68,8 +68,8 @@ Student Grades:
 | ... | ... | ... | ... |
 {{< /table >}}

-Here we can see that the *Student Grades* table doesn't have a standalone unique primary key, like the Students table
-has, but rather it has a composite primary key, in this case, it's made of 3 columns: *Student*, *Subject* and *Year*.
+Here we can see that the _Student Grades_ table doesn't have a standalone unique primary key, like the Students table
+has, but rather it has a composite primary key, in this case, it's made of 3 columns: _Student_, _Subject_ and _Year_.

 We can also see that rather than defining the whole student all over again, since we already have the students table,
 we can instead simply use the Student ID from which we can look up the given student from this table
@@ -148,4 +148,3 @@ Another use-case for databases is when you need to host the data of the database
 database, we can simply expose some port and let the DBMS handle interactions with it when our program can simply be
 making requests to this remote server. This is usually how we handle using databases with servers, but many client-side
 programs are creating their own local databases and using those, simply because using files is ineffective.
-
diff --git a/content/posts/removing-list-duplicates.md b/content/posts/removing-list-duplicates.md
index 8dc348e..ecae567 100644
--- a/content/posts/removing-list-duplicates.md
+++ b/content/posts/removing-list-duplicates.md
@@ -134,8 +134,8 @@ return list(result.values())
 To preserve the original elements, we used a dict that held the unique hashable memory ids of our objects as keys and
 the actual unhashable objects as values. Once we were done, we just returned all of the values in it as a list.

-The result of this would be: `[x, y, 1, 2, "hi", Foo(x=5)]`. *(Note that `x`, `y` and `Foo(x=5)` would actually be
-printed in the same way, since they're the same class, sharing the same `__repr__`)*. From this output we can clearly
+The result of this would be: `[x, y, 1, 2, "hi", Foo(x=5)]`. _(Note that `x`, `y` and `Foo(x=5)` would actually be
+printed in the same way, since they're the same class, sharing the same `__repr__`)_. From this output we can clearly
 see that even though `x`, `y`, and `Foo(x=5)` are exactly the same thing, sharing the same attributes, they're
 different objects and therefore they have different memory ids, which means our algorithm didn't remove them, however
 there is now only one `x`, because the second one was indeed exactly the same object, so that did get removed.
@@ -217,4 +217,3 @@ one if we know which classes will be used there.
 Even though we do have ways to deal with unhashables, if you're in control of the classes, and they aren't supposed to
 be mutable, always make sure to add a `__hash__` method to them, so that duplicates can be easily removed in `O(n)`
 without any complicated inconveniences.
-
diff --git a/content/posts/typing-variance-of-generics.md b/content/posts/typing-variance-of-generics.md
index a179836..086e55d 100644
--- a/content/posts/typing-variance-of-generics.md
+++ b/content/posts/typing-variance-of-generics.md
@@ -165,7 +165,6 @@ Here's a list of some definable generic types that are currently present in pyth
 | Mapping[str, int] | Mapping from `str` keys to `int` values (immutable) |
 {{< /table >}}

-
 In python, we can even make up our own generics with the help of `typing.Generic`:

 ```python
@@ -247,7 +246,7 @@ x: Tuple[Vehicle, ...] = cars
 # some of the functionalities of cars, so a type checker would complain here
 x: Tuple[Car, ...] = vehicles

-# In here, both of these assignments are valid because both cars and vehicles will
+# In here, both of these assignments are valid because both cars and vehicles will
 # implement all of the logic that a basic `object` class needs. This means this
 # assignment is also valid for a generic that's covariant.
 x: Tuple[object, ...] = cars
@@ -288,7 +287,7 @@ x: Callable[[], Car] = get_wolkswagen_car

 # However this wouldn't really make sense the other way around.
 # We can't assign a function which returns any kind of Car to a variable with is expected to
-# hold a function that's supposed to return a specific type of a car. This is because not
+# hold a function that's supposed to return a specific type of a car. This is because not
 # every car is a WolkswagenCar, we may get an AudiCar from this function, and that may not
 # support everything WolkswagenCar does.
 x: Callable[[], WolkswagenCar] = get_car
@@ -371,10 +370,10 @@ def remove_while_used(func: Callable[[Library, Book], None]) -> Callable[[Librar
     return wrapper


-# As we can see here, we can use the `remove_while_used` decorator with the
+# As we can see here, we can use the `remove_while_used` decorator with the
 # `read_fantasy_book` function below, since this decorator expects a function
 # of type: Callable[[Library, Book], None] to which we're assigning
-# our function `read_fantasy_book`, which has a type of
+# our function `read_fantasy_book`, which has a type of
 # Callable[[Library, FantasyBook], None].
 #
 # Obviously, there's no problem with Library, it's the same type, but as for
@@ -384,7 +383,7 @@ def remove_while_used(func: Callable[[Library, Book], None]) -> Callable[[Librar
 # the necessary criteria for a general Book, it just includes some more special
 # things, but the decorator function won't use those anyway.
 #
-# Since this assignment is be possible, it means that Callable[[Library, Book], None]
+# Since this assignment is be possible, it means that Callable[[Library, Book], None]
 # is a subtype of Callable[[Library, FantasyBook], None], not the other way around.
 # Even though Book isn't a subtype of FantasyBook, but rather it's supertype.
 @remove_while_used
@@ -468,9 +467,9 @@ people: List[Person] = children

 # Since we know that `people` is a list of `Person` type elements, we can obviously
 # pass it over to `append_adult` function, which takes a list of `Person` type elements.
-# After we called this fucntion, our list got altered. it now includes an adult, which
+# After we called this fucntion, our list got altered. it now includes an adult, which
 # is fine since this is a list of people, and `Adult` type is a subtype of `Person`.
-# But what also happened is that the list in `children` variable got altered!
+# But what also happened is that the list in `children` variable got altered!
 append_adult(people)

 # This will work fine, all people can eat, that includes adults and children
@@ -590,7 +589,7 @@ c: Matrix[Z] = x # INVALID! Matirx isn't contravariant

 In this case, our Matrix generic type is covariant in the element type, meaning that if we have a `Matrix[Y]` type
 and `Matrix[X]` type, we could assign the `University[Y]` to the `University[X]` type, hence making it it's
-subtype.
+subtype.

 We can make this Matrix covariant because it is immutable (enforced by slots and custom setattr logic). This allows
 this matrix class (just like any other sequence class), to be covariant. Since it can't be altered, this covariance is
@@ -646,7 +645,7 @@ time, I wasn't able to think of anything better.
   covariant, since otherwise, you'd need to recast your variable manually when defining another type, or copy your
   whole generic, which would be very wasteful, just to satisfy type-checkers. Less commonly, you can also find it
   helpful to mark your generics as contravariant, though this will usually not come up, maybe if you're using
-  protocols, but with full standalone generics, it's quite rarely used. Nevertheless, it's important to
+  protocols, but with full standalone generics, it's quite rarely used. Nevertheless, it's important to
 - Once you've made a typevar covariant or contravariant, you won't be able to use it anywhere else outside of some
   generic, since it doesn't make sense to use such a typevar as a standalone thing, just use the `bound` feature of a
   type variable instead, that will define it's upper bound types and any subtypes of those will be usable.