---
title: Concurrency and Parallelism
date: 2021-11-17
tags: [programming]
---

Concurrency is an exciting topic that's becoming more and more important, yet I see many people who aren't very
familiar with the topic and its possibilities. I'll try to explain the differences between threading, multiprocessing
and asynchronous execution. I'll also show some examples of when concurrency should be avoided, and when it makes
sense.

I'll be talking about concurrency with the Python language in mind, but even if you don't use Python, I still think
you can learn a lot from this article if you aren't that familiar with the key concurrency concepts. My hope is that
after reading this article, you will confidently know the differences between the various concurrency methods and
their individual advantages and disadvantages when compared to each other.

## Why concurrency?

In programming, we often need to do things quickly so that our program isn't slow, yet we also often need to perform
complex operations which take a while to compute. To cope with this, we can sometimes perform certain tasks at the
same time.

As an example, we can think of concurrency as the number of lanes on a highway. If a highway has just a single lane,
all cars have to use that lane, and they can only travel as quickly as the slowest car in front of them. Once we bring
in another lane, we immediately see huge improvements, because the cars can go at their own speeds on separate lanes
and we can physically fit in more cars.

Similarly, when we use concurrency, we allocate multiple physical CPUs/cores to a process, essentially giving it more
clock cycles. However, not every task is suited for concurrent execution. Consider this example:

```py
x = my_function()
y = my_other_function(x)
```

We can clearly see that `my_other_function` is completely dependent on the result of `my_function`. This means that it
wouldn't make any sense to run these concurrently on 2 cores, because `my_other_function` would just wait for
`my_function` to finish before it could even start running. We would have used 2 cores and done something slower than
with one core, because it takes some time to send the result of `my_function` over to `my_other_function` running in a
separate process.

This shows us that not all tasks are suited to concurrent execution, but some really could benefit from it. For
example, if we wanted to read the contents of 200 different files, reading them one-by-one would take a lot of time,
but if we were able to read all 200 concurrently, it would only take us the duration of reading 1 file, yet we would
get the contents of all 200 files. (Of course, we're assuming that our disk could handle 200 reads at once.)

Even though this sounds good, it's never as simple as it first seems. On most machines, we won't actually be able to
run 200 things at once, because we don't have 200 CPUs/cores. Concurrency like this will always be limited by the
hardware of the computer your software is running on. If you have a computer with 8 logical CPUs/cores, you can only
run 8 things at once. Even though that obviously won't be as good as running all 200 tasks at once, it will still be
way better than running a single task at a time. In our example, we would be able to get the results of all 200 tasks
in the amount of time it would take to run 25 tasks sequentially, which is still a huge improvement.

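To make the example above concrete, here's a minimal sketch of such a batched concurrent read using a thread pool
with 8 workers (the file names are made up for illustration):

```py
# A thread pool with 8 workers processes the 200 reads in batches of up to 8
# at a time, taking roughly the time of 25 sequential reads instead of 200.
from concurrent.futures import ThreadPoolExecutor

file_names = [f"file_{n}.txt" for n in range(200)]  # hypothetical paths

def read_file(name):
    with open(name) as f:
        return f.read()

with ThreadPoolExecutor(max_workers=8) as pool:
    contents = list(pool.map(read_file, file_names))
```
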
## Threads vs Processes

Understanding the concept of concurrency is one thing, but how do we actually run our code on multiple cores/CPUs?
Luckily for us, this is all handled by the operating system we're running. The OS kernel has to manage thousands of
processes, along with their threads, all of which constantly fight to get as many CPU clock cycles as they can. It is
up to the OS to determine which processes are more important than others and to interchange these processes so that
each one gets enough CPU time. Having multiple cores helps a lot here, because rather than constantly swapping
processes around on a single core, we can run n processes at once and the OS has less overall swapping to do. But what
are these processes and the threads attached to them?

The concept of a process is probably not a hard one to understand: it's just a separate program that we started. A
thread is a bit more interesting. Threads are essentially a way for a single process to do 2 things concurrently while
keeping the shared state of that single process. This means we don't have any communication overhead and it's very
easy to pass information along. However, this property can often be a disadvantage: since threads work on a single
shared state, we often need to use locks to communicate properly without causing issues. (I'll explain the importance
of locks with some examples later, but essentially, we need locks to prevent data loss when 2 threads make a change to
the same place in memory at once.)

As you were probably able to figure out, the advantage of processes is that they don't have these shared states;
processes are fully independent. However, this is also a disadvantage, because it makes communication between
processes hard. Since there is no shared state, if processes want to talk to each other, they need to take the objects
from memory, serialize them and move them across a raw socket to the other process, where they get deserialized. (In
Python, this will most likely be done with the `pickle` library.) This means processes have a huge communication cost
compared to threads.

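As a small sketch of that communication cost, here's what sending an object to another process looks like with
`multiprocessing` (the object is pickled in the parent, moved across a pipe, and unpickled in the child):

```py
import multiprocessing

def worker(queue):
    data = queue.get()  # unpickles the object sent by the parent process
    print(f"Child received: {data}")

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(queue,))
    process.start()
    queue.put({"numbers": [1, 2, 3]})  # pickled and sent behind the scenes
    process.join()
```
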
## Why do we need locks?

Consider this code:

```py
>>> import sys
>>> a = []
>>> b = a
>>> sys.getrefcount(a)
3
```

In this example, we can see that Python keeps a reference count for the empty list object; in this case, it was 3. The
list object was referenced by `a`, `b` and the argument passed to `sys.getrefcount`. If we didn't have locks, two
threads could attempt to increase the reference count at once. This is a problem, because what would actually happen
would go something like this:

> Thread 1: Read the current amount of references from memory (for example 5)
> Thread 2: Read the current amount of references from memory (same as above - 5)
> Thread 1: Increase this amount by 1 (we're now at 6)
> Thread 2: Increase this amount by 1 (we're also at 6 in the 2nd thread)
> Thread 1: Store the increased amount back to memory (we store 6 to memory)
> Thread 2: Store the increased amount back to memory (we store the same 6 to memory, overwriting the other thread's work)

[Treat each pair of lines as things happening concurrently]

You can see that because threads 1 and 2 both read the reference count from memory at the same time, they read the
same number. Each then increased it and stored it back, without ever knowing that the other thread was also in the
middle of increasing the reference count from that same starting value. Both threads stored an updated amount, but it
was the same amount: two increments happened, yet the count in memory only went up by one.

Suddenly we have no reliable way of knowing how many references there actually are to our list, which means the list
may get removed by the automatic garbage collection once the counter hits 0, even though we still have an active
reference to it. There is a way to circumvent this though, and that is with the use of locks.

Dummy internal code:

```py
lock.acquire()
references = sys.getrefcount(obj)  # read the current reference count
references += 1
update_references(references)  # store the increased count back (dummy helper)
lock.release()
```

					
 | 
				
			||||||
 | 
					Here, before we even started to read the amount of references, we've acquired a lock, preventing other threads from
 | 
				
			||||||
 | 
					continuing and causing them to wait until a lock is released so that another thread can acquire it. With this code,
 | 
				
			||||||
 | 
					it wold go something like this:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
> Thread 1: Try to acquire a shared lock between threads (lock is free, Thread 1 now has the lock)
> Thread 2: Try to acquire a shared lock between threads (lock is already acquired by Thread 1, we're waiting)
> Thread 1: Read the current amount of references from memory (for example 5)
> Thread 2: Try to acquire the lock (still waiting)
> Thread 1: Increase this amount by 1 (we're now at 6)
> Thread 2: Try to acquire the lock (still waiting)
> Thread 1: Store the increased amount back to memory (we now have 6 in memory)
> Thread 2: Try to acquire the lock (still waiting)
> Thread 1: Release the lock
> Thread 2: Try to acquire the lock (success, Thread 2 now has the lock)
> Thread 1: Finished (died)
> Thread 2: Read the current amount of references from memory (read value 6 from memory)
> Thread 2: Increase this amount by 1 (we're now at 7)
> Thread 2: Store the increased amount back to memory (we now have 7 in memory)
> Thread 2: Release the lock
> Thread 2: Finished (died)

We can immediately see that this is a lot more complex than the lock-free code, but it did fix our problem: we managed
to correctly increase the reference count across multiple threads. The question is, at what cost?

It takes a while to acquire or release a lock, and these additional instructions slow down our code considerably. Not
to mention that thread 2 was completely blocked while thread 1 held the lock, spending its time sleeping and waiting
for the 1st thread to finish and release the lock. This is why threading can be quite complicated to deal with and why
some tasks should really stay single-threaded.

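If you want to get a feel for this overhead yourself, here's a small sketch that times a million uncontended
acquire/release pairs (exact numbers will of course depend on your machine):

```py
import threading
import timeit

lock = threading.Lock()

def acquire_release():
    lock.acquire()
    lock.release()

# Time 1,000,000 acquire/release pairs with no contention at all;
# under contention the cost only goes up.
print(timeit.timeit(acquire_release, number=1_000_000))
```
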
In this small example, it may be easy to understand what's going on, but once you add enough locks, it becomes
increasingly difficult to know whether there will be any "dead-locks" (which can happen when a thread acquires a lock
but never releases it, often the case if we forcefully kill a thread), to test your code, etc. Managing locks can
become a nightmare in a complex enough code-base.

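As a minimal sketch of how easily this goes wrong, here are two threads that acquire the same two locks in opposite
order. If each thread grabs its first lock before the other releases, both wait forever:

```py
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:
        with lock_b:  # blocks forever if worker_2 already holds lock_b
            print("worker 1 got both locks")

def worker_2():
    with lock_b:
        with lock_a:  # blocks forever if worker_1 already holds lock_a
            print("worker 2 got both locks")

threading.Thread(target=worker_1).start()
threading.Thread(target=worker_2).start()
```
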
Another problem with locks is that they don't actually lock anything. A lock is essentially just a signal that can be
checked for, and if it's active, a thread can choose to wait until that signal is gone (the lock is released). But
this only happens if we actually check for it and decide to respect it. Threads are supposed to respect locks, but
there's absolutely nothing preventing them from running anyway: if a thread forgets to acquire a lock, it can do
something it shouldn't have been able to do. This means that even if we have a large code-base with all of its locks
written correctly, it may not stay correct over time. Small adjustments to the code can make it incorrect in a way
that's hard to see during code reviews.

## Debugging multi-threaded code

As an example, here is some multi-threaded code that will pass all tests and yet is full of bugs:
```py
import threading

counter = 0

def foo():
    global counter
    counter += 1
    print(f"The count is {counter}")
    print("----------------------")

print("Starting")
for _ in range(5):
    threading.Thread(target=foo).start()
print("Finished")
```

When you run this code, you will most likely get the result you would expect, but it is also possible to get a
complete mess; it's just not very likely, because the code runs very quickly. This means you can write multi-threaded
code that will pass all tests and still fail in production, which is very dangerous.

To actually debug this code, we can use a technique called "fuzzing". With it, we essentially add a random sleep delay
after every instruction, to check whether the code stays safe if a switch happens during that time. Even with this
technique, it is advised to run the code multiple times, because there is a chance of getting the correct result
anyway, since it always remains one of the possibilities. This is why multi-threaded code can introduce so many
problems. This is the code with the "fuzzing" method applied:
```py
import threading
import time
import random

def fuzz():
    time.sleep(random.random())

counter = 0

def foo():
    global counter

    fuzz()
    old_counter = counter
    fuzz()
    counter = old_counter + 1
    fuzz()
    print(f"The count is {counter}")
    fuzz()
    print("----------------------")

print("Starting")
for _ in range(5):
    threading.Thread(target=foo).start()
print("Finished")
```

You may also notice that I didn't just add a `fuzz()` call after every line; I also split the line that incremented
the counter into 2 lines, one that reads the counter and another that actually increments it. This is because
internally that's what happens anyway, it's just hidden away, so to add a delay between those two instructions I had
to split the code like this. This makes it almost impossible to test multi-threaded code properly, which is a big
problem.

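You don't have to take my word for the hidden read and store steps; we can ask Python to show us the bytecode (a
sketch, the exact instruction names vary between Python versions):

```py
import dis

counter = 0

def foo():
    global counter
    counter += 1

dis.dis(foo)
# Prints something like:
#   LOAD_GLOBAL   counter   <- read the current value
#   LOAD_CONST    1
#   INPLACE_ADD             <- increment it
#   STORE_GLOBAL  counter   <- store it back
# A thread switch can happen between any two of these instructions.
```
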
It is possible to fix this code with the use of locks, which would look like this:
```py
import threading

counter_lock = threading.Lock()
printer_lock = threading.Lock()

counter = 0

def foo():
    global counter
    with counter_lock:
        counter += 1
        with printer_lock:
            print(f"The count is {counter}")
            print("----------------------")

with printer_lock:
    print("Starting")

worker_threads = []
for _ in range(5):
    t = threading.Thread(target=foo)
    worker_threads.append(t)
    t.start()

for t in worker_threads:
    t.join()

with printer_lock:
    print("Finished")
```

As we can see, this code is a lot more complex than the previous one. It's not terrible, but you can probably imagine
that with a bigger codebase, this wouldn't be fun to manage.

Not to mention that there is a core issue with this code. Even though it works and doesn't actually have any bugs, it
is still wrong. Why? When we use enough locks in our multi-threaded code, we may end up making it fully sequential,
which is exactly what happened here. Our code is running synchronously, with a huge amount of overhead from locks that
didn't need to be there, when the actual code that would've been sufficient looks like this:
```py
counter = 0
print("Starting")
for _ in range(5):
    counter += 1
    print(f"The count is {counter}")
    print("----------------------")
print("Finished")
```

While in this particular case it may be pretty obvious that there was no need to use threading at all, there are a lot
of cases where it isn't as clear. I have seen projects with code that could've been sequential, but because they were
already using threading for something else, they made use of locks when adding some other functionality, which made
the whole thing completely sequential without them even realizing.

## Global Interpreter Lock in Python

As I said, this article is mainly based around the Python language; if you aren't interested in Python, this part
likely won't be very relevant to you. However, it is still pretty interesting to know how the GIL works and why it
isn't as huge an issue as many claim it is. I also explain a bit about how threads are managed by the OS here, which
may be interesting for you too.

Concurrency in Python is a bit complicated because of something called the "Global Interpreter Lock" (GIL). Or at
least, that's what many people think; I actually quite like the GIL. This is what it does and why it isn't as bad as
many people think:

The GIL solves the problem of needing countless locks all across the standard library. Those locks would force threads
to wait for whichever other thread currently holds the lock, which is inevitable in some places, as explained in the
section above. Removing the global lock and introducing this many smaller locks isn't even that complicated, just
time-consuming. The real problem is that acquiring and releasing locks is expensive, so not only would removing the
GIL introduce a lot of additional complexity from dealing with locks all over the standard library, it would also make
Python a lot slower.

What's actually bad about the GIL is the fact that it completely prevents 2 threads from running in parallel with each
other: 1 thread running on 1 core and a 2nd thread running alongside it on another core. But this isn't as big of an
issue as it may sound. Even though we can't run multiple threads at once, i.e. there's no actual parallelism involved,
it doesn't prevent concurrency. Instead, the threads are constantly being switched around: first we're in thread 1,
then thread 2, then back to thread 1, and so on. The lock is constantly moving from one thread to another.

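A quick sketch that demonstrates this: CPU-bound work gains nothing from threads, because the GIL only lets one thread
execute Python bytecode at a time.

```py
import threading
import time

def cpu_work():
    sum(range(10_000_000))

# Sequential run.
start = time.perf_counter()
cpu_work()
cpu_work()
print(f"sequential: {time.perf_counter() - start:.2f}s")

# Threaded run - takes about the same time (or slightly longer).
start = time.perf_counter()
threads = [threading.Thread(target=cpu_work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded:   {time.perf_counter() - start:.2f}s")
```
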
But this interchanging of threads happens even in languages without any interpreter-wide lock. Every machine has a
limited amount of cores/CPUs at its disposal, and it is up to the OS itself to manage when a thread is scheduled to
run. The OS needs to determine the importance of each process and its threads and decide which should run and when.
Sometimes the OS will schedule 2 threads of the same process to run at once, which wouldn't be possible in Python due
to the GIL, but whenever other processes occupy the cores, every other thread on the system is paused and waiting for
the OS to start it again. This switching between threads can happen at any arbitrary instruction, and we don't have
control over it anyway.

So when would it make sense to even use threads if they can't run in parallel? Even though we don't have control over
when the OS switches happen, we do have control over when the GIL is passed, and the OS is clever enough not to
schedule a thread that is currently waiting to acquire a lock; it will schedule an active thread that is actually
doing something. The advantage of threads is that they can cleverly take turns to speed up the overall process. Say
you have a `time.sleep(10)` operation in one thread: we can pass the GIL over to another thread that isn't currently
waiting, periodically check whether the first thread is done, and once it is, switch between them in arbitrary order
again, until it once more makes more sense to run one thread over another, such as when a thread is sleeping.

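Here's a sketch of where this pays off: `time.sleep` releases the GIL, so two sleeping threads overlap and the total
run takes about 1 second instead of 2.

```py
import threading
import time

def io_work():
    time.sleep(1)  # stands in for a blocking I/O call

start = time.perf_counter()
threads = [threading.Thread(target=io_work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two sleeps in threads: {time.perf_counter() - start:.2f}s")
```
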
## Threads vs asynchronous execution

As I explained in the last paragraph of the GIL section, threads are always interchanged for us; we don't need any
code that explicitly causes the switching, which is an advantage of threading. This interchanging allows for some
speed-ups, and we don't need to worry about the switching ourselves at all!

But the cost of this convenience is that you have to assume a switch can happen at any time. This means we can hop
over to another thread right after the first one finished reading data from memory but hasn't yet stored it back,
which is exactly why we need locks. Threads switch preemptively: the system decides for us.

The limit on threads is the total CPU power we have, minus the cost of task switches and synchronization overhead
(locks).

With asynchronous processing, we switch cooperatively, i.e. we use explicit code (the `await` keyword in Python) to
cause a task switch manually. This means that locks and other synchronization are mostly no longer necessary. (In
practice we do still have locks even in async code, but they're much less common, and many people don't even know
about them because they simply aren't necessary in most cases.)

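A minimal sketch of cooperative switching: every `await` is an explicit point where the event loop may switch to
another task.

```py
import asyncio

async def worker(name, delay):
    print(f"{name}: started")
    await asyncio.sleep(delay)  # explicitly hand control back to the event loop
    print(f"{name}: finished")

async def main():
    # Both workers run concurrently; the total takes ~2s, not 3s.
    await asyncio.gather(worker("first", 1), worker("second", 2))

asyncio.run(main())
```
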
With Python's asyncio, the cost of a task switch is incredibly low, because internally it uses generators
(awaitables), and it is much quicker to resume a generator, which stores all of its state, than to call a pure Python
function, which has to build up a whole new stack frame on every call; a generator already has a stack frame and
simply picks up where it left off. This makes asyncio task switching by far the cheapest way to handle task switching
in Python. In comparison, you can realistically run hundreds of threads, but tens of thousands of async tasks.

This makes async easier to get right than threads, and much faster and lighter-weight in comparison. But nothing is
perfect, and async has its downsides too. One downside is that we have to perform the switches cooperatively, so we
need to add the `await` keyword to our code, but that's not very hard. The much more relevant downside is that
everything we do now has to be non-blocking. We can no longer simply read from a file; we need to launch a task to
read from the file, let it start reading, and when the data is available, go back and pick it up. This means we can't
even use the regular `time.sleep` anymore; instead, we need its async alternative, `await asyncio.sleep`.

This means that we need a huge ecosystem of support tools that adds asynchronous alternatives to every blocking
synchronous operation, which increases the learning curve.

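As a sketch of what that ecosystem looks like in practice, here's a non-blocking file read using the third-party
`aiofiles` package (one of many async replacements for blocking built-ins; the file names are hypothetical):

```py
import asyncio

import aiofiles  # pip install aiofiles

async def read_file(name):
    async with aiofiles.open(name) as f:
        return await f.read()

async def main():
    # All three reads are in flight concurrently.
    contents = await asyncio.gather(*(read_file(f"file_{n}.txt") for n in range(3)))
    print([len(text) for text in contents])

asyncio.run(main())
```
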
### Comparison

- Async maximizes CPU utilization because it has less overhead than threads
- Threading typically works with existing code and tools, as long as locks are added around critical sections
- For complex systems, async is much easier to get right than threads with locks
- Threads require very little tooling (locks and queues)
- Async needs a lot of tooling (futures, event loops, non-blocking versions of everything)

## Conclusion

- If you need to run something in parallel, you will need multiprocessing, because the GIL prevents threads from
  running in parallel (see the sketch below)
- If you need to run something concurrently, but not necessarily in parallel, you can use either threads or async
- Threads make more sense if you already have a huge code-base, because they don't require rewriting everything to
  non-blocking versions; you will just need to add some locks and queues
- Async makes more sense if you know you will need concurrency from the start, since it helps keep everything a lot
  more manageable, and it's quicker than threads

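For completeness, here's a minimal sketch of actual parallelism with `multiprocessing`: each worker is a separate
interpreter with its own GIL, so CPU-bound work really does run on multiple cores at once.

```py
import multiprocessing

def cpu_work(n):
    return sum(range(n))

if __name__ == "__main__":
    # 4 separate processes, each crunching numbers on its own core.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(cpu_work, [10_000_000] * 4)
    print(results)
```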