Title: Interpreted vs Compiled Languages
Date: 2021-09-09
Tags: programming

You've likely seen or heard someone describe a language as interpreted or compiled, but what's the difference? When should you pick one over the other for your projects, and does it even matter? I'll do my best to explain how they differ and the advantages of each.

History

First, I want to explain how languages evolved and what steps it took to even get to something like a compiled language. How did we write the first programs when we didn't have a compiler, not to mention an interpreter?

First programs

At the beginning of the computing era, we had to use what's called machine code, which was fed directly into the CPU. It contained the individual, precise instructions telling the CPU what it should be doing. Each instruction represented some operation: for example, there is an instruction for adding numbers together, another for subtracting them, and so on.

While this is amazing for the CPU, humans can't intuitively write a program in a language like this, which is why programming was very difficult in the early days. Programmers first had to work out the whole process of what the program would be doing, and only then could they start to think about how to implement it purely with these instructions. After that was done, they had to look up the binary representation of each instruction they had used and manually convert their write-up into a sequence of binary data. That data was then fed into the computer, and the CPU was able to execute the individual instructions.

A more modern approach

Obviously, we wanted to make things easier for ourselves by automating and abstracting as much of the programming process as we could. The programmer should be able to focus on the algorithm itself, while the actual conversion of the code into machine code that the CPU can deal with simply happens in the background.

But how could we achieve something like this? The simple answer is to write something that can take our code and convert it into a set of instructions that the CPU will run. This intermediate piece of software can be either a compiler or an interpreter.

After a piece of software like this was first written, suddenly everything became much easier. Initially, we were using assembly languages, which were very similar to machine code but looked a bit more readable. The programmer still had to think in terms of which instructions the CPU should get in order to do the required thing, but the actual process of converting this code into machine code was done automatically. The programmer just took the text of the program in assembly language and fed it into the computer, which returned the complete machine code.
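To make the assembly-to-machine-code step a bit more concrete, here is a minimal sketch of what such a translator might do. The instruction names and their numeric codes are invented for this example and do not correspond to any real CPU; real assemblers also handle labels, registers, and much more.

```python
# Hypothetical 8-bit instruction set, made up purely for illustration.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "SUB": 0x03, "HALT": 0xFF}


def assemble(source: str) -> bytes:
    """Translate toy assembly text into a sequence of bytes."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        # Look up the numeric code for the readable instruction name.
        machine_code.append(OPCODES[mnemonic.upper()])
        # Operands are stored as single bytes right after the opcode.
        machine_code.extend(int(operand) & 0xFF for operand in operands)
    return bytes(machine_code)


program = """
LOAD 10   ; put 10 into the (imaginary) accumulator
ADD 5     ; add 5 to it
HALT
"""
print(assemble(program).hex(" "))
```

The lookup table is essentially the part that early programmers had to do by hand: finding the binary representation of each instruction.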

Later on, we went further and further and eventually got to a very famous language called C. This language was incredibly versatile and allowed programmers to finally start thinking in a more natural way about how the program should be written, while the tedious work of converting this textual C implementation into something executable was left to the compiler.

Recap

So we now know that initially, we had neither a compiler nor an interpreter, and we simply wrote things in machine code. But this was very tedious and time-consuming, so we wrote a piece of software (in machine code) that was able to take our more high-level (English-like) text and convert it into executable machine code.

Compiled languages

Every language that's meant to be compiled comes with a piece of software called the "compiler". This piece of software carries the internal logic of how to convert code that follows the language's syntax rules into actual machine code, which it then gives back to us.

This means that once we write our code in a compiled language, we can use the compiler to get an executable version of our program which we can then distribute to others.

However, this executable version will be specific to a certain CPU architecture, and if someone with a different architecture were to obtain it, they wouldn't be able to run it. (At least not without emulation, which is the process of simulating a different CPU architecture and carrying out each instruction the simulated CPU receives with equivalent instructions on the host architecture. Even though this is a possibility, emulation causes significant slowdowns, and because we only got the executable machine code rather than the actual source code, we can't recompile the program ourselves so that it would run natively on our architecture.)

Some of the most famous compiled languages are: C, C++, Rust

Interpreted Languages

Similarly to how code in a compiled language needs a compiler, interpreted languages need an interpreter. But there is a major difference between these two approaches.

With interpreted languages, rather than feeding the whole source code into a compiler and getting machine code out of it that we can run directly, we instead feed the code into an interpreter. This is a piece of software that's already compiled (or is itself interpreted, with another interpreter handling it). The interpreter scans the code, goes through it line by line, and interprets each function/instruction, which is then run from within the interpreter. We can think of it as a huge switch statement with all of the possible instructions of the interpreted language defined in it. Once we hit some instruction, the code inside the matching branch of that switch statement is executed.
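The "huge switch statement" idea can be sketched in a few lines. The tiny language below (with its set, add and print instructions) is made up for this example, and the if/elif chain stands in for the switch statement; real interpreters are of course far more involved.

```python
def interpret(source: str) -> list[str]:
    """Run a toy line-based language and return everything it printed."""
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        command, args = parts[0], parts[1:]
        # This chain plays the role of the big switch statement:
        # one branch per instruction the language knows about.
        if command == "set":      # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif command == "add":    # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif command == "print":  # print NAME
            output.append(str(variables[args[0]]))
        else:
            raise SyntaxError(f"line {line_number}: unknown instruction {command!r}")
    return output


print(interpret("set x 10\nadd x 5\nprint x"))
```

Notice that nothing is ever converted into a separate executable here. The source text itself is the program, and it only does something while the interpreter is running it.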

This means that with an interpreted language, we don't end up with an executable file that can be distributed on its own, but rather we simply ship the code itself. However, this brings its own problems, such as the fact that each machine that wants to run such code has to have the interpreter installed in order to "interpret" the instructions written in that language. We also sometimes call these "scripting languages".

Some of the most famous interpreted languages are: PHP, JavaScript

Core differences

As mentioned, with a compiled language, the source code can stay private, since we can distribute only the compiled version. With an interpreted language this isn't possible, because the source code is what's actually being run by the interpreter, instruction after instruction. The best we can do if we really want to hide the source code of an interpreted program is to obfuscate it.

Compiled languages also have a speed benefit, because they don't need an additional program to interpret the instructions of the language, whereas an interpreter has to go through the additional step of identifying each instruction and running the code for it while the program is executing. Compilers often also perform certain optimizations. For example, with code that always produces the same result, such as a = 10 * 120, the compiler can evaluate the expression at compile time and store only the result, 1200, making the running time shorter.
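The a = 10 * 120 optimization is usually called constant folding. As a rough sketch of the idea (not how any particular compiler implements it), the snippet below uses Python's standard ast module to rewrite source code so that arithmetic on literal numbers is computed ahead of time. The names ConstantFolder and fold_constants are my own, it only handles +, - and *, and ast.unparse needs Python 3.9 or newer.

```python
import ast
import operator

# Operators this sketch knows how to evaluate ahead of time.
_FOLDABLE_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two literal numbers with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested expressions first
        fold = _FOLDABLE_OPS.get(type(node.op))
        left, right = node.left, node.right
        if (
            fold is not None
            and isinstance(left, ast.Constant)
            and isinstance(right, ast.Constant)
            and isinstance(left.value, (int, float))
            and isinstance(right.value, (int, float))
        ):
            result = ast.Constant(value=fold(left.value, right.value))
            return ast.copy_location(result, node)
        return node  # anything involving a variable is left alone


def fold_constants(source: str) -> str:
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(fold_constants("a = 10 * 120"))     # a = 1200
print(fold_constants("b = x * (2 + 3)"))  # b = x * 5
```

The second call shows the limit of this optimization: x is only known at run time, so only the (2 + 3) part can be computed up-front.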

So far, it would look like compiled languages are a lot better than interpreted ones, but they have many disadvantages as well. One of them I've already mentioned: not being cross-platform. Once a program is compiled, it will only ever run on the platform it was compiled for. If the compilation went directly to machine code, this would mean it would be architecture-specific.

In practice, it is usually even more specific than that: we compile for a particular kernel as well, because our programs run under some operating system that uses that kernel. In a loose sense, the kernel acts like a big interpreter for every single program: the program's machine code still runs on the CPU, but requests such as accessing files or allocating memory go through the kernel. We do this because it means we can implement security measures that, for example, prevent untrusted programs from reading or writing memory that doesn't belong to them.

This alone is a pretty big disadvantage, because we need to compile our program for every operating system it is expected to run on. In the case of an interpreted language, all we need is for the interpreter itself to be executable on all platforms; the individual programs written in that language can then run on any platform, as long as it has the interpreter.

From this example, we can also see another reason why we may want to use an interpreter-like layer, in this case the kernel itself: with it, we can implement these security measures and restrict what a program is allowed to do, which is crucial to having a secure operating system.

Another advantage of interpreted languages is the simple fact that they don't need to be compiled. It's one less step in the process of distributing the application, and it also tends to make writing automated tests, and debugging in general, easier.

Hybrid languages

You may have noticed that I haven't included my favorite language, Python, in the list of the most famous interpreted languages. I had a good reason not to: contrary to popular belief, Python actually isn't an interpreted language, or at least not entirely.

There is another way of doing this that's somewhere in the middle. Instead of the compiled model, where all the work is done up-front but which is a bit inflexible, or the interpreted model, where all the work is done on the receiving end but which is a bit slower, we combine things and do both.

Up-front, we compile the code partially, to what's known as byte-code, or an intermediate language. This takes things as close to being compiled as we can while still being portable across many platforms. We then distribute this byte-code rather than the full source code, and each person who runs it does the last step by running the byte-code with an interpreter (often called a virtual machine). Some implementations, such as the usual Java and C# runtimes, go one step further and translate the byte-code into machine code while the program is running, which is known as Just In Time (JIT) compilation.
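To make byte-code a bit less abstract, we can peek at what CPython (the standard Python implementation) produces for a small function, using the standard dis module. This is only a sketch for looking around: the add function and the opcode_names helper are mine, and the exact instruction names differ between Python versions, so I'm deliberately not showing a fixed output.

```python
import dis


def add(a, b):
    return a + b


def opcode_names(func):
    """List the names of the byte-code instructions a function was compiled to."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


# The raw byte-code is just a sequence of bytes stored on the function's code object.
print(add.__code__.co_code)

# The same byte-code, shown as readable instruction names.
print(opcode_names(add))
```

These instructions are what the Python virtual machine actually executes. The source text has already been translated into them by the time the function runs.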

Most languages tend to be only one or the other, but there are a fair few that follow this hybrid approach, most notably: Java, C#, Python, VB.NET