---
title: JSON vs Databases
date: 2021-09-20
tags: [programming]
---

I've seen tons of projects pick the wrong way of storing data for their use-case. In most cases the problem was using
JSON where a database was needed, but I've also seen people use databases where JSON should have been used, or where a
completely different format would have fit better, such as simple plain text or something closer to JSON, like YAML.
This is why I decided to write about which of these you should use, and when.

## What is the JSON format?

To understand when we should use JSON and when we shouldn't, let's first look at what JSON was actually made for and
how it is commonly used now.

The name **JSON** stands for JavaScript Object Notation. This is precisely what the format was made for: holding
objects from the JavaScript language. Even though that was its original purpose, it's certainly not the only thing it's
used for today, and we now see JSON in countless languages and use-cases beyond holding JavaScript objects.

But why is it that a format made only for JS is used in so many other places? The answer is really simple: a lot of
data can be fully represented in a JSON-like structure. Even though that's not what it was made for, if the data fits
the format, it can be used. For example, any hashmap (or dictionary in Python) follows the key-value structure that
JSON objects hold, any array can be represented as comma-separated values in square brackets, any string can be
represented as text wrapped in double quotes (JSON does not allow single quotes), and so on. Because the JSON format
supports all of these features, it turns out that a huge majority of the data we need can be represented by it without
any issues.

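As a small illustration of that mapping, here is a sketch using Python's standard `json` module. The `student`
dictionary is made-up sample data, not something taken from a real project.

```python
import json

# Hypothetical sample data: a dict (hashmap), a list (array), strings and a few special values.
student = {
    "name": "John Doe",
    "subjects": ["Mathematics", "Physics"],
    "graduated": False,
    "advisor": None,
}

# Serialize the Python structure into JSON text.
text = json.dumps(student)
print(text)

# Parsing the text again gives back an equal structure.
assert json.loads(text) == student
```

Note that the strings come out wrapped in double quotes, and that Python's `False` and `None` become JSON's `false`
and `null`.
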
## What is a database?

This is a bit harder to explain, but the most basic way to think about it is as an Excel spreadsheet. It has some
columns and some rows, and you can store data in it. There are countless Database Management Systems (DBMS) out there,
the most notable being MariaDB, MySQL, PostgreSQL, Oracle, MongoDB, ... The goal of these systems is, as the name
implies, to manage the database. A DBMS controls how the data is stored and retrieved, and handles internal details
such as whether the stored data should be compressed.

Even though there are a lot of choices for a DBMS, no matter which one we end up using, on the surface most of them do
essentially the same thing: they store tables of data. Each row in a table usually has a *primary key*, which is a
unique identifier for that row of data. We can also have composite primary keys, where multiple columns are unique when
combined but don't necessarily need to be unique on their own. We can also use what's called a *foreign key*, which is
basically the primary key of a row in another table, separate from the current one, to avoid repeating data. This is an
example of the tables that a database could hold:

Table of students:

{{< table >}}
| Student ID | Date of birth | Name           | Permanent residence address | Field of study       |
|------------|---------------|----------------|-----------------------------|----------------------|
| 0          | 1999-02-22    | John Doe       | Leander, Texas(TX), 78641   | Software Engineering |
| 1          | 1996-10-02    | Jack Hill      | Denmark, Maine(ME), 04022   | Computer Science     |
| 2          | 2000-11-14    | Samantha Jones | Dayton, Kentucky(KY), 41074 | Graphics Design      |
| 3          | 1998-04-12    | Michael Carter | Macomb, Michigan(MI), 48044 | Software Engineering |
| ...        | ...           | ...            | ...                         | ...                  |
{{< /table >}}

Student Grades:

{{< table >}}
| Student | Subject           | Grade | Year |
|---------|-------------------|-------|------|
| 0       | Mathematics       | B     | 2020 |
| 0       | Physics           | A     | 2020 |
| 1       | Computer Networks | C     | 2021 |
| 2       | Mathematics       | D     | 2021 |
| 2       | Web Design        | A     | 2021 |
| 2       | Web Design        | B     | 2020 |
| ...     | ...               | ...   | ...  |
{{< /table >}}

Here we can see that the *Student Grades* table doesn't have a standalone unique primary key like the students table
has. Instead it has a composite primary key, in this case made of 3 columns: *Student*, *Subject* and *Year*. We can
also see that rather than defining the whole student all over again, since we already have the students table, we
simply store the Student ID as a foreign key, which we can use to look up the given student in the students table.

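To make the two tables a bit more concrete, here is a rough sketch of how they could be declared, using Python's
built-in `sqlite3` module. The table and column names (`students`, `student_grades`, `student_id`, ...) are my own
naming for this example, and SQLite is just one convenient DBMS to try it with.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to

conn.executescript("""
CREATE TABLE students (
    student_id     INTEGER PRIMARY KEY,
    date_of_birth  TEXT NOT NULL,
    name           TEXT NOT NULL,
    address        TEXT NOT NULL,
    field_of_study TEXT NOT NULL
);
CREATE TABLE student_grades (
    student INTEGER NOT NULL REFERENCES students(student_id),  -- foreign key
    subject TEXT NOT NULL,
    grade   TEXT NOT NULL,
    year    INTEGER NOT NULL,
    PRIMARY KEY (student, subject, year)  -- composite primary key
);
""")

conn.execute(
    "INSERT INTO students VALUES (0, '1999-02-22', 'John Doe', "
    "'Leander, Texas(TX), 78641', 'Software Engineering')"
)
conn.execute("INSERT INTO student_grades VALUES (0, 'Mathematics', 'B', 2020)")
conn.execute("INSERT INTO student_grades VALUES (0, 'Physics', 'A', 2020)")

# Follow the foreign key to get the student's name next to each grade.
rows = conn.execute(
    "SELECT s.name, g.subject, g.grade FROM student_grades AS g "
    "JOIN students AS s ON s.student_id = g.student ORDER BY g.subject"
).fetchall()
print(rows)

# A second grade for the same (student, subject, year) violates the composite key.
try:
    conn.execute("INSERT INTO student_grades VALUES (0, 'Physics', 'C', 2020)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The grades table never repeats the student's name or address. It only stores the ID, and the join pulls the rest in
when we ask for it.
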
I've probably gone a bit more overboard on databases here than I needed to, but the most important thing about them is
that the DBMS will in most cases maintain separate indices that make accessing or searching for something in a database
very quick. For example, a DBMS might keep an index of the dates of birth for our students table that is sorted and
only contains 2 columns: the primary key (student ID) and the date of birth. With this index, we can perform a binary
search for the date of birth we need, which gives us the student ID very quickly. The main table also has an index
ordered by student ID, so with another binary search we can immediately find the data about a student with a certain
date of birth. Real systems usually use tree structures such as B-trees rather than a plain sorted list, but the idea
is the same.

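The lookup described above can be sketched in a few lines of Python. This is only a toy model of the idea, not how
any particular DBMS implements its indices. The names `students`, `dob_index` and `find_by_date_of_birth` are made up
for the example, and a plain dict stands in for the index on student IDs.

```python
from bisect import bisect_left

# Main table keyed by student ID (sample rows from the students table).
students = {
    0: ("1999-02-22", "John Doe"),
    1: ("1996-10-02", "Jack Hill"),
    2: ("2000-11-14", "Samantha Jones"),
    3: ("1998-04-12", "Michael Carter"),
}

# Secondary index: (date_of_birth, student_id) pairs kept in sorted order.
dob_index = sorted((dob, sid) for sid, (dob, _name) in students.items())


def find_by_date_of_birth(dob):
    """Return the names of all students born on the given ISO date."""
    names = []
    # Binary search for the first index entry with this date of birth.
    pos = bisect_left(dob_index, (dob,))
    while pos < len(dob_index) and dob_index[pos][0] == dob:
        student_id = dob_index[pos][1]
        names.append(students[student_id][1])  # second lookup, by primary key
        pos += 1
    return names


print(find_by_date_of_birth("2000-11-14"))
```

The important part is that we never loop over every student. The sorted index lets us jump close to the right entry
in a handful of comparisons, even if the table had millions of rows.
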
## Database Advantages

Because of optimizations such as building separate index files, a database is an ideal solution whenever we know we
will be holding a large amount of data, since whenever we need to search for something specific, we can do so very
quickly. This isn't just a slight improvement. On large data sets you can easily get something like 1000x better
performance than you could with a JSON file, because with JSON we first need to parse the whole file before we can
access anything in it, and even then we don't have any of these helper indices.

That said, there are some ways to speed up JSON lookups. They work by not parsing the whole JSON file, and instead
performing a simple text search within the file for a given term and extrapolating from that. However, this isn't very
reliable, and even with something like this we still wouldn't achieve the same performance as we could with a database.
There are several reasons for this: not only is the database model still much faster thanks to the binary search over
indices, a DBMS also usually runs as a service, which means it can keep certain data loaded at all times, ready to be
returned when asked for, rather than having to open a file and search through it.

Another disadvantage of the JSON model is that parsing it usually means loading the whole JSON file into memory (RAM).
This works well for small things that only hold a couple hundred entries, but once the file starts to grow, we need the
RAM capacity to accommodate that growth.

If all that wasn't enough, there is yet another reason not to use JSON: additional writes. When we want to extend a
JSON file and write something new into it, we can't do that by simply appending to the end of the file, because the
JSON structure doesn't allow it. Instead we first need to parse and load the whole structure into memory, then edit
that structure to add what we need, and then rewrite the whole file. This is extremely inefficient, so in any scenario
where write speed matters, JSON isn't a good solution.

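Here is a minimal sketch of that difference, assuming a JSON file that holds one big list and an SQLite table with the
same data. The helper names `append_entry_json` and `append_entry_db`, the file names and the `grades` table are all
hypothetical and only exist for this example.

```python
import json
import os
import sqlite3
import tempfile


def append_entry_json(path, entry):
    """Add one entry to a JSON file holding a list: read all, modify, rewrite all."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)  # parses the entire file into memory
    entries.append(entry)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f)  # writes every entry back, not just the new one


def append_entry_db(conn, entry):
    """Add one row to a table; the DBMS decides what to write, we never rewrite the file."""
    conn.execute("INSERT INTO grades VALUES (?, ?, ?)", entry)
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    # JSON: start from an empty list, then add two entries.
    json_path = os.path.join(tmp, "grades.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([], f)
    append_entry_json(json_path, [0, "Mathematics", "B"])
    append_entry_json(json_path, [0, "Physics", "A"])
    with open(json_path, encoding="utf-8") as f:
        json_entries = json.load(f)

    # Database: the same two entries as rows.
    conn = sqlite3.connect(os.path.join(tmp, "grades.db"))
    conn.execute("CREATE TABLE grades (student INTEGER, subject TEXT, grade TEXT)")
    append_entry_db(conn, (0, "Mathematics", "B"))
    append_entry_db(conn, (0, "Physics", "A"))
    db_count = conn.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
    conn.close()

print(len(json_entries), db_count)
```

With two entries the difference doesn't matter, but `append_entry_json` does work proportional to the size of the
whole file on every single call, which is exactly what hurts once the file grows.
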
## JSON Advantages

Alright, so now we know many reasons to avoid JSON, but when should JSON actually be used? From all of this, it seems
like databases are better in every case, so is there some disadvantage to them too? Well, in most cases the appeal of
JSON is that it's text-based. With a database, even though you will usually be able to export it into other formats,
the data is generally stored as binary files that can't easily be edited by hand, and you need the corresponding DBMS
to make any sense of them. A plain-text format is great for things such as API responses (or requests), and in fact
JSON is by far the most commonly used format for this.

We know what JSON was made for and that it can represent many things, but where is it actually used? Well, the most
common use-case for this format is certainly API responses (or requests). The body of a response from an API will
usually follow the JSON format if it holds anything more than a single value. While there are APIs that use a
different standard, JSON is by far the most popular one. As this use-case and its original purpose show, JSON is
really good at representing objects from programming languages. Whenever we have some data that was already obtained
from a database, or otherwise generated or stored, we can use JSON to easily represent it as text without resorting to
language-specific serialization (such as pickling in Python). So essentially, JSON is a language-agnostic format for
transmitting data about objects within a language.

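As a rough example of what that looks like in practice, here is a sketch that turns an object into JSON text and back.
The `Grade` class is hypothetical, it just stands in for whatever object your program got from its database.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Grade:
    # Hypothetical object, as it might look after being loaded from a database.
    student: int
    subject: str
    grade: str
    year: int


grade = Grade(student=0, subject="Mathematics", grade="B", year=2020)

# Plain text that any language with a JSON parser can read, unlike a pickle.
body = json.dumps(asdict(grade))
print(body)

# The receiving side only needs the JSON text to rebuild the data.
received = Grade(**json.loads(body))
assert received == grade
```

The `body` string is what would travel in the API response. Whoever receives it doesn't need to know or care that it
was produced by Python.
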
Another good use for JSON is representing data that is expected to be read by the user. These are things like
configuration files that can be changed directly by the user. However, for this use-case it may be better to use the
.ini config format, or something like YAML or TOML. These formats are a bit more commonly used for config files than
JSON and should usually be preferred, but JSON isn't necessarily a bad option either.

## Summary

For smaller projects, where we only really keep ~50 entries in the JSON file, it may be alright to use JSON. However,
if this data is expected to be read often, a database should still be preferred. Really, for any data that's constantly
changing, a database is the better choice. But for static configuration, or any static data that's not expected to be
changed by the program itself (or at least not often) and is only read once, usually at the start of the program, and
not touched afterwards, JSON or similar formats are a perfectly reasonable way of storing the data. In fact, it would
be a mistake to use a database for data that will only be read once and won't be touched again, especially if the user
could benefit from seeing or editing it.

However, for anything where we need to constantly update some values and access them again later, we should always
prefer a database over simple plain-text files. It will be a lot more efficient, and if speed matters to you, you'll
likely see a huge speed increase in your program when you switch to a database.

Another use-case for databases is when you need to host the data on some other machine. With a database, we can simply
expose a port and let the DBMS handle interactions with it, while our program simply makes requests to this remote
server. This is usually how databases are used with servers, but many client-side programs also create their own local
databases and use those, simply because using files is inefficient.