Today we’re going to go over the concept of relational databases—what they are and how they work. We will also take a look at structured query languages and how they are used to interact with relational databases, including presenting and going over a common sample interview question you may come across.
By the end of this article you should be comfortable explaining how a relational database works at a high level, as well as understand some of the basic parts that make up SQL code. If your knowledge of SQL is limited or non-existent, this information serves as a great starting point if you want to do well in an interview for a role requiring the use of SQL.
This article is part of a group of articles relating to interview prep. For more tips on how to ace the SQL portion of a technical interview, click here.
Question 1: What is a relational database?
The formal definition of relational database is:
“A database structured to recognize relations among stored items of information.”
This is a great definition if you already know how relational databases work. However, more generally it’s not a great definition because it doesn’t really tell you any more than you already knew before you looked it up.
However, the answer is in the definition—it just needs to be teased out a bit. Let’s deconstruct the sentence and ask a couple more questions in order to understand the definition a little bit better:
- What is a database?
A database, is essentially a computer which stores HUGE amounts of information (data), organized in a way that makes it easy for you to find the information you need as quickly and efficiently as possible. There are several different types of databases, but the one we’re discussing today is the relational database.
- What does it mean for a database to be “relational”?
A relational database is made up of tables. You can think of a table as a single excel spreadsheet with a data set on it. Databases are often made up of hundreds, even thousands of these tables, each containing a specific set of information.
Now, let’s say your boss comes in and says to you “@analysthour, I need a file with our sales for all of New York City ASAP!”
In your database you have two tables—one with client information, including a column that says what city and state each client lives in. The other has sales information by client. How would you put them together?
You may have already noticed that the tables have one thing in common with each other: clients. What if we could link the two tables together by matching the name of a client in the first table with the same name in the second table—essentially joining both tables together into one big table?
The process described above is what makes a relational database “relational”—it gives you the data you want by relating one table to another based on a shared trait (or “key”) present in both tables—in this case the shared trait is the client’s name.
So let’s recap what we know now in one big (memorize-able) statement:
“A relational database is a computer that stores a HUGE amount of information, organized into many smaller tables. What makes a database relational is that the tables can be linked together based on shared columns, or keys, in each table. (i.e. a client ID number) This structure allows the user to find and connect pieces of data to each other quickly and efficiently.”
A little bit longer, sure—but much more thorough than what we started with.
Pro Tip: If you’re trying to land a job as an analyst, even if you don’t know SQL, take the time to understand and memorize this basic structure. It will show the hiring manager that you’ve done some research, and it will also make it easier to actually learn SQL once you get a job because you’ll have a rough idea of what’s going on under the hood when you write your first “select” statement.
Question 2: What is SQL and how is it related to what we just discussed?
SQL is an acronym—the letters stand for “Structured Query Language”
There are many query languages out there today, but SQL is one of the most common and widely used.
It’s important to note that there are a few different versions of SQL, and that which version of SQL you use largely depends on which company designed and manufactured the database you’re using, but that’s a different topic for a different day—from here on out we will just use SQL as a blanket term to describe a structured query language.
SQL is a programming language designed to interact with and retrieve information from a relational database. A “SQL query” is basically a question posed to your database in a language it can understand.
When I say:
“select * from database_a.table_b”
The database understands it as:
“Please give me everything in table ‘b’, which is located in database ‘a’.”
There are hundreds of functions and tools you can use to manipulate data in SQL. On a basic level though, there are four main parts to a query which you should be familiar with:
1. The “select” statement
The select statement always comes first in a query. It’s the part of the query where you tell the database exactly what you want the dataset to look like. For example, if you had a table in your database—let’s call it table_a—with columns client_id, client_name, address, city, state, and sales, but you just needed city and sales for the project you’re working on, your select statement might look like this:
2. The “from” statement
The from statement is the second part of a query. This is the part of the query where you tell the database where to look to find the data you want. Specifically, you’re telling the database that “the data I need is in table_a” The query now looks like this:
3. The “where” clause
The where clause is one of the tools you can use to filter your data. If you remember the example from earlier in the article, your hypothetical boss asked for sales for New York City. The where clause allows you to filter based on the values in a particular column. If you were to select the sales for just the states in New York City, your where clause would have to look like this:
Notice that we specified both the state and the city in our where clause. This is not necessary, we could also write the where clause like this:
The reason we included state is to show you that a column in a given table does not have to be in the select statement for you to filter on it–it just has to be present in the base table (table_a). This is a neat trick, and one which could easily trip you up in an interview if you’re not aware of it. By the way, our query now looks like this:
4. The group by clause
The group by clause allows you group like terms together. This feature is a little bit more nuanced, but basically think of it as a method of aggregation. When you use a group by clause, the following happens:
You need to group by city, because the data in the table you’re querying actually looks like this:
If you were to omit the group by clause, the database would return one row for every different sales number rather than rolling them all up into one top-line sales number. You may think it’s a mistake, but without the group by clause the database thinks it’s supposed to return this:
Group by allows you to throw out the invisible portion of the image above and add all of the sales numbers together. Remember, if you want one sales number per city you need to group by city, and that looks like this:
Now we have the four major parts that form a basic query similar to something you may see in an interview, and the entire query together looks like this:
It’s important to remember that this is a contrived example meant to explain the key parts of a SQL query—a SQL interview question may be more complex than this.
Remember, the purpose of this exercise is not to convince someone that you know SQL, and if your knowledge of SQL is limited, you shouldn’t be pretending to know SQL in an interview anyway. (See the related article on acing a SQL question in a technical interview here).
This article is meant to serve as a base for your understanding of relational databases and structured query languages, which will give you a leg up against most people who have put “SQL” under “Technical Skills” on their resume because they used it once for a project or just because their business school told them it’s important.
If you already know a bit of SQL you may notice that we didn’t make any mention of joins—the syntax used to link two tables together. This is because joins are complex enough to deserve their own article, and we’ll talk about them separately here.
Did you find this useful? Please leave a comment down below!