Interview Questions on SQL
Introduction:
SQL is an important query language used with relational databases. Having a good hold on the basics of the language could help you ace a data science/ engineer interview easily.
As you have clicked on this article, I assume that you are preparing for a technical interview on SQL. Worry not, this article should help you brush up on your skills in no time by deep diving and looking into some of the questions I received for SQL when I interviewed for multiple data-related roles.
The questions in this article would cover the topics of Joins, Partitions, Indexes, Sequence of query execution, Aggregate functions, the Group By clause, and so on.
Interview Questions:
1) Can you list and explain all SQL joins?
You could answer this question by giving the interviewer an example of two tables A and B such that A has a foreign key referencing B.
- Inner Join- An inner join would provide us with the rows which are common in both A and B
- Left Outer Join:- In the left outer join, we would return every row from A and only the rows in B that are present in A
- Right Outer Join:- Right outer join would be the opposite of the left outer join, that is we would return every row from B and only those rows from A which are present in B.
- Full Outer Join:- Every row (matching and non-matching) in both the tables
- Cross Join:- This is more of a cartesian join where each record from table A is joined to each record of table B and vice versa.
2) Let us take an example of two tables namely Employee and Department which have the below rows, Id column is a key in the Department table; Emp_id is a key in the Employee table while Dept_id is a foreign key referencing the Department table. What is the number of rows which are present in the result when we join the two tables for each of the SQL joins you mentioned above with the query you would use to fetch the results?
Employee:-
.............................
: Emp_id : Dept_id : Salary :
:........:.........:........:
: E1 : D1 : 100 :
: E2 : D2 : 200 :
: E3 : D2 : 50 :
: E4 : NULL : 75 :
:........:.........:........:
Dep:-
......
: Id :
:....:
: D1 :
: D2 :
: D3 :
:....:
The inner join would return 3 rows and the resultant table after join would look like the below:
..................................
: Emp_id : Dept_id : Salary : Id :
:........:.........:........:....:
: E1 : D1 : 100 : D1 :
: E2 : D2 : 200 : D2 :
: E3 : D2 : 50 : D2 :
:........:.........:........:....:
The SQL query would be:-
SELECT *
FROM Employee e INNER JOIN Dep
ON e.Dept_id = Id
The left outer join would return 4 rows and the resultant table after join would look like the below:
....................................
: Emp_id : Dept_id : Salary : Id :
:........:.........:........:......:
: E1 : D1 : 100 : D1 :
: E2 : D2 : 200 : D2 :
: E3 : D2 : 50 : D2 :
: E4 : NULL : 75 : NULL :
:........:.........:........:......:
The SQL query would be:-
SELECT *
FROM Employee e LEFT JOIN Dep
ON e.Dept_id = Id
The right outer join would return 4 rows and the resultant table after join would look like the below:
..................................
: Emp_id : Dept_id : Salary : Id :
:........:.........:........:....:
: E1 : D1 : 100 : D1 :
: E2 : D2 : 200 : D2 :
: E3 : D2 : 50 : D2 :
: NULL : NULL : NULL : D3 :
:........:.........:........:....:
The SQL query would be:-
SELECT *
FROM Employee e RIGHT JOIN Dep
ON e.Dept_id = Id
The full outer join would return 5 rows and the resultant table after join would look like the below:
....................................
: Emp_id : Dept_id : Salary : Id :
:........:.........:........:......:
: E1 : D1 : 100 : D1 :
: E2 : D2 : 200 : D2 :
: E3 : D2 : 50 : D2 :
: NULL : NULL : NULL : D3 :
: E4 : NULL : 75 : NULL :
:........:.........:........:......:
The SQL query would be:-
SELECT *
FROM Employee e FULL OUTER JOIN Dep
ON e.Dept_id = Id
3) What are some ways to improve the performance in accessing/storing data?
For storing data efficiently, partitions can be created. Each partition can be stored separately at different locations on a disk, instead of requiring a single large chunk of space. You would have to define a set of columns on which you would want to partition the data. Say you receive data monthly, you would then partition your data on the date column and create multiple subsets of the table based on the different months.
The second way to improve efficiency in access of data would be the use of indexes. Usually, you would want to create an index of a column that is frequently used in joins or other DDL statements. There are two types of indices in Oracle clustered and non-clustered index. Clustered indices physically sort the data, the primary key would be an example of a clustered index. You can only have one clustered index as the rows in a table can be sorted physically only in one way. Non-clustered indices do not physically sort the data, they are synonymous with the index of a book, that is they are stored separately and you can have multiple non-clustered indexes in one table.
4) Question on Aggregate functions:
Consider a table Employee_details. Give a SQL query that would provide the count of non-unique phone numbers in the table along with the customer’s name.
.................................
: Name : Address : Phone Number :
:......:.........:..............:
: A : Mumbai : 12345 :
: B : Delhi : 12121 :
: C : Mumbai : 2323 :
: D : Delhi : 12345 :
:......:.........:..............:
SQL Query:-
SELECT COUNT(*), Name
FROM Employee_Details
WHERE NOT IN (SELECT DISTINCT(Phone Number)
FROM Employee_Details)
5) When would you use a stored procedure?
In case of a query or a set of queries that are to be executed multiple times, we use a stored procedure. It is similar to the use of functions in other programming languages. You define a set of executable statements once and reference them every time you need to execute the same task.
Additionally, you can define a stored procedure as follows:
CREATE PROCEDURE SPGetEmpl
AS
SELECT *
FROM Employee
GROUP BY Dept_id
HAVING Salary > 50
GO
6) What is the sequence of execution employed for any SQL statement?
Firstly, all data is fetched from the FROM/ JOIN clause, which is filtered using the WHERE clause, in case a GROUP BY clause is present it is then applied followed by the HAVING clause, post which the ORDER BY/LIMIT clauses would be executed if applicable followed by the return of data using the SELECT clause.
If I give you an example with a query,
SELECT *
FROM Employee e RIGHT JOIN Dep
ON e.Dept_id = Id
WHERE Id is NOT NULL
GROUP BY Id
HAVING Salary>25
ORDER BY Salary DESC;
In the above query, all data would be fetched from Employee and Dep tables based on the join condition, post which only the records having the Id field as not NULL would be filtered, after which groups would be created based on the department id which has a salary greater than 25, Finally, the data would be sorted based on descending order of the Salary and all columns would be displayed as we are using SELECT *.
Conclusion:
Knowing these questions would definitely give you an edge while interviewing. The most important thing would be to have your basics in SQL cleared, and I have tried to do the same by including questions from different SQL topics in this article.
Some key takeaways would be:
1) We have discussed some questions about joins, this is the most important concept in databases, and I would suggest you have an in-depth understanding of the same, try solving multiple questions with different tables as inputs.
2) Knowing optimization strategies like indices and partitions could help you to showcase your practical understanding of databases and surely give you an edge.
3) Aggregate functions like count, sum, min, and max have been asked in every interview I have given, it would be worth your time to practice these queries along with the group by clause.
About the Author:
Hey, I am Alifia Ghantiwala, a Software Engineer. When I am not working I enjoy telling stories with data and writing to help people with my experiences. Thanks for reading the article, have an awesome day, and all the very best for your interview!