5 Common SQL Interview Problems for Data Scientists

Helping you develop your SQL skills to ace any interview:

While it’s not the sexiest part of the job, having a strong understanding of SQL is essential to succeed in any data-focused job. The truth is that there’s way more to SQL than SELECT FROM WHERE GROUP BY ORDER BY. The more functions you know, the easier it’ll be for you to manipulate and query anything you want.
There are two things I hope to learn and communicate in this article:
  1. Learn and teach SQL functions beyond the basic fundamentals
  2. Go through a number of SQL interview practice problems
These questions are taken from none other than Leetcode! Go check it out if you haven’t yet!

Table of Content

PROBLEM #1: Second Highest Salary

Write a SQL query to get the second highest salary from the Employee table. For example, given the Employee table below, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+

SOLUTION A: Using IFNULL, OFFSET

  • IFNULL(expression, alt: ifnull() returns the specified value if null, otherwise returns the expected value. We’ll use this to return null if there’s no second-highest salary.
  • OFFSET : offset is used with the ORDER BY clause to disregard the top n rows that you specify. This will be useful as you’ll want to get the second row (2nd highest salary)
SELECT
IFNULL(
(SELECT DISTINCT Salary
FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1
), null) as SecondHighestSalary
FROM Employee
LIMIT 1

SOLUTION B: Using MAX()

This query says to choose the MAX salary that isn’t equal to the MAX salary, which is equivalent to saying to choose the second-highest salary!
SELECT MAX(salary) AS SecondHighestSalary
FROM Employee
WHERE salary != (SELECT MAX(salary) FROM Employee)

PROBLEM #2: Duplicate Emails

Write a SQL query to find all duplicate emails in a table named Person.
+----+---------+
| Id | Email |
+----+---------+
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
+----+---------+

SOLUTION A: COUNT() in a Subquery

First, a subquery is created to show the count of the frequency of each email. Then the subquery is filtered WHERE the count is greater than 1.
SELECT Email
FROM (
SELECT Email, count(Email) AS count
FROM Person
GROUP BY Email
) as email_count
WHERE count > 1

SOLUTION B: HAVING Clause

  • HAVING is a clause that essentially allows you to use a WHERE statement in conjunction with aggregates (GROUP BY).
SELECT Email
FROM Person
GROUP BY Email
HAVING count(Email) > 1

PROBLEM #3: Rising Temperature

Given a Weather table, write a SQL query to find all dates' Ids with higher temperature compared to its previous (yesterday's) dates.
+---------+------------------+------------------+
| Id(INT) | RecordDate(DATE) | Temperature(INT) |
+---------+------------------+------------------+
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
+---------+------------------+------------------+

SOLUTION: DATEDIFF()

  • DATEDIFF calculates the difference between two dates and is used to make sure we’re comparing today’s temperature to yesterday’s temperature.
In plain English, the query is saying, Select the Ids where the temperature on a given day is greater than the temperature yesterday.
SELECT DISTINCT a.Id
FROM Weather a, Weather b
WHERE a.Temperature > b.Temperature
AND DATEDIFF(a.Recorddate, b.Recorddate) = 1

PROBLEM #4: Department Highest Salary

The Employee table holds all employees. Every employee has an Id, a salary, and there is also a column for the department Id.
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Jim | 90000 | 1 |
| 3 | Henry | 80000 | 2 |
| 4 | Sam | 60000 | 2 |
| 5 | Max | 90000 | 1 |
+----+-------+--------+--------------+
The Department table holds all departments of the company.
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, your SQL query should return the following rows (order of rows does not matter).
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Jim | 90000 |
| Sales | Henry | 80000 |
+------------+----------+--------+

SOLUTION: IN Clause

  • The IN clause allows you to use multiple OR clauses in a WHERE statement. For example WHERE country = ‘Canada’ or country = ‘USA’ is the same as WHERE country IN (‘Canada’, ’USA’).
  • In this case, we want to filter the Department table to only show the highest Salary per Department (i.e. DepartmentId). Then we can join the two tables WHERE the DepartmentId and Salary is in the filtered Department table.
SELECT
Department.name AS 'Department',
Employee.name AS 'Employee',
Salary
FROM Employee
INNER JOIN Department ON Employee.DepartmentId = Department.Id
WHERE (DepartmentId , Salary)
IN
( SELECT
DepartmentId, MAX(Salary)
FROM
Employee
GROUP BY DepartmentId
)

PROBLEM #5: Exchange Seats

Mary is a teacher in a middle school and she has a table seat storing students' names and their corresponding seat ids. The column id is a continuous increment. Mary wants to change seats for the adjacent students.
Can you write a SQL query to output the result for Mary?
+---------+---------+
| id | student |
+---------+---------+
| 1 | Abbot |
| 2 | Doris |
| 3 | Emerson |
| 4 | Green |
| 5 | Jeames |
+---------+---------+
For the sample input, the output is:
+---------+---------+
| id | student |
+---------+---------+
| 1 | Doris |
| 2 | Abbot |
| 3 | Green |
| 4 | Emerson |
| 5 | Jeames |
+---------+---------+
Note:

If the number of students is odd, there is no need to change the last one’s seat.

SOLUTION: CASE WHEN

  • Think of a CASE WHEN THEN statement like an IF statement in coding.
  • The first WHEN statement checks to see if there’s an odd number of rows, and if there is, ensure that the id number does not change.
  • The second WHEN statement adds 1 to each id (eg. 1,3,5 becomes 2,4,6)
  • Similarly, the third WHEN statement subtracts 1 to each id (2,4,6 becomes 1,3,5)
SELECT
CASE
WHEN((SELECT MAX(id) FROM seat)%2 = 1) AND id = (SELECT MAX(id) FROM seat) THEN id
WHEN id%2 = 1 THEN id + 1
ELSE id - 1
END AS id, student
FROM seat
ORDER BY id
And that’s it! Please comment below if there is anything unclear and I will do my best to clarify anything to the best of my ability — Thanks!

0 comments:

Post a Comment