Must-Know SQL Queries Every Analyst must know
By Andrew Tumwesigye
Data and Business Intelligence Analyst, Dimension Metrics
2nd Feb 2025
Introduction
Structured Query Language (SQL) was invented by IBM in the early 1970s and has since become one of the most powerful tools for querying and analyzing relational databases[1]. These databases store data in structured tables linked via primary and foreign keys, allowing analysts to extract insights efficiently.
My First Experience with SQL
In 2022, I enrolled in the Google Data Analytics Professional Certificate, which covered among others, Excel, SQL, R, and Tableau. I was familiar with Excel but the rest was new territory. However, as the course progressed I started to gravitate towards SQL considering it was easier than R. Initially, it seemed logical and straightforward. However, as I progressed, I realized that it wasn’t a walk in the park either. For example, a single missing comma or parenthesis could result in frustrating syntax errors, making debugging a nightmare.
Determined not to give up, I focused on mastering the fundamentals before tackling complex queries. This practice took two years, during which I completed weekly projects, ensuring that I fully grasped every challenge before moving forward.
The Good, the Bad, and the Ugly of my SQL journey
The Good
The basic SQL statements—SELECT, FROM, and WHERE—felt intuitive. SELECT determines the columns to retrieve, FROM specifies the table, and WHERE filters the results. For example, if I had a table called Sales, I could retrieve revenue above $500 as follows:
SELECT Revenue
FROM Sales
WHERE Revenue > 500;
The Bad
Syntax errors were my biggest nightmare. At one point, every query I wrote had an error, and debugging was exhausting. At times, I questioned whether I should just drop SQL and continue with Excel. However, up until this point, I had invested so much time, effort and money. So, I decided to push on and practice as frequently as I could and this became the magic bullet that helped me push through.
The Ugly
Advancing from moderate to advanced queries was the toughest challenge. Now I had mastered the moderate stage. The syntax errors and the query structures were no longer an issue. In other words, what was once complicated, now seemed easy. Little did I know, this was the proverbial "calm before the storm." Good Lord, the complexity of query writing escalated quickly, becoming not only longer but more demanding in logic. Yet, I pressed on, determined by how far I had come and the commitment I made to myself, knowing the end was in sight. Joins and nested queries were particularly challenging, but I overcame them through relentless practice.
Essential SQL Queries Every Analyst Must Know
The query suggestions below are based on experience gained from completing about 40 projects using SQL I’ve categorized them into 3 levels: Beginner, Intermediate and Advanced. While SQL is vast, the queries below represent, in my view, what every analyst should master.
Beginner-Level Queries
At a minimum, every analyst should understand:
SELECT FROM WHERE – Extracts data based on conditions.
GROUP BY – Groups data when using aggregation functions like SUM, COUNT, and AVG.
ORDER BY – Sorts query results in ascending or descending order.
HAVING – Filters aggregated data.
LIMIT – Restricts the number of rows displayed.
Example: Summing revenue for each store, sorting by the highest and limiting them to 10 stores (assuming there are very many stores)
SELECT Store, SUM(Revenue) AS Total_Revenue
FROM Sales
WHERE Store= ‘Central_region’
GROUP BY Store
ORDER BY Total_Revenue DESC
LIMIT 10;
Moderate-Level Queries
1. Joins
Joins allow analysts to combine data from multiple tables. Initially, I struggled with them and even tried memorizing them, which backfired. When I finally understood the logic behind LEFT JOIN and RIGHT JOIN and INNER JOIN, everything clicked.
Example: Retrieving students who attended and completed a training:
SELECT A.Student_Name, B.Completion_Status
FROM Attended_Training A
LEFT JOIN Completed_Training B
ON A.Student_ID = B.Student_ID;
A LEFT JOIN ensures all students who attended the training are included, even if they didn’t complete it. A RIGHT JOIN does the opposite, prioritizing completed training records. The INNER JOIN on the other hand prioritizes only those who attended and completed the training. It’s worth noting that joins are used when you have more than one table. For example, the query above includes Table 1 - employees who attended a training and Table 2 - employees who completed the training.
2. CASE Statements
CASE statements apply conditional logic within queries. For example, if your categorizing survey respondents based on mood and the initial data was captured as numerical code e.g., 0 for happy and 1 for sad:
SELECT Country, Mood,
CASE WHEN Mood = 1 THEN 'Sad'
ELSE 'Happy' END AS Mood_Category
FROM Happiness_Survey;
3. Calculating Percentages
This technique is invaluable in analytics. Often times an Analyst will be required to calculate the rate or percentage of a specific subset of data relative to the whole dataset.
Example: Calculating the percentage of respondents who reported feeling sad:
SELECT Country,
COUNT(*) AS Total_Respondents,
COUNT(CASE WHEN Mood = 1 THEN 1 END) AS Sad_Count,
(COUNT(CASE WHEN Mood = 1 THEN 1 END) * 100.0 / COUNT(*)) AS Sad_Percentage
FROM Happiness_Survey
GROUP BY Country;
Advanced-Level Queries
1. Common Table Expressions (CTEs)
CTEs simplify complex queries by breaking them into temporary tables.
Example: Finding the top two and bottom two days by call volume:
WITH Top_Calls AS (
SELECT Day, SUM(Call_Minutes) AS Total_Minutes
FROM Call_Center
GROUP BY Day
ORDER BY Total_Minutes DESC
LIMIT 2
),
Bottom_Calls AS (
SELECT Day, SUM(Call_Minutes) AS Total_Minutes
FROM Call_Center
GROUP BY Day
ORDER BY Total_Minutes ASC
LIMIT 2
)
SELECT * FROM Top_Calls
UNION
SELECT * FROM Bottom_Calls;
CTEs make queries more readable and manageable when working with multiple calculations.
2. Nested Queries
Nested queries allow one query to use the result of another. They can be applied at the SELECT FROM WHERE levels as indicated below;
Example 1: Finding students who scored above the average mark:
SELECT Student_Name, Marks
FROM Exam_Results
WHERE Marks > (SELECT AVG(Marks) FROM Exam_Results);
A nested query placed at the WHERE clause filters results from the inner query that first calculated the average exam results
Example 2: Using a nested query in the FROM clause to calculate subject averages:
SELECT Subject, Average_Marks
FROM (
SELECT Subject, AVG(Marks) AS Average_Marks
FROM Exam_Results
GROUP BY Subject
) AS Temp_Table
WHERE Average_Marks > 50;
A nested query placed at the FROM clause causes the inner query to become a derived table and provides a result set to be used by the outer query
Example 3: Using a nested query in the SELECT clause to compare student marks with the overall average:
SELECT Student_Name, Marks,
(SELECT AVG(Marks) FROM Exam_Results) AS General_Average
FROM Exam_Results;
Since the SELECT column specifies columns, applying a nested query at this position creates temporary column in the inner query that is used by the outer query.
Conclusion
These queries represent the fundamental SQL skills every analyst should master. From basic SELECT statements to advanced CTEs and nested queries, SQL is a powerful tool that enables data-driven decision-making. With persistence, practice, and patience, mastering SQL is possible for anyone willing to invest the time.
Having completed over 60 projects, I can confidently say that these queries form the backbone of real-world data analysis. Keep practicing, keep learning, and SQL will become second nature!
[1] https://en.wikipedia.org/wiki/SQL