Exercises - Derived Columns
3.5 Write a query to obtain the length of each customer’s first name (remember to look for string functions in the documentation that can help)
Table(s) to use: customer
Expected Output:
first_name | length |
---|---|
MARY | 4 |
PATRICIA | 8 |
LINDA | 5 |
BARBARA | 7 |
ELIZABETH | 9 |
JENNIFER | 8 |
MARIA | 5 |
SUSAN | 5 |
…
599 rows
3.6 Write a query to return the initials for each customer
Table(s) to use: customer
Expected Output:
first_name | last_name | initial |
---|---|---|
MARY | SMITH | MS |
PATRICIA | JOHNSON | PJ |
LINDA | WILLIAMS | LW |
BARBARA | JONES | BJ |
ELIZABETH | BROWN | EB |
JENNIFER | DAVIS | JD |
MARIA | MILLER | MM |
SUSAN | WILSON | SW |
…
599 rows
3.7 Each film has a rental_rate, which is how much money it costs for a customer to rent out the film. Each film also has a replacement_cost, which is how much money the film costs to replace. Write a query to figure out how many times each film must be rented out to cover its replacement cost.
Table(s) to use: film
Expected Output:
title | rental_rate | replacement_cost | # rentals to break-even |
---|---|---|---|
ACADEMY DINOSAUR | 0.99 | 20.99 | 22 |
ACE GOLDFINGER | 4.99 | 12.99 | 3 |
ADAPTATION HOLES | 2.99 | 18.99 | 7 |
AFFAIR PREJUDICE | 2.99 | 26.99 | 10 |
AFRICAN EGG | 2.99 | 22.99 | 8 |
AGENT TRUMAN | 2.99 | 17.99 | 7 |
AIRPLANE SIERRA | 4.99 | 28.99 | 6 |
AIRPORT POLLOCK | 4.99 | 15.99 | 4 |
…
1000 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.5 Write a query to obtain the length of each customer’s first name (remember to look for string functions in the documentation that can help)
Solution:
You can use the length(string) function to obtain the length of text.
select first_name, length(first_name)
from customer;
3.6 Write a query to return the initials for each customer
Solution:
You can use the left(text, n) function to obtain the first character from the first name and last name of each customer, and then join them with ||.
select
first_name,
last_name,
left(first_name, 1) || left(last_name, 1) as “initial”
from customer;
3.7 Each film has a rental_rate, which is how much money it costs for a customer to rent out the film. Each film also has a replacement_cost, which is how much money the film costs to replace. Write a query to figure out how many times each film must be rented out to cover its replacement cost.
Solution:
Divide replacement_cost by rental_rate to obtain the number of times each film must be rented out. This will however often be a fractional number and a film can’t be rented out a fractional number of times. To round this up to the nearest integer you can use the ceil(numeric) function. Click here for a full list of math functions.
select
title,
rental_rate,
replacement_cost,
ceil(replacement_cost / rental_rate) as “# rentals to break-even”
from film;
Exercises - Filtering with WHERE
3.8 Write a query to list all the films with a ‘G’ rating
Table(s) to use: film
Expected Output:
title | rating |
---|---|
ACE GOLDFINGER | G |
AFFAIR PREJUDICE | G |
AFRICAN EGG | G |
ALAMO VIDEOTAPE | G |
AMISTAD MIDSUMMER | G |
ANGELS LIFE | G |
ANNIE IDENTITY | G |
ARMAGEDDON LOST | G |
…
178 rows
3.9 List all the films longer than 2 hours (note each film has a length in minutes)
Table(s) to use: film
Expected Output:
title | length |
---|---|
AFRICAN EGG | 130 |
AGENT TRUMAN | 169 |
ALAMO VIDEOTAPE | 126 |
ALASKA PHANTOM | 136 |
ALI FOREVER | 150 |
ALLEY EVOLUTION | 180 |
AMERICAN CIRCUS | 129 |
ANALYZE HOOSIERS | 181 |
…
456 rows
3.10 Write a query to list all the rentals made before June, 2005
Table(s) to use: rental
Expected Output:
rental_id | rental_date |
---|
2|2005-05-24 22:54:33|
3|2005-05-24 23:03:39|
4|2005-05-24 23:04:41|
5|2005-05-24 23:05:21|
6|2005-05-24 23:08:07|
7|2005-05-24 23:11:53|
8|2005-05-24 23:31:46|
9|2005-05-25 00:00:40|
…
1156 rows
3.11 In Exercise 3.7, you wrote a query to figure out how many times each film must be rented out to cover its replacement cost. Now write a query to return only those films that must be rented out more than 30 times to cover their replacement cost.
Table(s) to use: film
Expected Output:
title | rental_rate | replacement_cost | # rentals to break-even |
---|---|---|---|
ARABIA DOGMA | 0.99 | 29.99 | 31 |
BALLROOM MOCKINGBIRD | 0.99 | 29.99 | 31 |
BONNIE HOLOCAUST | 0.99 | 29.99 | 31 |
CLOCKWORK PARADISE | 0.99 | 29.99 | 31 |
CLYDE THEORY | 0.99 | 29.99 | 31 |
CRUELTY UNFORGIVEN | 0.99 | 29.99 | 31 |
EARTH VISION | 0.99 | 29.99 | 31 |
EVERYONE CRAFT | 0.99 | 29.99 | 31 |
…
22 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.8 Write a query to list all the films with a ‘G’ rating
Solution:
select title, rating
from film
where rating = ‘G’;
3.9 List all the films longer than 2 hours (note each film has a length in minutes)
Solution:
Use a WHERE clause with length > 120 (120 minutes is equivalent to 2 hrs). Alternatively, you could have used where (length/60.0) > 2 but you would have to be careful with integer division (if you’re not familiar with integer division, don’t worry. We’ll cover it in an upcoming video).
select title, length
from film
where length > 120;
3.10 Write a query to list all the rentals made before June, 2005
Solution:
You can use the first day of June 2005 (‘2005-06-01’) as the date to compare against.
select rental_id, rental_date
from rental
where rental_date < ‘2005-06-01’;
3.11 In Exercise 3.7, you wrote a query to figure out how many times each film must be rented out to cover its replacement cost. Now write a query to return only those films that must be rented out more than 30 times to cover their replacement cost.
Solution:
We have to repeat the expression to calculate the number of times the film must be rented out in both the select and the where clause. In a later video, we’ll learn ways that we can avoid this repetition.
select
title,
rental_rate,
replacement_cost,
ceil(replacement_cost / rental_rate) as “# rentals to break-even”
from film
where ceil(replacement_cost / rental_rate) > 30;
Exercises - AND, OR, and NOT
3.12 Write a query to show all rentals made by the customer with ID 388 in 2005
Table(s) to use: rental
Expected Output:
rental_id | rental_date |
---|
21|2005-05-25 01:59:46|
411|2005-05-27 14:14:14|
1276|2005-06-15 08:00:13|
2145|2005-06-17 22:10:36|
2537|2005-06-19 01:52:21|
2692|2005-06-19 13:08:19|
3159|2005-06-20 21:11:50|
4947|2005-07-08 22:49:37|
…
29 rows
3.13 We’re trying to list all films with a length of an hour or less. Show two different ways to fix our query below that isn’t working (one using the NOT keyword, and one without)
select title, rental_duration, length
from film
where length not > 60;
Table(s) to use: film
Expected Output:
title | rental_duration | length |
---|---|---|
ACE GOLDFINGER | 3 | 48 |
ADAPTATION HOLES | 7 | 50 |
AIRPORT POLLOCK | 6 | 54 |
ALIEN CENTER | 5 | 46 |
ALTER VICTORY | 6 | 57 |
APOCALYPSE FLAMINGOS | 6 | 0 |
BENEATH RUSH | 6 | 53 |
BLUES INSTINCT | 5 | 50 |
…
105 rows
3.14 Explain what each of the two queries below are doing and why they generate different results. Which one is probably a mistake and why?
select title, rating
from film
where rating != ‘G’
and rating != ‘PG’;
select title, rating
from film
where rating != ‘G’
or rating != ‘PG’;
Table(s) to use: film
3.15 Write a single query to show all rentals where the return date is greater than the rental date, or the return date is equal to the rental date, or the return date is less than the rental date. How many rows are returned? Why doesn’t this match the number of rows in the table overall?
Table(s) to use: rental
Expected Output:
rental_id | rental_date | return_date |
---|
2|2005-05-24 22:54:33|2005-05-28 19:40:33|
3|2005-05-24 23:03:39|2005-06-01 22:12:39|
4|2005-05-24 23:04:41|2005-06-03 01:43:41|
5|2005-05-24 23:05:21|2005-06-02 04:33:21|
6|2005-05-24 23:08:07|2005-05-27 01:32:07|
7|2005-05-24 23:11:53|2005-05-29 20:34:53|
8|2005-05-24 23:31:46|2005-05-27 23:33:46|
9|2005-05-25 00:00:40|2005-05-28 00:22:40|
…
15861 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.12 Write a query to show all rentals made by the customer with ID 388 in 2005
Solution:
select rental_id, rental_date
from rental
where rental_date >= ‘2005-01-01’
and rental_date < ‘2006-01-01’
and customer_id = 388;
3.13 We’re trying to list all films with a length of an hour or less. Show two different ways to fix our query below that isn’t working (one using the NOT keyword, and one without)
Solution:
A solution using the NOT keyword (the original query had the NOT in the wrong place)
select title, rental_duration, length
from film
where not length > 60;
A solution not using the NOT keyword.
select title, rental_duration, length
from film
where length <= 60;
3.14 Explain what each of the two queries below are doing and why they generate different results. Which one is probably a mistake and why?
Solution:
The first query will return all films where the rating is neither ‘G’ nor ‘PG’. Stated another way, if a film is rated ‘G’ it won’t be in the output. If a film is rated ‘PG’ it also won’t be in the output. All other films will be in the output. The second query almost does nothing. It returns almost all the films - including those with a ‘G’ rating and also those with a ‘PG’ rating. (I say almost for reasons you’ll learn about in the next video). Think carefully about how both AND and OR work to understand this behaviour.
3.15 Write a single query to show all rentals where the return date is greater than the rental date, or the return date is equal to the rental date, or the return date is less than the rental date. How many rows are returned? Why doesn’t this match the number of rows in the table overall?
Solution:
In the next video, we’ll explore why this only returns 15,861 rows and not 16,044 - the total number of rows in the table.
select rental_id, rental_date, return_date
from rental
where return_date > rental_date
or return_date = rental_date
or return_date < rental_date;
Exercises - NULL handling
3.16 Write a query to list the rentals that haven’t been returned
Table(s) to use: rental
Expected Output:
rental_id | return_date |
---|
11496| |
11541| |
12101| |
11563| |
11577| |
11593| |
11611| |
11646| |
…
183 rows
3.17 Write a query to list the films that have a rating that is not ‘G’ or ‘PG’
Table(s) to use: film
Expected Output:
title | rating |
---|---|
ADAPTATION HOLES | NC-17 |
AIRPLANE SIERRA | PG-13 |
AIRPORT POLLOCK | R |
ALABAMA DEVIL | PG-13 |
ALADDIN CALENDAR | NC-17 |
DATE SPEED | R |
ALICE FANTASIA | NC-17 |
ALIEN CENTER | NC-17 |
…
629 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.16 Write a query to list the rentals that haven’t been returned
Solution:
A rental that hasn’t been returned has a NULL return_date.
select rental_id, return_date
from rental
where return_date is null;
3.17 Write a query to list the films that have a rating that is not ‘G’ or ‘PG’
Solution:
For a film to appear in the output, we want the rating to not be ‘PG’. AND we want the rating not to be ‘G’. But we will want to include it if the rating is NULL. Be sure to study the use of AND and OR here carefully.
select title, rating
from film
where rating != ‘PG’
and rating != ‘G’
or rating is null;
Exercises - IN, BETWEEN, and LIKE
3.18 Write a query to return the films with a rating of ‘PG’, ‘G’, or ‘PG-13’
Table(s) to use: film
Expected Output:
title | rating |
---|---|
ACADEMY DINOSAUR | PG |
ACE GOLDFINGER | G |
AFFAIR PREJUDICE | G |
AFRICAN EGG | G |
AGENT TRUMAN | PG |
AIRPLANE SIERRA | PG-13 |
ALABAMA DEVIL | PG-13 |
ALAMO VIDEOTAPE | G |
…
593 rows
3.19 Write a query equivalent to the one below using BETWEEN.
select title, length
from film
where length >= 90
and length <= 120;
Table(s) to use: film
Expected Output:
title | length |
---|---|
AFFAIR PREJUDICE | 117 |
ALABAMA DEVIL | 114 |
DATE SPEED | 104 |
ALICE FANTASIA | 94 |
AMADEUS HOLY | 113 |
ANACONDA CONFESSIONS | 92 |
ANTHEM LUKE | 91 |
APACHE DIVINE | 92 |
…
222 rows
3.20 Write a query to return all film titles that end with the the word “GRAFFITI”
Table(s) to use: film
Expected Output:
title |
---|
CLUB GRAFFITI |
HUMAN GRAFFITI |
STRANGERS GRAFFITI |
3 rows
3.21 In exercise 3.17 you wrote a query to list the films that have a rating that is not ‘G’ or ‘PG’. Re-write this query using NOT IN. Do your results include films with a NULL rating?
Table(s) to use: film
SCROLL DOWN FOR SOLUTIONS >>>
3.18 Write a query to return the films with a rating of ‘PG’, ‘G’, or ‘PG-13’
Solution:
We can use the IN keyword to simplify the query.
select title, rating
from film
where rating in (‘PG’, ‘G’, ‘PG-13’);
3.19 Write a query equivalent to the one below using BETWEEN.
Solution:
select title, length
from film
where length between 90 and 120;
3.20 Write a query to return all film titles that end with the the word “GRAFFITI”
Solution:
The key here is to use the % wildcard at the start of the pattern string, as any text can come before the word ‘GRAFFITI’. But not at the end of the pattern string.
select title
from film
where title like ‘%GRAFFITI’;
3.21 In exercise 3.17 you wrote a query to list the films that have a rating that is not ‘G’ or ‘PG’. Re-write this query using NOT IN. Do your results include films with a NULL rating?
Solution:
You can re-write the query as below. Your results will not include films with a NULL rating, so as in exercise 3.17 you should be careful when working with a column that could contain NULL values and put in explicit checks if you want them included in the output.
select title, rating
from film
Exercises - ORDER BY
3.22 Write a query the list all the customers with an email address. Order the customers by last name descending
Table(s) to use: customer
Expected Output:
first_name | last_name |
---|---|
CYNTHIA | YOUNG |
MARVIN | YEE |
LUIS | YANEZ |
BRIAN | WYMAN |
BRENDA | WRIGHT |
TYLER | WREN |
FLORENCE | WOODS |
LORI | WOOD |
…
597 rows
3.23 Write a query to list the country id’s and cities from the city table, first ordered by country id ascending, then by city alphabetically.
Table(s) to use: city
Expected Output:
country_id | city |
---|
1|Kabul |
2|Batna |
2|Bchar |
2|Skikda |
3|Tafuna |
4|Benguela |
4|Namibe |
5|South Hill |
6|Almirante Brown |
6|Avellaneda |
6|Baha Blanca |
6|Crdoba |
6|Escobar |
6|Ezeiza |
…
600 rows
3.24 Write a query to list actors ordered by the length of their full name ("[first_name] [last_name]") descending.
Table(s) to use: actor
Expected Output:
full_name | len |
---|---|
MICHELLE MCCONAUGHEY | 20 |
JOHNNY LOLLOBRIGIDA | 19 |
ANGELA WITHERSPOON | 18 |
MATTHEW JOHANSSON | 17 |
CAMERON ZELLWEGER | 17 |
JAYNE SILVERSTONE | 17 |
CHRISTOPHER BERRY | 17 |
FRANCES DAY-LEWIS | 17 |
…
200 rows
3.25 Describe the difference between ORDER BY x, y DESC and ORDER BY x DESC, y DESC (where x and y are columns in some imaginary table you’re querying)
Table(s) to use: [‘imaginary!’]
SCROLL DOWN FOR SOLUTIONS >>>
3.22 Write a query the list all the customers with an email address. Order the customers by last name descending
Solution:
We first want to only include customers with an email address (email is not null). Then we order by last_name descending.
select first_name, last_name
from customer
where email is not null
order by last_name desc;
3.23 Write a query to list the country id’s and cities from the city table, first ordered by country id ascending, then by city alphabetically.
Solution:
Because we are sorting both columns in ascending order, which is the default sort order, we can leave off adding ASC.
select country_id, city
from city
order by country_id, city;
3.24 Write a query to list actors ordered by the length of their full name ("[first_name] [last_name]") descending.
Solution:
You can create a computed column in your SELECT statement for the length of each actor’s full name and then refer to that alias in the ORDER BY clause. You may be wondering why you couldn’t just write length(full_name) for the second column in your select clause - ie. why did you have to repeat “first_name || ’ ’ || last_name”? This is because the second derived column is not guaranteed to be evaluated after the first. There is no guaranteed order of evalution of expressions in the SELECT clause! (and other clauses in general too)
select
first_name || ’ ’ || last_name as full_name,
length(first_name || ’ ’ || last_name) as len
from actor
order by len desc;
3.25 Describe the difference between ORDER BY x, y DESC and ORDER BY x DESC, y DESC (where x and y are columns in some imaginary table you’re querying)
Solution:
ORDER BY x, y DESC is equivalent to ORDER BY x ASC, y DESC - order first by x ascending, then by y descending. This is different from ORDER BY x DESC, y DESC - order by x descending, then by y descending. When asc/desc is omitted, ascending is the default.
Exercises - SQL Order of execution
3.26 Fix the query below, which we wanted to use to list all the rentals that happened after 10pm at night.
P.S. Don’t be intimated by the date_part function which you may not have seen so far. It’s just another function which you read about in the official documentation
select rental_id, date_part(‘hour’, rental_date) as “rental hour”
from rental
where “rental hour” >= 22;
Table(s) to use: rental
Expected Output:
rental_id | rental hour |
---|
2| 22|
3| 23|
4| 23|
5| 23|
6| 23|
7| 23|
8| 23|
136| 22|
…
1252 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.26 Fix the query below, which we wanted to use to list all the rentals that happened after 10pm at night.
Solution:
You can’t refer to the “rental hour” alias in the where clause, so instead just repeat the expression!
select rental_id, date_part(‘hour’, rental_date) as “rental hour”
from rental
where date_part(‘hour’, rental_date) >= 22;
Exercises - LIMIT and OFFSET
3.27 Write a query to return the 3 most recent payments received
Table(s) to use: payment
Expected Output:
payment_id | payment_date |
---|
31918|2007-05-14 13:44:29|
31917|2007-05-14 13:44:29|
31919|2007-05-14 13:44:29|
3 rows
3.28 Return the 4 films with the shortest length that are not R rated. For films with the same length, order them alphabetically
Table(s) to use: film
Expected Output:
title | length | rating |
---|---|---|
ALIEN CENTER | 46 | NC-17 |
IRON MOON | 46 | PG |
KWAI HOMEWARD | 46 | PG-13 |
LABYRINTH LEAGUE | 46 | PG-13 |
4 rows
3.29 Write a query to return the last 3 payments made in January, 2007
Table(s) to use: payment
Expected Output:
payment_id | amount | payment_date |
---|
16813| 0.99|2007-01-31 21:16:11|
16890| 0.99|2007-01-31 21:06:00|
17173| 2.99|2007-01-31 20:45:37|
3 rows
3.30 Can you think of a way you could, as in the previous exercise, return the last 3 payments made in January, 2007 but have those same 3 output rows ordered by date ascending? (Don’t spend too long on this…)
Table(s) to use: payment
Expected Output:
payment_id | amount | payment_date |
---|
17173| 2.99|2007-01-31 20:45:37|
16890| 0.99|2007-01-31 21:06:00|
16813| 0.99|2007-01-31 21:16:11|
3 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.27 Write a query to return the 3 most recent payments received
Solution:
You’ll first need to order the payments by payment date descending, which will ensure that the first rows are the most recent. Then use the limit keyword to return only the first 3.
select payment_id, payment_date
from payment
order by payment_date desc
limit 3;
3.28 Return the 4 films with the shortest length that are not R rated. For films with the same length, order them alphabetically
Solution:
This was a tricky one. When building the WHERE clause we need to make sure to include films that have a NULL rating (technically they are not ‘R’ rated!). Also, we need to order first by length then by title to obtain alphabetical order for films of the same length. Remember, LIMIT is the very last clause to be processed.
select title, length, rating
from film
where rating != ‘R’
or rating is null
order by length, title
limit 4;
3.29 Write a query to return the last 3 payments made in January, 2007
Solution:
You might have initially forgotten the second part of the WHERE, to make sure the payment date must be in January. You need to do this because there might be less than 3 payments made in January, and if there were, your query would actually return payments outside of January.
select payment_id, amount, payment_date
from payment
where payment_date < ‘2007-02-01’
and payment_date >= ‘2007-01-01’
order by payment_date desc
limit 3;
3.30 Can you think of a way you could, as in the previous exercise, return the last 3 payments made in January, 2007 but have those same 3 output rows ordered by date ascending? (Don’t spend too long on this…)
Solution:
You don’t yet have the tools to do this and it’s a bit of a problem with using ORDER BY/LIMIT while wanting separate output row ordering. I’ll show you how to handle situations like this in a future video.
Exercises - DISTINCT
3.31 Write a query to return all the unique ratings films can have, ordered alphabetically (not including NULL)
Table(s) to use: film
Expected Output:
rating |
---|
G |
PG |
PG-13 |
R |
NC-17 |
5 rows
3.32 Write a query to help us quickly see if there is any hour of the day that we have never rented a film out on (maybe the staff always head out for lunch?)
Table(s) to use: rental
3.33 Write a query to help quickly check whether the same rental rate is used for each rental duration (for example - is the rental rate always 4.99 when the rental duration is 3?)
Table(s) to use: film
Expected Output:
rental_duration | rental_rate |
---|
3| 4.99|
3| 2.99|
3| 0.99|
4| 2.99|
4| 4.99|
4| 0.99|
5| 2.99|
5| 4.99|
…
15 rows
3.34 Can you explain why the first query below works, but the second one, which simply adds the DISTINCT keyword, doesn’t? (this is quite challenging)
select first_name
from actor
order by last_name;
select distinct first_name
from actor
order by last_name;
Table(s) to use: actor
SCROLL DOWN FOR SOLUTIONS >>>
3.31 Write a query to return all the unique ratings films can have, ordered alphabetically (not including NULL)
Solution:
select distinct rating
from film
where rating is not null
order by rating;
3.32 Write a query to help us quickly see if there is any hour of the day that we have never rented a film out on (maybe the staff always head out for lunch?)
Solution:
You can use the date_part(‘hour’, rental_date) function to obtain the hour part for each rental date/time. Combined with DISTINCT and ORDER BY you obtain a simple list you can scroll through and see that no, there’s nothing to be alarmed about here…all hours of the day are accounted for.
select distinct date_part(‘hour’, rental_date) as hr
from rental
order by hr;
3.33 Write a query to help quickly check whether the same rental rate is used for each rental duration (for example - is the rental rate always 4.99 when the rental duration is 3?)
Solution:
This query will return all unique combinations of rental_duration and rental_rate and we can see clearly now that the answer is no - many different rental rates can be used for the same rental duration.
select distinct rental_duration, rental_rate
from film
order by rental_duration;
3.34 Can you explain why the first query below works, but the second one, which simply adds the DISTINCT keyword, doesn’t? (this is quite challenging)
Solution:
In the second query, multiple rows of actors are combined in to a single row due to the use of DISTINCT. For example, there are two actors with the first name ADAM (ADAM HOPPER and ADAM GRANT), however after the SELECT DISTINCT clause has been processed, there is only one row with first name ADAM. Ordering then by last name is undefined - eg. In the case of ADAM, Postgres has no way to know which last name should be used (HOPPER or GRANT?). In general, avoid ordering by columns you haven’t selected and you can sidestep complex situations like this.
Exercises - CASE expressions
3.35 Write a query to return an ordered list of distinct ratings for films in our films table along with their descriptions (you will have to type in the descriptions yourself)
Table(s) to use: film
Expected Output:
rating | rating description |
---|---|
G | General |
PG | Parental Guidance Recommended |
PG-13 | Parents Strongly Cautioned |
R | Restricted |
NC-17 | Adults Only |
5 rows
3.36 Write a query to output ‘Returned’ for returned rentals and ‘Not Returned’ for rentals that haven’t been returned. Order the output to show those not returned first.
Table(s) to use: rental
Expected Output:
rental_id | rental_date | return_date | return_status |
---|
14503|2006-02-14 15:16:03| |Not Returned |
14172|2006-02-14 15:16:03| |Not Returned |
12988|2006-02-14 15:16:03| |Not Returned |
14488|2006-02-14 15:16:03| |Not Returned |
15655|2006-02-14 15:16:03| |Not Returned |
14204|2006-02-14 15:16:03| |Not Returned |
12130|2006-02-14 15:16:03| |Not Returned |
12970|2006-02-14 15:16:03| |Not Returned |
…
16044 rows
3.37 Imagine you were asked to write a query to populate a ‘country picker’ for some internal company dashboard. Write a query to return the countries in alphabetical order, but also with the twist that the first 3 countries in the list must be 1) Australia 2) United Kingdom 3) United States and then normal alphabetical order after that (maybe you want them first because, for example, most of your customers come from these countries)
Table(s) to use: country
Expected Output:
country |
---|
Australia |
United Kingdom |
United States |
Afghanistan |
Algeria |
American Samoa |
Angola |
Anguilla |
…
109 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.35 Write a query to return an ordered list of distinct ratings for films in our films table along with their descriptions (you will have to type in the descriptions yourself)
Solution:
This exercise combines quite a few things we’ve learned so far. We use DISTINCT, IS NOT NULL, and ORDER BY to obtain a unique ordered list of all the different film ratings. Then we use the simple form of the case expression to convert each rating to a description.
select distinct rating,
case rating
when ‘G’ then ‘General’
when ‘PG’ then ‘Parental Guidance Recommended’
when ‘PG-13’ then ‘Parents Strongly Cautioned’
when ‘R’ then ‘Restricted’
when ‘NC-17’ then ‘Adults Only’
end as “rating description”
from film
where rating is not null
order by rating;
3.36 Write a query to output ‘Returned’ for returned rentals and ‘Not Returned’ for rentals that haven’t been returned. Order the output to show those not returned first.
Solution:
Here you can use the searched form of the case expression to check when the return date is NULL and output the appropriate status.
select rental_id, rental_date, return_date,
case
when return_date is null then ‘Not Returned’
else ‘Returned’
end as return_status
from rental
order by return_status;
3.37 Imagine you were asked to write a query to populate a ‘country picker’ for some internal company dashboard. Write a query to return the countries in alphabetical order, but also with the twist that the first 3 countries in the list must be 1) Australia 2) United Kingdom 3) United States and then normal alphabetical order after that (maybe you want them first because, for example, most of your customers come from these countries)
Solution:
This was a tricky one so study the solution carefully and make sure you understand it. You can use a case statement in the ORDER BY clause to return a number for particular countries, which is then used for ordering. So Australia becomes 0, United Kingdom becomes 1, United States becomes 2 and every other country gets the result 3. ORDER BY first orders by these numbers, in ascending order. There is then a second order by term (the “, country” bit after the case statement ends) which then ensures that for all those countries where the result was 3, they too are ordered alphabetically.
select country
from country
order by
case country
when ‘Australia’ then 0
when ‘United Kingdom’ then 1
when ‘United States’ then 2
else 3
end, country;
Challenge exercises
3.38 We want to give a prize to 5 random customers each month. Write a query that will return 5 random customers each time it is run (you may find a particular math function helpful - make sure to search the documentation!)
Table(s) to use: customer
Expected Output:
Your output should list a different set of customers each time you run the query!
first_name | last_name | |
---|---|---|
DUANE | TUBBS | DUANE.TUBBS@sakilacustomer.org |
OLGA | JIMENEZ | OLGA.JIMENEZ@sakilacustomer.org |
ROSEMARY | SCHMIDT | ROSEMARY.SCHMIDT@sakilacustomer.org |
PAMELA | BAKER | PAMELA.BAKER@sakilacustomer.org |
BILL | GAVIN | BILL.GAVIN@sakilacustomer.org |
5 rows
3.39 Give 3 different solutions to list the rentals made in June, 2005. In one solution, use the date_part function. In another, use the BETWEEN keyword. In the third, don’t use either date_part or BETWEEN.
Table(s) to use: rental
Expected Output:
rental_id | rental_date |
---|
1158|2005-06-14 22:53:33|
1159|2005-06-14 22:55:13|
1160|2005-06-14 23:00:34|
1161|2005-06-14 23:07:08|
1162|2005-06-14 23:09:38|
1163|2005-06-14 23:12:46|
1164|2005-06-14 23:16:26|
1165|2005-06-14 23:16:27|
…
2311 rows
3.40 Return the top 5 films for $ per minute (rental_rate / length) of entertainment
Table(s) to use: film
Expected Output:
title | rental_rate | length | per_minute |
---|---|---|---|
IRON MOON | 4.99 | 46 | 0.10847826086956521739 |
HANOVER GALAXY | 4.99 | 47 | 0.10617021276595744681 |
MIDSUMMER GROUNDHOG | 4.99 | 48 | 0.10395833333333333333 |
ACE GOLDFINGER | 4.99 | 48 | 0.10395833333333333333 |
PELICAN COMFORTS | 4.99 | 48 | 0.10395833333333333333 |
5 rows
3.41 Write a query to list all customers who have a first name containing the letter ‘A’ twice or more
Table(s) to use: customer
Expected Output:
first_name |
---|
PATRICIA |
BARBARA |
MARIA |
MARGARET |
SANDRA |
LAURA |
SARAH |
ANGELA |
…
56 rows
3.42 PostgreSQL supports an interesting variation of DISTINCT called DISTINCT ON. Visit the official documentation page and read about DISTINCT ON. See if you can figure out how you would use it in a query to return the most recent rental for each customer
Table(s) to use: rental
Expected Output:
customer_id | rental_date |
---|
1|2005-08-22 20:03:46|
2|2005-08-23 17:39:35|
3|2005-08-23 07:10:14|
4|2005-08-23 07:43:00|
5|2006-02-14 15:16:03|
6|2005-08-23 06:41:32|
7|2005-08-21 04:49:48|
8|2005-08-23 14:31:19|
9|2006-02-14 15:16:03|
10|2005-08-22 21:59:29|
…
599 rows
3.43 Write a query to list all the customers with an email address but not in the format [first_name].[last_name]@sakilacustomer.org
Table(s) to use: customer
Expected Output:
first_name | last_name | |
---|---|---|
JEAN | BELL | JEAN.BELL@sakilacustomer.or |
BONNIE | HUGHES | BONNIE.HUGHES@sabilacustomer.org |
2 rows
SCROLL DOWN FOR SOLUTIONS >>>
3.38 We want to give a prize to 5 random customers each month. Write a query that will return 5 random customers each time it is run (you may find a particular math function helpful - make sure to search the documentation!)
Solution:
The random() function, which generates a random value in the range 0.0 <= x < 1.0, is useful for this behaviour and can be used in the ORDER BY clause to achieve a random ordering of customers.
select first_name, last_name, email
from customer
order by random()
limit 5;
3.39 Give 3 different solutions to list the rentals made in June, 2005. In one solution, use the date_part function. In another, use the BETWEEN keyword. In the third, don’t use either date_part or BETWEEN.
Solution:
Using the date_part function.
select rental_id, rental_date
from rental
where date_part(‘year’, rental_date) = 2005
and date_part(‘month’, rental_date) = 6;
Using the BETWEEN keyword.
select rental_id, rental_date
from rental
where rental_date between
‘2005-06-01 00:00:00’ and ‘2005-06-30 24:00:00’;
Instead of BETWEEN you can also use the standard greater than and less than operators.
select rental_id, rental_date
from rental
where rental_date >= ‘2005-06-01 00:00:00’
and rental_date <= ‘2005-06-30 24:00:00’;
3.40 Return the top 5 films for $ per minute (rental_rate / length) of entertainment
Solution:
Make sure to first filter out films where the length is NULL or 0 (to avoid division by zero errors).
select title, rental_rate, length,
rental_rate / length as per_minute
from film
where length is not null
and length != 0
order by per_minute desc
limit 5;
3.41 Write a query to list all customers who have a first name containing the letter ‘A’ twice or more
Solution:
You can use the LIKE or ILIKE keyword - whichever you prefer (though LIKE is standard SQL while ILIKE is specific to PostgreSQL). And we make use of the % wildcard here to represent a string of any length (including the empty string)
select first_name
from customer
where first_name ilike ‘%a%a%’;
3.42 PostgreSQL supports an interesting variation of DISTINCT called DISTINCT ON. Visit the official documentation page and read about DISTINCT ON. See if you can figure out how you would use it in a query to return the most recent rental for each customer
Solution:
SELECT DISTINCT ON keeps only the first row of each set of rows where the given expressions evaluate to equal. For this query, where the output is ordered first by customer ID ascending and then by rental date descending, the DISTINCT ON clause results in us keeping the first encountered row for each unique customer ID - which is the most recent rental date. This is extremely useful!
select distinct on (customer_id) customer_id, rental_date
from rental
order by customer_id asc, rental_date desc;
3.43 Write a query to list all the customers with an email address but not in the format [first_name].[last_name]@sakilacustomer.org
Solution:
You may be wondering why I used parentheses around the full email string. In this case, it’s purely for readability and clarity. The query will work exactly the same without the parentheses but I add them so it’s clear that I’m checking the email address against the full concatenated string and not just part of it.
select first_name, last_name, email
from customer
where email != (first_name || ‘.’ || last_name || ‘@sakilacustomer.org’)
Exercises - Common aggregate functions
4.1 Write a query to return the total count of customers in the customer table and the count of how many customers provided an email address
Table(s) to use: customer
Expected Output:
# customers | # customers with email |
---|
599| 597|
1 row
4.2 Building on the previous exercise, now return an additional result showing the percentage of customers with an email address (as a helpful hint, if you’re getting 0 try multiplying the fraction by 100.0 - we’ll examine why this is necessary in an upcoming chapter on data types)
Table(s) to use: customer
Expected Output:
# customers | # customers with email | % with email |
---|
599| 597|99.6661101836393990|
1 row
4.3 Write a query to return the number of distinct customers who have made payments
Table(s) to use: payment
Expected Output:
count |
---|
599 |
1 row
4.4 What is the average length of time films are rented out for
Table(s) to use: rental
Expected Output:
avg rental duration |
---|
4 days 24:36:28.541706 |
1 row
4.5 Write a query to return the sum total of all payment amounts received
Table(s) to use: payment
Expected Output:
total |
---|
67416.51 |
1 row
SCROLL DOWN FOR SOLUTIONS >>>
4.1 Write a query to return the total count of customers in the customer table and the count of how many customers provided an email address
Solution:
SELECT COUNT(*) will return the total number of rows, while COUNT(expression) will return the number of rows where expression is not NULL.
select
count(*) as “# customers”,
count(email) as “# customers with email”
from customer;
4.2 Building on the previous exercise, now return an additional result showing the percentage of customers with an email address (as a helpful hint, if you’re getting 0 try multiplying the fraction by 100.0 - we’ll examine why this is necessary in an upcoming chapter on data types)
Solution:
You can obtain the percentage by dividing the two terms and multiplying by 100.0
select
count() as “# customers”,
count(email) as “# customers with email”,
100.0 * count(email) / count() as “% with email”
from customer;
4.3 Write a query to return the number of distinct customers who have made payments
Solution:
You can use DISTINCT inside the COUNT function to count the number of distinct customer IDs.
select count(distinct customer_id)
from payment;
4.4 What is the average length of time films are rented out for
Solution:
select avg(return_date - rental_date) as “avg rental duration”
from rental;
4.5 Write a query to return the sum total of all payment amounts received
Solution:
select sum(amount) as total
from payment;
Exercises - GROUP BY and HAVING
4.6 List the number of films each actor has appeared in and order the results from most popular to least
Table(s) to use: film_actor
Expected Output:
actor_id | num_films |
---|
107| 42|
102| 41|
198| 40|
181| 39|
23| 37|
81| 36|
158| 35|
60| 35|
…
200 rows
4.7 List the customers who have made over 40 rentals
Table(s) to use: rental
Expected Output:
customer_id |
---|
75|
144|
526|
469|
236|
197|
148|
7 rows
4.8 We want to compare how the staff are performing against each other on a month to month basis. So for each month and year, show for each staff member how many payments they handled, the total amount of money they accepted, and the average payment amount
Table(s) to use: payment
Expected Output:
year | month | staff_id | num_payments | payment_total | avg_payment_amount |
---|---|---|---|---|---|
2007 | 1 | 1 | 617 | 2621.83 | 4.2493192868719611 |
2007 | 1 | 2 | 540 | 2202.60 | 4.0788888888888889 |
2007 | 2 | 1 | 1164 | 4776.36 | 4.1034020618556701 |
2007 | 2 | 2 | 1148 | 4855.52 | 4.2295470383275261 |
2007 | 3 | 1 | 2817 | 11776.83 | 4.1806283280085197 |
2007 | 3 | 2 | 2827 | 12109.73 | 4.2835974531305271 |
2007 | 4 | 1 | 3364 | 14080.36 | 4.1856004756242568 |
2007 | 4 | 2 | 3390 | 14479.10 | 4.2711209439528024 |
2007 | 5 | 1 | 95 | 234.09 | 2.4641052631578947 |
2007 | 5 | 2 | 87 | 280.09 | 3.2194252873563218 |
10 rows
SCROLL DOWN FOR SOLUTIONS >>>
4.6 List the number of films each actor has appeared in and order the results from most popular to least
Solution:
Each row in the film_actor table identifies a single film an actor is in. So simply counting the number of rows grouped by actor gives us the count of films the actor has been in (we don’t have to worry about duplicates due to the table structure).
select actor_id, count(*) as num_films
from film_actor
group by actor_id
order by num_films desc;
4.7 List the customers who have made over 40 rentals
Solution:
You can use HAVING to filter the groups to only return those customers with more than 40 rentals.
select customer_id
from rental
group by customer_id
having count(*) >= 40;
4.8 We want to compare how the staff are performing against each other on a month to month basis. So for each month and year, show for each staff member how many payments they handled, the total amount of money they accepted, and the average payment amount
Solution:
This was a difficult one and probably the biggest query you’ve written so far. Give yourself a big pat on the back if you managed to work it out. Here we use the date_part function to allow us to extract the month and year as separate components from the payment date. And our group by clause includes 3 elements - the year, month, and staff id.
select
date_part(‘year’, payment_date) as year,
date_part(‘month’, payment_date) as month,
staff_id,
count(*) as num_payments,
sum(amount) as payment_total,
avg(amount) as avg_payment_amount
from payment
group by
date_part(‘year’, payment_date),
date_part(‘month’, payment_date),
staff_id
order by year, month, staff_id;
Exercises - CASE expressions and aggregations
4.9 Write a query to show the number of rentals that were returned within 3 days, the number returned in 3 or more days, and the number never returned (for the logical comparison check you can use the following code snippet to compare against an interval: where return_date - rental_date < interval ‘3 days’)
Table(s) to use: rental
Expected Output:
lt 3 days | gt 3 days | never returned |
---|
4388| 11473| 183|
1 row
4.10 Write a query to give counts of the films by their length in groups of 0 - 1hrs, 1 - 2hrs, 2 - 3hrs, and 3hrs+ (note: you might get slightly different numbers if doing inclusive or exclusive grouping - but don’t sweat it!)
Table(s) to use: film
Expected Output:
len | count |
---|---|
0-1hrs | 97 |
1-2hrs | 436 |
2-3hrs | 419 |
3hrs+ | 48 |
4 rows
SCROLL DOWN FOR SOLUTIONS >>>
4.9 Write a query to show the number of rentals that were returned within 3 days, the number returned in 3 or more days, and the number never returned (for the logical comparison check you can use the following code snippet to compare against an interval: where return_date - rental_date < interval ‘3 days’)
Solution:
For this solution we use the PostgreSQL specific filter keyword vs the standard case version, but both are equally valid approaches.
select
count() filter
(where return_date - rental_date < interval ‘3 days’) as “lt 3 days”,
count() filter
(where return_date - rental_date >= interval ‘3 days’) as “gt 3 days”,
count(*) filter
(where return_date is null) as “never returned”
from rental;
4.10 Write a query to give counts of the films by their length in groups of 0 - 1hrs, 1 - 2hrs, 2 - 3hrs, and 3hrs+ (note: you might get slightly different numbers if doing inclusive or exclusive grouping - but don’t sweat it!)
Solution:
Here we use the CASE expression in our GROUP BY clause and also take advantage of the convenient numerical positioning reference to avoid repeating the full CASE expression in both the GROUP BY and SELECT clause (notice we can even use that numerical positioning approach in our ORDER BY clause too!)
select
case
when length between 0 and 59 then ‘0-1hrs’
when length between 60 and 119 then ‘1-2hrs’
when length between 120 and 179 then ‘2-3hrs’
else ‘3hrs+’
end as len,
count(*)
from film
group by 1
order by 1;
Challenge exercises
4.11 Explain why in the following query we obtain two different results for the average film length. Which one is correct?
select
1.0 * sum(length) / count(*) as avg1,
1.0 * avg(length) as avg2
from film;
Table(s) to use: film
Expected Output:
avg1 | avg2 |
---|---|
114.8850000000000000 | 115.11523046092184370 |
4.12 Write a query to return the average rental duration for each customer in descending order
Table(s) to use: rental
Expected Output:
customer_id | avg_rent_duration |
---|
315| 6 days 14:13:22.5|
187|5 days 34:58:38.571428|
321|5 days 32:56:32.727273|
539|5 days 31:39:57.272727|
436| 5 days 31:09:46|
532|5 days 30:59:34.838709|
427| 5 days 29:27:05|
555|5 days 26:48:35.294118|
…
599 rows
4.13 Return a list of customer where all payments they’ve made have been over $2 (lookup the bool_and aggregate function which will be useful here)
Table(s) to use: payment
Expected Output:
customer_id |
---|
363|
59|
2 rows
4.14 As a final fun finish to this chapter, run the following query to see a cool way you can generate ascii histogram charts. Look up the repeat function (you’ll find it under ‘String Functions and Operators’) to see how it works and change the output character…and don’t worry, I’ll explain the ::int bit in the next chapter!
select rating, repeat(’’, (count() / 10)::int) as “count/10”
from film
where rating is not null
group by rating;
Table(s) to use: film
Expected Output:
rating | count/10 |
---|---|
PG-13 | ********************** |
R | ******************* |
G | ***************** |
PG | ******************* |
NC-17 | ******************** |
5 rows
SCROLL DOWN FOR SOLUTIONS >>>
4.11 Explain why in the following query we obtain two different results for the average film length. Which one is correct?
Solution:
When using the avg aggregate function, NULL lengths are ignored in the calculation. However, the count(*) term in the first selected column will count all the rows, even those with a NULL length. Technically either approach could be correct, depending on how you want to treat NULL length films. Most people would probably argue though that here, the second approach is correct.
4.12 Write a query to return the average rental duration for each customer in descending order
Solution:
select
customer_id,
avg(return_date - rental_date) as avg_rent_duration
from rental
group by customer_id
order by avg_rent_duration desc;
4.13 Return a list of customer where all payments they’ve made have been over $2 (lookup the bool_and aggregate function which will be useful here)
Solution:
This was a tricky one. The key here was to realize bool_and is just like the other aggregate functions you’ve already seen in that it combines multiple rows in to a single output, but unlike all the other aggregate functions you’ve seen bool_and does this by performing a logical AND between each input expression.
select customer_id
from payment
group by customer_id
having bool_and(amount > 2);
4.14 As a final fun finish to this chapter, run the following query to see a cool way you can generate ascii histogram charts. Look up the repeat function (you’ll find it under ‘String Functions and Operators’) to see how it works and change the output character…and don’t worry, I’ll explain the ::int bit in the next chapter!
Solution:
The repeat function is used to repeat the ‘*’ character count/10 times for each group (I divided by 10 just to keep the output concise!)
Exercises - Character data types
5.1 Write a query to print a description of each film’s length as shown in the output below. When a film does not have a length, print: [title] is unknown length
Table(s) to use: film
Expected Output:
length_desc |
---|
ACADEMY DINOSAUR is 86 minutes |
ACE GOLDFINGER is 48 minutes |
ADAPTATION HOLES is 50 minutes |
AFFAIR PREJUDICE is 117 minutes |
AFRICAN EGG is 130 minutes |
AGENT TRUMAN is 169 minutes |
AIRPLANE SIERRA is 62 minutes |
AIRPORT POLLOCK is 54 minutes |
…
1000 rows
5.2 You want to play a movie title guessing game with some friends. Write a query to print only the first 3 letters of each film title and then ‘*’ for the rest (The repeat function may come in handy here…)
Table(s) to use: film
Expected Output:
Guess! |
---|
ACA************* |
ACE*********** |
ADA************* |
AFF************* |
AFR******** |
AGE********* |
AIR************ |
AIR************ |
…
1000 rows
SCROLL DOWN FOR SOLUTIONS >>>
5.1 Write a query to print a description of each film’s length as shown in the output below. When a film does not have a length, print: [title] is unknown length
Solution:
There are a couple of different ways you could accomplish this. The coalesce function is a popular way to substitute in a default value when some expression is NULL (in this case, if the length is NULL, the entire expression length || ’ minutes’ will evaluate to NULL).
select
title || ’ is ’ || coalesce(length || ’ minutes’, ‘unknown length’) as length_desc
from film;
5.2 You want to play a movie title guessing game with some friends. Write a query to print only the first 3 letters of each film title and then ‘*’ for the rest (The repeat function may come in handy here…)
Solution:
You can first use the left function to obtain the first 3 characters of the film title, and then concatenate that with ‘*’ repeated (length - 3) times to fill out the remaining characters in the title.
select left(title, 3) || repeat(’*’, length(title) - 3) as “Guess!”
from film;
Exercises - Number data types
5.3 Write a query to list the percentage of films that are rated NC-17, G, PG, PG-13, NC-17, and R, rounded to the nearest integer.
Table(s) to use: film
Expected Output:
% NC-17 | % PG | % G | % R | % PG-13 |
---|
21| 19| 18| 20| 22|
1 row
5.4 Try a few of the different explicit casting operations listed below to get familiar with how casting behaves. Was the behaviour what you expected?
select int ‘33’;
select int ‘33.3’;
select cast(33.3 as int);
select cast(33.8 as int);
select 33::text;
select ‘hello’::varchar(2);
select cast(35000 as smallint);
select 12.1::numeric(1,1);
SCROLL DOWN FOR SOLUTIONS >>>
5.3 Write a query to list the percentage of films that are rated NC-17, G, PG, PG-13, NC-17, and R, rounded to the nearest integer.
Solution:
To calculate the percentage, you can use either the technique of using a CASE expression inside a SUM/COUNT aggregation or alternatively the COUNT(*) FILTER variation available in Postgres. The ratio is multiplied by 100.0 to ensure the calculation is done as a numeric and we avoid integer division. Finally, the round function is used to round each percentage to the nearest integer.
select
round(100.0 * count() filter(where rating = ‘NC-17’) / count()) as “% NC-17”,
round(100.0 * count() filter(where rating = ‘PG’) / count()) as “% PG”,
round(100.0 * count() filter(where rating = ‘G’) / count()) as “% G”,
round(100.0 * count() filter(where rating = ‘R’) / count()) as “% R”,
round(100.0 * count() filter(where rating = ‘PG-13’) / count()) as “% PG-13”
from film;
5.4 Try a few of the different explicit casting operations listed below to get familiar with how casting behaves. Was the behaviour what you expected?
Solution:
There’s a few interesting things to point out here. First, note that the second query fails - when you’re using the decorated literal form, the literal must exactly match the target type. For the third and fourth queries when 33.3 and 33.8 are cast to int, both succeed and round up or down to the nearest integer. Casting a number to text works just fine as does casting some literal text to a varchar, but the result is truncated to as many characters as the varchar can fit. The final two queries fail - 35,000 does not fit within a smallint and likewise 12.1 does not fit within a numeric(1,1).
Exercises - Date and time data types
5.5 Show 3 different ways to input the timestamptz representing 4th March, 2019 at 3:30pm in New York, USA
Run the following queries to see the full list of timezones available in Postgres.
select *
from pg_timezone_names;
select *
from pg_timezone_abbrevs;
5.6 The rental duration in the film table is currently stored as an integer, representing the number of days. Write a query to return this as an interval instead and then add one day to the duration
Table(s) to use: film
Expected Output:
title | duration | duration + 1 |
---|---|---|
ACADEMY DINOSAUR | 6 days | 7 days |
ACE GOLDFINGER | 3 days | 4 days |
ADAPTATION HOLES | 7 days | 8 days |
AFFAIR PREJUDICE | 5 days | 6 days |
AFRICAN EGG | 6 days | 7 days |
AGENT TRUMAN | 3 days | 4 days |
AIRPLANE SIERRA | 6 days | 7 days |
AIRPORT POLLOCK | 6 days | 7 days |
…
1000 rows
5.7 You have a theory that certain hours of the day might be busiest for rentals. To investigate this write a query to list out for all time the the number of rentals made during each hour of the day
Table(s) to use: rental
Expected Output:
hr | count |
---|---|
0 | 694 |
1 | 649 |
2 | 630 |
3 | 684 |
4 | 681 |
5 | 648 |
6 | 647 |
7 | 667 |
…
24 rows
5.8 If you wanted to aggregate payments received by year and month you could write a query as below using the date_part function. Try and simplify this query by instead using the date_trunc function to achieve effectively the same result (ignoring the slight difference in date presentation)
select
date_part(‘year’, payment_date) as “year”,
date_part(‘month’, payment_date) as “month”,
sum(amount) as total
from payment
group by “year”, “month”
order by “year”, “month”;
Table(s) to use: payment
Expected Output:
month | total |
---|---|
2007-01-01 00:00:00 | 4824.43 |
2007-02-01 00:00:00 | 9631.88 |
2007-03-01 00:00:00 | 23886.56 |
2007-04-01 00:00:00 | 28559.46 |
2007-05-01 00:00:00 | 514.18 |
5 rows
SCROLL DOWN FOR SOLUTIONS >>>
5.5 Show 3 different ways to input the timestamptz representing 4th March, 2019 at 3:30pm in New York, USA
Solution:
You can specify the timezone in several different ways as shown below. Additionally, you can specify the time part in either 24-hour time or in 12-hour time with AM/PM.
select
timestamptz ‘2019-03-04 15:30 EST’,
timestamptz ‘2019-03-04 03:30PM -5’,
timestamptz ‘2019-03-04 03:30PM America/New_York’;
5.6 The rental duration in the film table is currently stored as an integer, representing the number of days. Write a query to return this as an interval instead and then add one day to the duration
Solution:
We need to construct the interval from an integer and we can do this by concatenating the number of days with ’ days’ and then casting that text expression as an interval.
select
title,
cast(rental_duration || ’ days’ as interval) as duration,
cast(rental_duration || ’ days’ as interval) + interval ‘1 day’ as “duration + 1”
from film;
5.7 You have a theory that certain hours of the day might be busiest for rentals. To investigate this write a query to list out for all time the the number of rentals made during each hour of the day
Solution:
You can use the date_part function to obtain the hour component for each rental date and then group by and order by that expression.
select
date_part(‘hour’, rental_date) as hr,
count(*)
from rental
group by hr
order by hr;
5.8 If you wanted to aggregate payments received by year and month you could write a query as below using the date_part function. Try and simplify this query by instead using the date_trunc function to achieve effectively the same result (ignoring the slight difference in date presentation)
Solution:
You can use the date_trunc function to truncate each payment date to the month component and then group by and order by that expression. Your output date format will be slightly different vs the initial query but in many cases this date format which represents a full date and time is even preferable for being plugged in to external data visualization tools (like Excel, etc.).
select
date_trunc(‘month’, payment_date) as “month”,
sum(amount) as total
from payment
group by “month”
order by “month”;
Challenge exercises
5.9 Write a query to return a count of the number of films that were rented out on the last day of a month
Table(s) to use: rental
Expected Output:
total # EOM rentals |
---|
842|
5.10 Write a query to list the film titles that have spaces at the beginning or end (we want to flag them so we know to clean them up!)
Table(s) to use: film
Expected Output:
0 rows
5.11 Write a query to sum up, for each customer, the total number of hours they have had films rented out for. Return only the top 3 customers with the most hours.
Table(s) to use: rental
Expected Output:
customer_id | hrs_rented |
---|
526| 6340|
148| 5834|
144| 5641|
3 rows
5.12 Postgres has a really useful function called generate_series which will come in handy in a few of the coming chapters. Have a look at the examples how to use it here and then write a query to generate a list of timestamps which represent the first day of every month in 2019, at 5pm UTC
Expected Output:
generate_series |
---|
2019-01-02 03:00:00 |
2019-02-02 03:00:00 |
2019-03-02 03:00:00 |
2019-04-02 03:00:00 |
2019-05-02 03:00:00 |
2019-06-02 03:00:00 |
2019-07-02 03:00:00 |
2019-08-02 03:00:00 |
2019-09-02 03:00:00 |
2019-10-02 03:00:00 |
2019-11-02 03:00:00 |
2019-12-02 03:00:00 |
12 rows
5.13 Return a count of the number of occurrences of the letter ‘A’ in each customer’s first name (this is a common interview question for SQL related jobs!). Order the output by the count descending.
Table(s) to use: customer
Expected Output:
first_name | count |
---|---|
CASSANDRA | 3 |
BARBARA | 3 |
SAMANTHA | 3 |
TAMARA | 3 |
AMANDA | 3 |
MARSHALL | 2 |
MARSHA | 2 |
MARIAN | 2 |
…
599 rows
5.14 Write a query to tally up the total amount of money made on weekends (Saturday and Sunday)
Table(s) to use: payment
Expected Output:
total $ |
---|
19036.04 |
1 row
SCROLL DOWN FOR SOLUTIONS >>>
5.9 Write a query to return a count of the number of films that were rented out on the last day of a month
Solution:
There are a couple of ways you could go about solving this one. You could use the date_part function to extract the year and month and check for known pairs (31 days in January, etc) but you would have to be careful to account for leap years too which is tricky to handle! The solution below by constrast is quite elegant - we can truncate the rental date to the month and then make use of interval math to obtain the last day of the month and then compare that to the truncated rental date (to remove the time component).
select count(*) as “total # EOM rentals”
from rental
where date_trunc(‘month’, rental_date) + interval ‘1 month’ - interval ‘1 day’
= date_trunc(‘day’, rental_date);
5.10 Write a query to list the film titles that have spaces at the beginning or end (we want to flag them so we know to clean them up!)
Solution:
There aren’t any! But it’s always good to check just in case. One way to approach this is to use the trim function, which by default will remove both leading and trailing whitespace and then compare the length of that to the original title length. If some whitespace was removed, the length difference will be greater than 0.
select title
from film
where length(title) - length(trim(title)) > 0;
5.11 Write a query to sum up, for each customer, the total number of hours they have had films rented out for. Return only the top 3 customers with the most hours.
Solution:
select
customer_id,
round(sum(date_part(‘epoch’, return_date - rental_date)) / 3600) as hrs_rented
from rental
group by customer_id
order by hrs_rented desc
limit 3;
5.12 Postgres has a really useful function called generate_series which will come in handy in a few of the coming chapters. Have a look at the examples how to use it here and then write a query to generate a list of timestamps which represent the first day of every month in 2019, at 5pm UTC
Solution:
generate_series takes 3 arguments - the start, stop, and step. Our start timestamp is set to the 1st Jan 2019 at 5pm UTC, our end timestamp is set to 1st Dec 2019 at 5pm UTC, and the step is set to 1 month. Generated tables like this are very useful to join on to for analytics and reporting type queries. Note that your output will look different to mine depending on your local timezone - remember that any internal timestamps with time zone information are converted to your timezone for display purposes.
select *
from generate_series(
timestamptz ‘2019-01-01 17:00 UTC’,
timestamptz ‘2019-12-01 17:00 UTC’,
interval ‘1 month’);
5.13 Return a count of the number of occurrences of the letter ‘A’ in each customer’s first name (this is a common interview question for SQL related jobs!). Order the output by the count descending.
Solution:
Similar to the exercise involving removing whitespace, in this case you can use the replace function to remove any instances of the letter ‘A’ (by replacing ‘A’ with a blank space - nothing). Subtracting the length of this expression from the length of the unedited first name then yields you the number of occurences of ‘A’.
select
first_name,
length(first_name) - length(replace(first_name, ‘A’, ‘’)) as count
from customer
order by count desc;
5.14 Write a query to tally up the total amount of money made on weekends (Saturday and Sunday)
Solution:
You can use either the ‘dow’ or ‘isodow’ options with the date_part function to obtain a numerical representation of the day of the week. When using ‘isodow’, 6 represents Saturday and 7 represents Sunday. Handy!
select sum(amount) as “total $”
from payment
where date_part(‘isodow’, payment_date) in (6, 7);
Exercises - Inner joins
6.1 Write a query to return a list of all the films rented by PETER MENARD showing the most recent first
Table(s) to use: rental, customer, inventory, film
Expected Output:
rental_date | title |
---|---|
2005-08-23 18:43:31 | OCTOBER SUBMARINE |
2005-08-22 09:10:21 | PRIDE ALAMO |
2005-08-21 08:40:56 | FUGITIVE MAGUIRE |
2005-08-20 12:53:46 | BRIGHT ENCOUNTERS |
2005-08-19 10:06:53 | REDEMPTION COMFORTS |
2005-08-19 09:42:01 | SPOILERS HELLFIGHTERS |
2005-08-17 07:56:22 | GO PURPLE |
2005-08-02 15:44:55 | GOODFELLAS SALUTE |
…
23 rows
6.2 Write a query to list the full names and contact details for the manager of each store
Table(s) to use: store, staff
Expected Output:
store_id | Manager |
---|
1|Mike Hillyer|Mike.Hillyer@sakilastaff.com|
2|Jon Stephens|Jon.Stephens@sakilastaff.com|
2 rows
6.3 Write a query to return the top 3 most rented out films and how many times they’ve been rented out
Table(s) to use: rental, inventory, film
Expected Output:
film_id | title | count |
---|
103|BUCKET BROTHERHOOD| 34|
738|ROCKETEER MOTHER | 33|
331|FORWARD TEMPLE | 32|
3 rows
6.4 Write a query to show for each customer how many different (unique) films they’ve rented and how many different (unique) actors they’ve seen in films they’ve rented
Table(s) to use: rental, inventory, film, film_actor
Expected Output:
customer_id | num_films | num_actors |
---|
1| 30| 108|
2| 27| 118|
3| 26| 103|
4| 22| 89|
5| 38| 119|
6| 26| 110|
7| 32| 117|
8| 23| 109|
…
599 rows
6.5 Re-write the query below written in the older style of inner joins (which you still encounter surprisingly often online) using the more modern style. Re-write it once using ON to establish the join condition and the second time with USING.
select film.title, language.name as “language”
from film, language
where film.language_id = language.language_id;
Table(s) to use: film, language
Expected Output:
title | language |
---|---|
ACADEMY DINOSAUR | English |
ACE GOLDFINGER | English |
ADAPTATION HOLES | English |
AFFAIR PREJUDICE | English |
AFRICAN EGG | English |
AGENT TRUMAN | English |
AIRPLANE SIERRA | English |
AIRPORT POLLOCK | English |
…
1000 rows
SCROLL DOWN FOR SOLUTIONS >>>
6.1 Write a query to return a list of all the films rented by PETER MENARD showing the most recent first
Solution:
This exercises involves quite a few joins but once that’s setup it’s just a matter of filtering for the right customer and ordering by the rental date.
select r.rental_date, f.title
from rental as r
inner join customer as c
on r.customer_id = c.customer_id
inner join inventory as i
on r.inventory_id = i.inventory_id
inner join film as f
on i.film_id = f.film_id
where c.first_name = ‘PETER’
and c.last_name = ‘MENARD’
order by r.rental_date desc;
6.2 Write a query to list the full names and contact details for the manager of each store
Solution:
One thing to watch out for here is the key columns are named differently in the respective tables. In the store table, the manager’s staff ID is stored in the column manager_staff_id which references the staff_id column in the staff table.
select
store.store_id,
staff.first_name || ’ ’ || staff.last_name as “Manager”,
staff.email
from store
inner join staff
on store.manager_staff_id = staff.staff_id;
6.3 Write a query to return the top 3 most rented out films and how many times they’ve been rented out
Solution:
After setting up the joins necessary to resolve the inventory IDs in the rental table in to films you then need to setup a group by aggregation. You could just group by the film title but there’s a risk with this approach where if two films have the same title you would end up counting them as one. For safety, group by the film ID and title (and since the film ID is always unique, this essentially just creates a single group for each film but also allows you to use the title in the SELECT clause).
select f.film_id, f.title, count()
from rental as r
inner join inventory as i
on r.inventory_id = i.inventory_id
inner join film as f
on i.film_id = f.film_id
group by f.film_id, f.title
order by count() desc
limit 3;
6.4 Write a query to show for each customer how many different (unique) films they’ve rented and how many different (unique) actors they’ve seen in films they’ve rented
Solution:
After setting up all the required table joins, the key with this query is to realize you must perform a count of the distinct film IDs and actor IDs vs counting all the rows.
select
r.customer_id,
count(distinct f.film_id) as num_films,
count(distinct fa.actor_id) as num_actors
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
inner join film_actor as fa using (film_id)
group by r.customer_id
order by r.customer_id;
6.5 Re-write the query below written in the older style of inner joins (which you still encounter surprisingly often online) using the more modern style. Re-write it once using ON to establish the join condition and the second time with USING.
Solution:
The USING syntax is quite nice and convenient when you know the columns you’re joining on for equality are named the same in different tables.
select film.title, language.name as “language”
from film
inner join language
on film.language_id = language.language_id;
select film.title, language.name as “language”
from film
inner join language using (language_id);
Exercises - Outer joins
6.6 Write a query to list the films that are not in stock in any of the stores
Table(s) to use: film, inventory
Expected Output:
title |
---|
SKY MIRACLE |
KILL BROTHERHOOD |
SISTER FREDDY |
GLADIATOR WESTWARD |
FLOATS GARDEN |
APOLLO TEEN |
CRYSTAL BREAKING |
HOCUS FRIDA |
…
42 rows
6.7 Write a query to return a count of how many of each film we have in our inventory (include all films). Order the output showing the lowest in-stock first so we know to buy more!
Table(s) to use: film, inventory
Expected Output:
title | count |
---|---|
CHOCOLATE DUCK | 0 |
FRANKENSTEIN STRANGER | 0 |
APOLLO TEEN | 0 |
HOCUS FRIDA | 0 |
WAKE JAWS | 0 |
SUICIDES SILENCE | 0 |
GUMP DATE | 0 |
BUTCH PANTHER | 0 |
…
1000 rows
SCROLL DOWN FOR SOLUTIONS >>>
6.6 Write a query to list the films that are not in stock in any of the stores
Solution:
After performing a left outer join with the film and inventory tables, the only rows in the output with a NULL store ID will be those that didn’t have any matching records in both tables - that is, those films not in our inventory. So we can simply filter the output for where the store ID is NULL.
select f.title
from film as f
left outer join inventory as i
on f.film_id = i.film_id
where i.store_id is null;
6.7 Write a query to return a count of how many of each film we have in our inventory (include all films). Order the output showing the lowest in-stock first so we know to buy more!
Solution:
After joining the film and inventory tables we can then group by the film ID and perform a count of the number of rows, making sure that we’re only counting, for each group, the rows where the inventory ID is not NULL (to ensure missing films added by the outer join still end up with a count of 0, not 1)
select f.title, count(i.inventory_id)
from film as f
left outer join inventory as i
on f.film_id = i.film_id
group by f.film_id, f.title
order by count(i.inventory_id) asc;
Exercises - Advanced join topics
6.8 Write a query to return a count of the number of films rented by every customer on the 24th May, 2005. Order the results by number of films rented descending.
Table(s) to use: customer, rental
Expected Output:
customer_id | num_rented |
---|
1| 1|
2| 1|
3| 1|
4| 1|
5| 1|
6| 1|
7| 1|
8| 1|
9| 0|
10| 0|
11| 0|
12| 0|
13| 0|
…
599 rows
SCROLL DOWN FOR SOLUTIONS >>>
6.8 Write a query to return a count of the number of films rented by every customer on the 24th May, 2005. Order the results by number of films rented descending.
Solution:
A key issue you may have run across when solving this exercise is if you tried to perform the date check in the WHERE clause instead of making it one of the join conditions. The problem with making it part of the WHERE clause is it would act to filter the output of the left join, removing any rows that the left join added back in. This is because rows added back in by the left join - customers with no rentals - will have a NULL rental date. If you put the date check in the WHERE clause, you would effectively be making your left join an inner join!
select
c.customer_id,
count(r.rental_id) as num_rented
from customer as c
left join rental as r
on c.customer_id = r.rental_id
and date_trunc(‘day’, r.rental_date) = ‘20050524’
group by c.customer_id
order by num_rented desc;
Challenge exercises
6.9 Write a query to return how many copies of each film are available in each store, including zero counts if there are none. Order by count so we can easily see first which films need to be restocked in each store
Table(s) to use: film, store, inventory
Expected Output:
film_id | store_id | stock |
---|
2| 1| 0|
3| 1| 0|
5| 1| 0|
8| 1| 0|
13| 1| 0|
14| 1| 0|
14| 2| 0|
20| 2| 0|
…
2000 rows
6.10 Have a look at the documentation for the extremely useful Postgres function generate_series and then using generate_series write a query to return a count of the number of rentals for each and every month in 2005 (don’t worry too much about the output date formatting).
Table(s) to use: rental
Expected Output:
t | count |
---|---|
2005-01-01 00:00:00 | 0 |
2005-02-01 00:00:00 | 0 |
2005-03-01 00:00:00 | 0 |
2005-04-01 00:00:00 | 0 |
2005-05-01 00:00:00 | 1156 |
2005-06-01 00:00:00 | 2311 |
2005-07-01 00:00:00 | 6709 |
2005-08-01 00:00:00 | 5686 |
2005-09-01 00:00:00 | 0 |
2005-10-01 00:00:00 | 0 |
2005-11-01 00:00:00 | 0 |
2005-12-01 00:00:00 | 0 |
12 rows
6.11 Write a query to list the customers who rented out the film with ID 97 and then at some later date rented out the film with ID 841
Table(s) to use: rental, inventory
Expected Output:
customer_id |
---|
459|
25|
2 rows
SCROLL DOWN FOR SOLUTIONS >>>
6.9 Write a query to return how many copies of each film are available in each store, including zero counts if there are none. Order by count so we can easily see first which films need to be restocked in each store
Solution:
To obtain a complete list of every film and store combination, you first need to perform a cross join between the film and store tables. Only then can you left join the inventory table on to that - making sure you perform a composite join to match both the film ID and store ID.
select f.film_id, s.store_id, count(i.inventory_id) as stock
from film as f
cross join store as s
left join inventory as i
on f.film_id = i.film_id
and s.store_id = i.store_id
group by f.film_id, s.store_id
order by stock, f.film_id, s.store_id;
6.10 Have a look at the documentation for the extremely useful Postgres function generate_series and then using generate_series write a query to return a count of the number of rentals for each and every month in 2005 (don’t worry too much about the output date formatting).
Solution:
Here we use generate_series to return a table consisting of 12 rows, with each row containing a timestamp representing the first day of each month in 2005. The table has the alias m, and a single column t (you’re free to name your table and column whatever you want). On to this we left join the rental table, truncating the rental date to the month which allows us to perform an equality join (a timestamp truncated to the month will have it’s day part set to 01, and time part set to 00:00:00 - consistent with the output from generate_series). The syntax here might have thrown you off, but the underlying idea is the same as what you’ve done now in quite a few past exercises!
select
m.t,
count(r.rental_id)
from generate_series(‘2005-01-01’::timestamp, ‘2005-12-01’::timestamp, ‘1 month’) as m(t)
left join rental as r
on date_trunc(‘month’, r.rental_date) = m.t
group by m.t;
6.11 Write a query to list the customers who rented out the film with ID 97 and then at some later date rented out the film with ID 841
Solution:
This was a hard one - give yourself a big pat on the back if figured it out! This query involved a self-join on the rental table with the additional complexity that for each rental self-join, the inventory table was necessary as well to resolve the actual films being rented in each case!
select r.customer_id
from rental as r
inner join inventory as i
on r.inventory_id = i.inventory_id
inner join rental as r2
on r.customer_id = r2.customer_id
and r2.rental_date > r.rental_date
inner join inventory as i2
on r2.inventory_id = i2.inventory_id
where i.film_id = 97 and i2.film_id = 841;
Exercises - Uncorrelated subqueries
7.1 Write a query to return all the customers who made a rental on the first day of rentals (without hardcoding the date for the first day of rentals in your query)
Table(s) to use: rental, customer
Expected Output:
first_name | last_name |
---|---|
ANDREW | PURDY |
CASSANDRA | WALTERS |
CHARLOTTE | HUNTER |
DELORES | HANSEN |
MANUEL | MURRELL |
MINNIE | ROMERO |
NELSON | CHRISTENSON |
TOMMY | COLLAZO |
8 rows
7.2 Using a subquery, return the films that don’t have any actors. Now write the same query using a left join. Which solution do you think is better? Easier to read?
Table(s) to use: film, film_actor
Expected Output:
film_id | title |
---|
257|DRUMLINE CYCLONE|
323|FLIGHT LIES |
803|SLACKER LIAISONS|
3 rows
7.3 You intend to write a humorous email to congratulate some customers on their poor taste in films. To that end, write a query to return the customers who rented out the least popular film (that is, the film least rented out - if there is more than one, pick the one with the lowest film ID)
Table(s) to use: rental, inventory, customer
Expected Output:
customer_id | first_name | last_name |
---|
257|MARSHA |DOUGLAS |
142|APRIL |BURNS |
564|BOB |PFEIFFER |
89|JULIA |FLORES |
4 rows
SCROLL DOWN FOR SOLUTIONS >>>
7.1 Write a query to return all the customers who made a rental on the first day of rentals (without hardcoding the date for the first day of rentals in your query)
Solution:
For this exercise we use a subquery to obtain the date when rentals were first made. We then cast this timestamp to a date to allow ease of comparison - an alternative approach which will give you the same result would be to use the date_trunc function to truncate the rental date to the ‘day’. Distinct is used to eliminate duplicates just in case any one customer rented more than once on the opening day.
select distinct c.first_name, c.last_name
from rental as r
inner join customer as c using (customer_id)
where r.rental_date::date =
(select min(rental_date)::date
from rental);
7.2 Using a subquery, return the films that don’t have any actors. Now write the same query using a left join. Which solution do you think is better? Easier to read?
Solution:
The two solutions are listed below - I would personally prefer the subquery solution because I find that easier to think about but it’s up to you!
select film_id, title
from film
where film_id not in
(select film_id
from film_actor);
select film_id, title
from film as f
left join film_actor as fa using (film_id)
where fa.film_id is null;
7.3 You intend to write a humorous email to congratulate some customers on their poor taste in films. To that end, write a query to return the customers who rented out the least popular film (that is, the film least rented out - if there is more than one, pick the one with the lowest film ID)
Solution:
The subquery here is used to obtain the ID for the least popular film. This is achieved by grouping the rentals by film ID, ordering by count ascending, and picking the first. In this instance there is more than one “least popular” film so the film ID is used for tiebreaking in the order by clause.
select c.customer_id, c.first_name, c.last_name
from rental as r1
inner join inventory as i1 using (inventory_id)
inner join customer as c using (customer_id)
where i1.film_id =
(select i2.film_id
from rental as r2
inner join inventory as i2 using (inventory_id)
group by i2.film_id
order by count(*) asc, i2.film_id asc
limit 1);
Exercises - Correlated subqueries
7.4 Write a query to return the countries in our database that have more than 15 cities
Table(s) to use: country, city
Expected Output:
country |
---|
Brazil |
China |
India |
Japan |
Mexico |
Philippines |
Russian Federation |
United States |
8 rows
7.5 Write a query to return for each customer the store they most commonly rent from
Table(s) to use: customer, rental, inventory
Expected Output:
customer_id | first_name | last_name | Favourite Store |
---|
1|MARY |SMITH | 1|
2|PATRICIA |JOHNSON | 1|
3|LINDA |WILLIAMS | 1|
4|BARBARA |JONES | 2|
5|ELIZABETH |BROWN | 2|
6|JENNIFER |DAVIS | 1|
7|MARIA |MILLER | 1|
8|SUSAN |WILSON | 2|
…
599 rows
7.6 In the customer table, each customer has a store ID which is the store they originally registered at. Write a query to list for each customer whether they have ever rented from a different store than that one they registered at. Return ‘Y’ if they have, and ‘N’ if they haven’t.
Table(s) to use: rental, inventory, customer
Expected Output:
first_name | last_name | HasRentedOtherStore |
---|---|---|
MARY | SMITH | Y |
PATRICIA | JOHNSON | Y |
LINDA | WILLIAMS | Y |
BARBARA | JONES | Y |
ELIZABETH | BROWN | Y |
JENNIFER | DAVIS | Y |
MARIA | MILLER | Y |
SUSAN | WILSON | Y |
…
599 rows
SCROLL DOWN FOR SOLUTIONS >>>
7.4 Write a query to return the countries in our database that have more than 15 cities
Solution:
For this exercise we use a subquery to obtain, for each country, a count of the number of cities listed for that country. This can then be used in the WHERE clause to filter the countries in our final output.
select country.country
from country
where
(select count(*)
from city
where city.country_id = country.country_id) > 15;
7.5 Write a query to return for each customer the store they most commonly rent from
Solution:
The correlated subquery in this case is used to obtain, for each customer, the store that the customer visits the most. How do we work that out? By tallying up all the rentals from that customer by store ID, ordering by the count (descending! If you forget to add ‘desc’ you’ll return the least popular store for each customer!) and then taking the first result with LIMIT.
select
c.customer_id,
c.first_name,
c.last_name,
(select i.store_id
from rental as r
inner join inventory as i using (inventory_id)
where r.customer_id = c.customer_id
group by i.store_id
order by count(*) desc
limit 1) as “Favourite Store”
from customer as c;
7.6 In the customer table, each customer has a store ID which is the store they originally registered at. Write a query to list for each customer whether they have ever rented from a different store than that one they registered at. Return ‘Y’ if they have, and ‘N’ if they haven’t.
Solution:
This was a tricky one due to the combination of using a case expression and EXISTS. If you got it right, well done! The subquery here is used to return for each customer any rentals they made from a store different from the store they registered at. If any exist, then the case expression is used to output ‘Y’, otherwise ‘N’.
select c.first_name, c.last_name,
case
when exists
(select *
from rental as r
inner join inventory as i using (inventory_id)
where r.customer_id = c.customer_id
and i.store_id != c.store_id) then ‘Y’
else ‘N’
end as “HasRentedOtherStore”
from customer as c;
Exercises - Table subqueries
7.7 Write a query to return each customer 4 times
Table(s) to use: customer
Expected Output:
first_name | last_name |
---|---|
MARY | SMITH |
MARY | SMITH |
MARY | SMITH |
MARY | SMITH |
PATRICIA | JOHNSON |
PATRICIA | JOHNSON |
PATRICIA | JOHNSON |
PATRICIA | JOHNSON |
LINDA | WILLIAMS |
LINDA | WILLIAMS |
…
2396 rows
7.8 Write a query to return how many rentals the business gets on average on each day of the week. Order the results to show the days of the week with the highest average number of rentals first (use the round function to round the average so it’s a nice whole number). Have a look at the to_char function to obtain the day name given a timestamp.
Table(s) to use: rental
Expected Output:
day_name | average |
---|---|
Sunday | 464 |
Saturday | 462 |
Friday | 454 |
Monday | 449 |
Wednesday | 446 |
Thursday | 440 |
Tuesday | 224 |
7 rows
SCROLL DOWN FOR SOLUTIONS >>>
7.7 Write a query to return each customer 4 times
Solution:
Cross joining the customer table on to any table containing only four rows will do the job - to guarantee that we have exactly four rows we specify a values list and create a new virtual table to join on to.
select c.first_name, c.last_name
from customer as c
cross join (values (1), (2), (3), (4)) as v(n)
order by c.customer_id;
7.8 Write a query to return how many rentals the business gets on average on each day of the week. Order the results to show the days of the week with the highest average number of rentals first (use the round function to round the average so it’s a nice whole number). Have a look at the to_char function to obtain the day name given a timestamp.
Solution:
This was a tricky one and there’s a couple of different ways you could have solved it. In the solution below, the table subquery aggregates the ratings for each date from the rental table. The outer query then further aggregates by the actual day of the week for each date and performs an average for each day of the week (the to_char function is used to obtain the day name).
select
to_char(rent_day, ‘Day’) as day_name,
round(avg(num_rentals)) as average
from
(select date_trunc(‘day’, rental_date) as rent_day, count(*) as num_rentals
from rental
group by rent_day) as T
group by day_name
order by average desc;
Exercises - Lateral subqueries
7.9 Write a query to return for each customer the first ‘PG’ film that they rented (include customers who have never rented a ‘PG’ film as well)
Table(s) to use: customer, rental, inventory, film
Expected Output:
first_name | last_name | title | rental_date |
---|---|---|---|
MARY | SMITH | TALENTED HOMICIDE | 2005-05-28 10:35:23 |
PATRICIA | JOHNSON | BLACKOUT PRIVATE | 2005-06-17 20:54:58 |
LINDA | WILLIAMS | HORN WORKING | 2005-08-18 14:49:55 |
BARBARA | JONES | BEDAZZLED MARRIED | 2005-06-16 08:08:40 |
ELIZABETH | BROWN | TOOTSIE PILOT | 2005-05-29 07:25:16 |
JENNIFER | DAVIS | MONSTER SPARTACUS | 2005-07-11 12:39:01 |
MARIA | MILLER | LEGEND JEDI | 2005-06-16 21:06:00 |
SUSAN | WILSON | NORTHWEST POLISH | 2005-05-30 03:43:54 |
…
599 rows
SCROLL DOWN FOR SOLUTIONS >>>
7.9 Write a query to return for each customer the first ‘PG’ film that they rented (include customers who have never rented a ‘PG’ film as well)
Solution:
We make use of a left join lateral here to join on to a table subquery that will return for each customer the first film they rented that was PG. By using a left join, any customers who have not rented any PG films will be in the output with a NULL title and rental_date. There are 6 such customers.
select c.first_name, c.last_name, d.title, d.rental_date
from customer as c
left join lateral
(select r.customer_id, f.title, r.rental_date
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
where r.customer_id = c.customer_id
and f.rating = ‘PG’
order by r.rental_date
limit 1) as d
on c.customer_id = d.customer_id;
Exercises - Common table expressions
7.10 Write a query to list the customers who rented out the film with title “BRIDE INTRIGUE” and then at some later date rented out the film with title “STAR OPERATION”. Use a CTE to simplify your code if possible.
Table(s) to use: rental, inventory, film
Expected Output:
customer_id |
---|
459|
25|
2 rows
7.11 Write a query to calculate the amount of income received each month and compare that against the previous month’s income, showing the change.
Table(s) to use: payment
Expected Output:
month | income | prev month income | change |
---|---|---|---|
2007-01-01 00:00:00 | 4824.43 | ||
2007-02-01 00:00:00 | 9631.88 | 4824.43 | 4807.45 |
2007-03-01 00:00:00 | 23886.56 | 9631.88 | 14254.68 |
2007-04-01 00:00:00 | 28559.46 | 23886.56 | 4672.90 |
2007-05-01 00:00:00 | 514.18 | 28559.46 | -28045.28 |
5 rows
SCROLL DOWN FOR SOLUTIONS >>>
7.10 Write a query to list the customers who rented out the film with title “BRIDE INTRIGUE” and then at some later date rented out the film with title “STAR OPERATION”. Use a CTE to simplify your code if possible.
Solution:
In case this exercise feels familiar, it is! You earlier did a version of this exercise when learning about joins but were instead referencing the same films by their IDs (97 and 841 respectively). The solution involves self-joins and here the use of a CTE simplifies things substantially by allowing us to define a table called rental_detail that performs all the joins and then joining that table on to itself as part of the actual query. The join condition specifies that we only keep rows where the customer IDs match, the film titles match what we’re after, and the rental date of the first film is before the second film.
with rental_detail as
(
select r.customer_id, r.rental_date, f.title
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
)
select r1.customer_id
from rental_detail as r1
inner join rental_detail as r2
on r1.customer_id = r2.customer_id
and r2.rental_date > r1.rental_date
and r1.title = ‘BRIDE INTRIGUE’ and r2.title = ‘STAR OPERATION’;
7.11 Write a query to calculate the amount of income received each month and compare that against the previous month’s income, showing the change.
Solution:
A tough one - give yourself a pat on the back if you got this right. The CTE in this case is used to establish a baseline table holding the amount of money received in each month. By then performing a left self-join and establishing just the right join conditions (probably the trickiest part) you’re able to join up every month with the previous month to calculate the change in income.
with monthly_amounts as
(
select
date_trunc(‘month’, payment_date) as month,
sum(amount) as total
from payment
group by month
)
select
curr.month,
curr.total as “income”,
prev.total as “prev month income”,
curr.total - prev.total as “change”
from monthly_amounts as curr
left join monthly_amounts as prev
on curr.month = prev.month + interval ‘1 month’