[masterywithsql] A hands-on course, built for the busy learner - part 2

Andrew.Kaluba · 07.Июнь.2019 20:34:44

Challenge exercises

7.12 Write a query to return the customers who rented a film in 2005 but none in 2006

Table(s) to use: rental

Expected Output:

customer_id

…

441 rows

7.13 What are the top 3 countries the customers are from. Show both the number of customers from each country and percentage (round the percentage to the nearest whole number)

Table(s) to use: customer, address, city, country

Expected Output:

country	num_customers	percent
India	60	10
China	53	9
United States	36	6

3 rows

7.14 Write a query to perform a running total of payments received, grouping by month (ie. for each month return the amount of money received that month and also the total amount of money received up to (and including) that month - this is a useful view to have if you wanted to produce a cumulative chart). Hint - Re-use the monthly_amounts CTE from exercise 7.11

Table(s) to use: payment

Expected Output:

month	amount	cumamount
2007-01-01 00:00:00	4824.43	4824.43
2007-02-01 00:00:00	9631.88	14456.31
2007-03-01 00:00:00	23886.56	38342.87
2007-04-01 00:00:00	28559.46	66902.33
2007-05-01 00:00:00	514.18	67416.51

5 rows

7.15 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs. The generate_series function may come in handy

Table(s) to use: rental

Expected Output:

id
321
2247
6579
9426
15592

5 rows

7.16 In an earlier exercise I asked you to see if you could find a way to return the last 3 payments made in Jan, 2007 but ordered ascending. You’ve got the tools now to accomplish this - see if you can figure it out!

Table(s) to use: payment

Expected Output:

payment_id	amount	payment_date

 17173|  2.99|2007-01-31 20:45:37|
 16890|  0.99|2007-01-31 21:06:00|
 16813|  0.99|2007-01-31 21:16:11|

3 rows

7.12 Write a query to return the customers who rented a film in 2005 but none in 2006

Solution:

The solution below queries the rental table for all customers who did rent a film in 2005 and then further filters the list by removing customers who then rented a film in 2006.

select distinct customer_id
from rental
where date_part(‘year’, rental_date) = 2005
and customer_id not in
(select customer_id
from rental
where date_part(‘year’, rental_date) = 2006);

7.13 What are the top 3 countries the customers are from. Show both the number of customers from each country and percentage (round the percentage to the nearest whole number)

Solution:

This is a fairly standard exercise involving grouping and joins which at this point you’re very familiar with. The one addition though is the requirement to calculate a percentage for each country - to work out what percentage of customers come from each country. To do this, you’ll need to use a subquery to obtain the count of the total number of customers to use in the calculation. In the solution I have also rounded the percentage so it’s a nice whole number.

select
country,
count() as num_customers,
round(100.0 * count() / (select count() from customer)) as percent
from customer as c
inner join address using (address_id)
inner join city using (city_id)
inner join country using (country_id)
group by country
order by count() desc
limit 3;

7.14 Write a query to perform a running total of payments received, grouping by month (ie. for each month return the amount of money received that month and also the total amount of money received up to (and including) that month - this is a useful view to have if you wanted to produce a cumulative chart). Hint - Re-use the monthly_amounts CTE from exercise 7.11

Solution:

This exercise becomes a lot easier once we leverage the monthly_amounts CTE we worked on earlier. To obtain the cumulative amount for each month then, you can use a correlated subquery to calculate the sum of all the amounts from the same table up to and including the current month)

with monthly_amounts as
(
select
date_trunc(‘month’, payment_date) as month,
sum(amount) as amount
from payment
group by month
)
select ma1.month, ma1.amount,
(select sum(ma2.amount)
from monthly_amounts as ma2
where ma2.month <= ma1.month) as cumamount
from monthly_amounts as ma1
order by ma1.month;

7.15 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs. The generate_series function may come in handy

Solution:

This was a particularly tricky one! generate_series is used to obtain the full list of IDs from the min to the max and then via a correlated subquery, we only keep those that don’t exist in the rental table.

select s.id
from generate_series(
(select min(rental_id) from rental),
(select max(rental_id) from rental)) as s(id)
where not exists
(select *
from rental as r
where r.rental_id = s.id);

7.16 In an earlier exercise I asked you to see if you could find a way to return the last 3 payments made in Jan, 2007 but ordered ascending. You’ve got the tools now to accomplish this - see if you can figure it out!

Solution:

The key insight to accomplish this is to realize you can find and select the 3 rows you’re after using a table subquery. Within that subquery you’re able to order the payments in descending order to be able to pick the last 3, but then outside of that subquery you’re once again free to specify a new ordering for display purposes!

select payment_id, amount, payment_date
from
(select payment_id, amount, payment_date
from payment
where payment_date >= ‘2007-01-01’
and payment_date < ‘2007-02-01’
order by payment_date desc
limit 3) as p
order by payment_date asc;

Exercises - Ranking window functions

8.1 Write a query to return the 3 most recent rentals for each customer. Earlier you did this with a lateral join - this time do it with window functions

Table(s) to use: rental

Expected Output:

rental_id	customer_id	rental_date

15315|          1|2005-08-22 20:03:46|
15298|          1|2005-08-22 19:41:37|
14825|          1|2005-08-22 01:27:57|
15907|          2|2005-08-23 17:39:35|
15145|          2|2005-08-22 13:53:04|
14743|          2|2005-08-21 22:41:56|
15619|          3|2005-08-23 07:10:14|
15038|          3|2005-08-22 09:37:27|
14699|          3|2005-08-21 20:50:48|
15635|          4|2005-08-23 07:43:00|
15147|          4|2005-08-22 13:58:23|
14225|          4|2005-08-21 04:53:37|

…

1797 rows

8.2 We want to re-do exercise 7.3, where we wrote a query to return the customers who rented out the least popular film (that is, the film least rented out). This time though we want to be able to handle if there is more than one film that is least popular. So if several films are each equally unpopular, return the customers who rented out any of those films.

Table(s) to use: rental, inventory

Expected Output:

customer_id

12 rows

8.3 Write a query to return all the distinct film ratings without using the DISTINCT keyword

Table(s) to use: film

Expected Output:

rating
G
PG
PG-13
R
NC-17

5 rows

8.1 Write a query to return the 3 most recent rentals for each customer. Earlier you did this with a lateral join - this time do it with window functions

Solution:

For each rental we can calculate a row number based on a window of rentals partitioned by customer ID and ordered by rental date descending. By then picking those rentals with a row number less than or equal to 3 we obtain the 3 most recent rentals for each customer.

select rental_id, customer_id, rental_date
from
(select
rental_id,
customer_id,
rental_date,
row_number() over (partition by customer_id order by rental_date desc) as rn
from rental) as t
where rn <= 3;

8.2 We want to re-do exercise 7.3, where we wrote a query to return the customers who rented out the least popular film (that is, the film least rented out). This time though we want to be able to handle if there is more than one film that is least popular. So if several films are each equally unpopular, return the customers who rented out any of those films.

Solution:

The rent_counts CTE returns for each film the number of times it has been rented out and a corresponding ranking. If you run this on its own, you’ll find that there are 3 equally unpopular films, each only rented out 4 times. We then return those distinct customers from the rental table who rented any film that has a ranking of 1.

with rent_counts as
(
select
film_id,
count(),
rank() over (order by count())
from rental
inner join inventory using (inventory_id)
group by film_id
)
select distinct customer_id
from rental as r
inner join inventory as i using (inventory_id)
where i.film_id in
(select film_id
from rent_counts
where rank = 1);

8.3 Write a query to return all the distinct film ratings without using the DISTINCT keyword

Solution:

The key insight for this exercise is to recognize that window functions only operate over one window at a time. So by partitioning the film table table by rating and using the row_number() function, each new rating encountered will belong to a new window. And the first such film for each rating will therefore get a row number of 1. Filtering for only those films and removing NULL yields us a distinct list of ratings.

select rating
from
(select
rating,
row_number() over (partition by rating) as rn
from film) as t
where rn = 1
and rating is not null;

Exercises - Aggregate window functions

8.4 Write a query to show for each rental both the rental duration and also the average rental duration from the same customer

Table(s) to use: rental

Expected Output:

customer_id	rental_id	rent_duration	avg

      1|    10437|9 days 03:21:00|     4 days 11:18:07.5|
      1|    13068| 1 day 04:49:00|     4 days 11:18:07.5|
      1|    15315|7 days 05:48:00|     4 days 11:18:07.5|
      1|     2308|3 days 18:55:00|     4 days 11:18:07.5|
      1|     1185|8 days 01:48:00|     4 days 11:18:07.5|
      1|     7273|3 days 19:19:00|     4 days 11:18:07.5|
      1|       76|9 days 00:30:00|     4 days 11:18:07.5|
      1|     7841|2 days 03:33:00|     4 days 11:18:07.5|

…

16044 rows

8.5 Write a query to calculate a running total of payments received, grouped by month (ie. for each month show the total amount of money received that month and also the total amount of money received up to and including that month)

Table(s) to use: payment

Expected Output:

month	amount	running_total
2007-01-01 00:00:00	4824.43	4824.43
2007-02-01 00:00:00	9631.88	14456.31
2007-03-01 00:00:00	23886.56	38342.87
2007-04-01 00:00:00	28559.46	66902.33
2007-05-01 00:00:00	514.18	67416.51

5 rows

8.4 Write a query to show for each rental both the rental duration and also the average rental duration from the same customer

Solution:

select
customer_id,
rental_id,
return_date - rental_date as rent_duration,
avg(return_date - rental_date) over (partition by customer_id)
from rental;

8.5 Write a query to calculate a running total of payments received, grouped by month (ie. for each month show the total amount of money received that month and also the total amount of money received up to and including that month)

Solution:

We can re-use the monthly_amounts CTE we’ve encountered several times now to give us a baseline table to work with.Then by using the sum function over a window ordered by month ascending, we’re able to take advantage of the default window frame and obtain a running total of payments received.

with monthly_amounts as
(
select
date_trunc(‘month’, payment_date) as month,
sum(amount) as amount
from payment
group by month
)
select
month,
amount,
sum(amount) over (order by month) as running_total
from monthly_amounts;

Challenge exercises

8.6 Write a query to return the top 3 earning films in each rating category. Include ties. To calculate the earnings for a film, multiply the rental rate for the film by the number of times it was rented out

Table(s) to use: rental, inventory, film

Expected Output:

title	rating	income
CAT CONEHEADS	G	149.70
DOGMA FAMILY	G	149.70
SATURDAY LAMBS	G	139.72
BUCKET BROTHERHOOD	PG	169.66
GOODFELLAS SALUTE	PG	154.69
TITANS JERK	PG	144.71
HARRY IDAHO	PG-13	149.70
TRIP NEWTON	PG-13	139.72
ROSES TREASURE	PG-13	139.72
MASSACRE USUAL	R	149.70
BOOGIE AMELIE	R	144.71
CLOSER BANG	R	139.72
SCALAWAG DUCK	NC-17	159.68
ZORRO ARK	NC-17	154.69
APACHE DIVINE	NC-17	154.69
WIFE TURN	NC-17	154.69

16 rows

8.7 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs (you previously did this using the generate_series function. Now do it using only window functions). Note you don’t have to have your output formatted the same.

Table(s) to use: rental

Expected Output:

missing_from	missing_to

     321|       321|
    2247|      2247|
    6579|      6579|
    9426|      9426|
   15592|     15592|

5 rows

8.8 Calculate for each customer the longest amount of time they’ve gone between renting a film

Table(s) to use: rental

Expected Output:

customer_id	longest break

      1| 17 days 14:18:49|
      2| 22 days 09:36:26|
      3| 18 days 16:07:31|
      4| 38 days 16:31:09|
      5|175 days 21:39:01|
      6| 18 days 11:10:58|
      7| 16 days 15:45:13|
      8| 17 days 10:58:56|
      9|177 days 01:22:04|
     10| 17 days 08:40:33|

…

599 rows

8.6 Write a query to return the top 3 earning films in each rating category. Include ties. To calculate the earnings for a film, multiply the rental rate for the film by the number of times it was rented out

Solution:

We can build this query up in several steps. The first CTE calculates the amount of income received from each film by multiplying the rental rate by the number of times the film was rented out. The second CTE assigns a rank for each film based on the rating of the film and income - it also filters out those films without a rating. The final query filters the output to only return the films with a ranking less than or equal to 3.

with film_incomes as
(
select
f.film_id,
f.title,
f.rating,
f.rental_rate * count(*) as income
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
group by f.film_id
),
film_rankings as
(
select
film_id,
title,
rating,
income,
rank() over(partition by rating order by income desc)
from film_incomes
where rating is not null
)
select title, rating, income
from film_rankings
where rank <= 3
order by rating, rank;

8.7 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs (you previously did this using the generate_series function. Now do it using only window functions). Note you don’t have to have your output formatted the same.

Solution:

This exercise falls under a general class of problems known as ‘finding the gap’ - a quite popular question in technical interviews! For each rental, you can use the lead() function to obtain the very next rental ID (when ordered by ID). Any missing IDs therefore are when this “gap” is greater than 1, with current + 1 identifying the first missing ID and next - 1 identifying the last missing ID in the “gap”.

with t as
(
select
rental_id as current,
lead(rental_id) over (order by rental_id) as next
from rental
)
select
current + 1 as missing_from,
next - 1 as missing_to
from t
where next - current > 1;

8.8 Calculate for each customer the longest amount of time they’ve gone between renting a film

Solution:

The days_between CTE is used here to calculate for each rental the time difference between it and the next rental from the same customer. To obtain the longest break for each customer, the final query groups the results by customer and picks the largest such difference.

with days_between as
(
select customer_id, rental_date,
lead(rental_date) over (partition by customer_id order by rental_date) - rental_date as diff
from rental
)
select customer_id, max(diff) as “longest break”
from days_between
group by customer_id
order by customer_id;

Exercises - UNION

9.1 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date

Table(s) to use: rental, payment

Expected Output:

interaction_date

  2005-05-24|
  2005-05-25|
  2005-05-26|
  2005-05-27|
  2005-05-28|
  2005-05-29|
  2005-05-30|
  2005-05-31|

…

81 rows

9.1 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date

Solution:

We can obtain the result we’re after here by unioning the rental date and payment dates (cast to dates to remove the time component), from the rental and payment tables respectively. The union operator will take care of removing any duplicate dates from the output.

(
select cast(rental_date as date) as interaction_date
from rental
)
union
(
select cast(payment_date as date) as interaction_date
from payment
)
order by interaction_date;

Exercises - INTERSECT

9.2 Write a query to find the actors that are also customers (assuming same name = same person)

Table(s) to use: customer, actor

Expected Output:

first_name	last_name
JENNIFER	DAVIS

1 rows

9.3 Have the actors with IDs 49 (Anne Cronyn), 152 (Ben Harris), and 180 (Jeff Silverstone) ever appeared in any films together? Which ones?

Table(s) to use: film_actor

Expected Output:

film_id

729|

1 rows

9.2 Write a query to find the actors that are also customers (assuming same name = same person)

Solution:

By performing an intersection between the first name and last name columns from both the customer and actor tables we obtain the people who appear in both tables.

(
select first_name, last_name
from customer
)
intersect
(
select first_name, last_name
from actor
);

9.3 Have the actors with IDs 49 (Anne Cronyn), 152 (Ben Harris), and 180 (Jeff Silverstone) ever appeared in any films together? Which ones?

Solution:

We can obtain the list of films each actor has appeared in directly from the film_actor table. By performing an intersection between all 3 lists of films, we are left with any films common to each actor.

(
select film_id
from film_actor
where actor_id = 49
)
intersect
(
select film_id
from film_actor
where actor_id = 152
)
intersect
(
select film_id
from film_actor
where actor_id = 180
);

Exercises - EXCEPT

9.4 The missing rental IDs problem that we’ve encountered several times now is the perfect place to use EXCEPT. Write a query using the generate_series function and EXCEPT to find missing rental IDs (The rental table has 16,044 rows but the maximum rental ID is 16,049 - some IDs are missing)

Table(s) to use: rental

Expected Output:

id | -----| 2247| 9426| 15592| 6579| 321| 5 rows

9.5 Write a query to list all the customers who have rented out a film on a Saturday but never on a Sunday. Order the customers by first name.

Table(s) to use: rental, customer

Expected Output:

first_name|last_name| ----------|---------| CHRISTIAN |JUNG | CLIFTON |MALCOLM | EVA |RAMOS | FELIX |GAFFNEY | LEON |BOSTIC | MARION |OCAMPO | ROBIN |HAYES | TIFFANY |JORDAN | 8 rows

>>> SCROLL DOWN FOR SOLUTIONS >>>

9.4 The missing rental IDs problem that we’ve encountered several times now is the perfect place to use EXCEPT. Write a query using the generate_series function and EXCEPT to find missing rental IDs (The rental table has 16,044 rows but the maximum rental ID is 16,049 - some IDs are missing)

Solution:

We first generate a sequential list of numbers ranging from the minimum rental ID (1) in the rental table to the maximum rental ID (16,049). From this, using EXCEPT, we remove the rental IDs present in the rental table. This leaves us with the missing rental IDs. Of all the different ways you’ve now learned to solve this exercise (first subqueries and then window functions) I think this is the most intuitive.

( select t.id from generate_series( (select min(rental_id) from rental), (select max(rental_id) from rental)) as t(id) ) except ( select rental_id from rental );

9.5 Write a query to list all the customers who have rented out a film on a Saturday but never on a Sunday. Order the customers by first name.

Solution:

We first obtain a list of all the customers who have rented out a film on Saturday using the date_part function with ‘isodow’ (ISO Day Of Week) which returns a number between 1 and 7 identifying the day of the week. From this, using EXCEPT, we remove the customers who have made rentals on Sunday.

( select first_name, last_name from rental inner join customer using (customer_id) where date_part(‘isodow’, rental_date) = 6 ) except ( select first_name, last_name from rental inner join customer using (customer_id) where date_part(‘isodow’, rental_date) = 7 ) order by first_name;

Challenge exercises

9.6 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date. Include only one row in the output for each type of interaction

Table(s) to use: rental, payment

Expected Output:

interaction_date	type

  2005-05-24|rental |
  2005-05-25|rental |
  2005-05-26|rental |
  2005-05-27|rental |
  2005-05-28|rental |
  2005-05-29|rental |
  2005-05-30|rental |
  2005-05-31|rental |
  2005-06-14|rental |
  ...
  2007-01-24|payment|
  2007-01-25|payment|
  2007-01-26|payment|
  2007-01-27|payment|
  2007-01-28|payment|
  2007-01-29|payment|
  2007-01-30|payment|
  ...

81 rows

9.7 Write a query to return the countries in which there are both customers and staff. Use a CTE to help simplify your code.

Table(s) to use: staff, customer, address, city, country

Expected Output:

country
Canada

1 rows

9.8 Imagine you had two queries - let’s call them A and B. Can you figure out how you would use set operators to return the rows in either A or B, but not both.

9.6 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date. Include only one row in the output for each type of interaction

Solution:

Similar to the first exercise, we union the rental dates with the payment dates. This time however each query contains a static column describing the type of interaction (this pattern of adding a column to each query describing the source table is quite common when encountering set operators in real-world situations)

(
select cast(rental_date as date) as interaction_date, ‘rental’ as type
from rental
)
union
(
select cast(payment_date as date) as interaction_date, ‘payment’ as type
from payment
)
order by interaction_date;

9.7 Write a query to return the countries in which there are both customers and staff. Use a CTE to help simplify your code.

Solution:

Not such a tricky question in this case but it does involve the same joined table in more than one case so the use of a CTE can be quite handy here instead of having to repeat the same query elements to resolve an address ID in to a country. Yes, you can use CTEs with set operators without any problems!

with address_country as
(
select address_id, country
from address
inner join city using (city_id)
inner join country using (country_id)
)
(
select country
from staff
inner join address_country using (address_id)
)
intersect
(
select country
from customer
inner join address_country using (address_id)
);

9.8 Imagine you had two queries - let’s call them A and B. Can you figure out how you would use set operators to return the rows in either A or B, but not both.

Solution:

To obtain the rows in either A or B but not in both, you could first perform A union B. From this, you would then minus A intersect B. In full then: (A union B) except (A intersect B). Can you think of another way too?

Exercises - Creating tables

10.1 In this and the following exercises in this chapter, we’re going to be doing some lightweight database modelling work for a fictional beach equipment rental business. Your answers may deviate a little from mine as we go, and that’s fine - database design is a quite subjective topic. To kick things off, we’ll keep working with our existing database but we want to create all our tables within a schema called ‘beach’. Write a SQL statement to create the ‘beach’ schema.

10.2 Create a table to store customers. For each customer we want to capture their first name, last name, email address, phone number, and the date the account was created. Don’t worry about primary keys and constraints for now - just focus on the create statement and choosing what you think are appropriate data types for the listed attributes.

Table(s) to use: beach.customers

10.3 Create a table to store details about the equipment to be rented out. For each item, we want to store the type (‘surf board’, ‘kayak’, etc.), a general ad-hoc description of the item (color, brand, condition, etc), and replacement cost so we know what to charge customers if they lose the item.

Table(s) to use: beach.equipment

10.4 After running the business for a while, we notice that customers sometimes lose equipment. Write a SQL statement to alter the equipment table (assume it already has data in it we don’t want to lose) to add a column to track whether the item is missing.

Table(s) to use: beach.equipment

>>> SCROLL DOWN FOR SOLUTIONS >>>

10.1 In this and the following exercises in this chapter, we’re going to be doing some lightweight database modelling work for a fictional beach equipment rental business. Your answers may deviate a little from mine as we go, and that’s fine - database design is a quite subjective topic. To kick things off, we’ll keep working with our existing database but we want to create all our tables within a schema called ‘beach’. Write a SQL statement to create the ‘beach’ schema.

Solution:

create schema beach;

10.2 Create a table to store customers. For each customer we want to capture their first name, last name, email address, phone number, and the date the account was created. Don’t worry about primary keys and constraints for now - just focus on the create statement and choosing what you think are appropriate data types for the listed attributes.

Solution:

The text data type is actually ideal for the email, first name, and last name. When it comes to PostgreSQL, you should typically prefer to use text instead of varchar(n) - there’s no performance difference, and unless you have a good reason to want to set a maximum length just use text (you can also set a maximum length with text via a CHECK constraint - and more too!). For the phone number, you could get more fancy and try to denormalize it further in to a few columns (for international dialing prefix, number, extension, etc.) but it’s awfully complicated and there are many exceptions and edge cases. text is the more flexible approach. The create date can be of type date - we don’t need a time component.

create table beach.customers ( email text, first_name text, last_name text, phone text, create_date date );

10.3 Create a table to store details about the equipment to be rented out. For each item, we want to store the type (‘surf board’, ‘kayak’, etc.), a general ad-hoc description of the item (color, brand, condition, etc), and replacement cost so we know what to charge customers if they lose the item.

Solution:

Both the item type and description can be text. Because the business will be only stocking certain types of items, in a future exercise we’ll investigate ways to restrict the item_type just to particular values. The replacement cost is an interesting one - PostgreSQL actually does actually have a money data type that should in theory be ideal. In practice though, it has a range of problems you can read about online and so numeric is the better choice. I’ve opted for a precision of 7 and scale of 2, which allows storing items of up to value $99,999.99 which seems more than adequate!

create table beach.equipment ( item_type text, description text, replacement_cost numeric(7, 2) );

10.4 After running the business for a while, we notice that customers sometimes lose equipment. Write a SQL statement to alter the equipment table (assume it already has data in it we don’t want to lose) to add a column to track whether the item is missing.

Solution:

You can use the ALTER TABLE statement to add a column to an existing table. In this case, we add the column missing to be of type boolean.

alter table beach.equipment add missing boolean;

Exercises - Primary keys

10.5 Add a surrogate primary key for the customers table using the GENERATED AS IDENTITY syntax (we assume not all customers will provide an email address or phone number ruling them out as potential natural keys). Note you may drop the schema/table and re-create it from scratch.

Table(s) to use: beach.customers

10.6 Add a surrogate primary key for the equipment table using one of the serial types. Also add in to the table definition the ‘missing’ column from exercise 10.4. Note you may drop the schema/table and re-create it from scratch.

Table(s) to use: beach.equipment

10.7 Create a new table to store information about each rental with an appropriate primary key. For each rental, store the customer, the item that was rented, the rental date, and the return date.

Table(s) to use: beach.rentals

10.5 Add a surrogate primary key for the customers table using the GENERATED AS IDENTITY syntax (we assume not all customers will provide an email address or phone number ruling them out as potential natural keys). Note you may drop the schema/table and re-create it from scratch.

Solution:

We introduce a new surrogate key column called customer_id of type bigint (though int would likely be fine too since this is just a small beach rental business). In a challenge exercise at the end of this chapter, we’ll explore what makes GENERATED ALWAYS… different from GENERATED BY DEFAULT…

create table beach.customers (
customer_id bigint generated always as identity primary key,
email text,
first_name text,
last_name text,
phone text,
create_date date
);

10.6 Add a surrogate primary key for the equipment table using one of the serial types. Also add in to the table definition the ‘missing’ column from exercise 10.4. Note you may drop the schema/table and re-create it from scratch.

Solution:

create table beach.equipment (
equipment_id bigserial primary key,
item_type text,
description text,
replacement_cost numeric(7, 2),
missing boolean
);

10.7 Create a new table to store information about each rental with an appropriate primary key. For each rental, store the customer, the item that was rented, the rental date, and the return date.

Solution:

We’ll assume for now that a rental consists of just renting one item (and revisit this in the challenge exercises) so we can define our primary key to be a composite natural key comprising the customer ID, equipment ID, and rental date. The combination of all 3 should be unique - we include the rental date in the key to allow the very plausible scenario of the same customer renting the same item on some other day.

create table beach.rentals (
customer_id bigint,
equipment_id bigint,
rental_date date,
return_date date,
primary key (customer_id, equipment_id, rental_date)
);

Exercises - Foreign keys

10.8 Add appropriate foreign keys to the rentals table. Setup the foreign keys such that if the referenced customer or equipment is deleted, the related entries in the rentals table will also be deleted. Note you may drop the schema/table and re-create it from scratch.

Table(s) to use: beach.rentals

10.8 Add appropriate foreign keys to the rentals table. Setup the foreign keys such that if the referenced customer or equipment is deleted, the related entries in the rentals table will also be deleted. Note you may drop the schema/table and re-create it from scratch.

Solution:

Both the customer ID and equipment ID can be setup as foreign keys with ON DELETE CASCADE specified to ensure deletion of any referenced customers or equipment cascade delete records in the rentals table.

create table beach.rentals (
customer_id bigint references beach.customers (customer_id) on delete cascade,
equipment_id bigint references beach.equipment (equipment_id) on delete cascade,
rental_date date,
return_date date,
primary key (customer_id, equipment_id, rental_date)
);

Exercises - CHECK, UNIQUE, and NOT NULL constraints

10.9 Add appropriate check, unique, not null, and default constraints to the customers table to capture the following requirements: a) A customer must provide both a first name and last name b) A customer must provide at least one contact detail - a phone number or email address c) The create date should be the date the new customer record is inserted in the table d) No two customers should have the same email address or phone number

Table(s) to use: beach.customers

10.10 Add appropriate check, unique, not null, and default constraints to the equipment table to capture the following requirements: a) A newly added item should not be missing! b) Each item must have a type c) The replacement cost can be NULL. But if provided, it must be a positive number

Table(s) to use: beach.equipment

>>> SCROLL DOWN FOR SOLUTIONS >>>

10.9 Add appropriate check, unique, not null, and default constraints to the customers table to capture the following requirements: a) A customer must provide both a first name and last name b) A customer must provide at least one contact detail - a phone number or email address c) The create date should be the date the new customer record is inserted in the table d) No two customers should have the same email address or phone number

Solution:

You shouldn’t have too much problem picking constraints for the above requirements. Two interesting ones worth focusing on are the default create date - in this case, we use the function current_date (oddly enough, a function that you can call without parentheses) to return the current date. Also the check constraint, to make sure that either the email or phone number is not null.

create table beach.customers ( customer_id bigint generated always as identity primary key, email text unique, first_name text not null, last_name text not null, phone text unique, create_date date not null default current_date, check (email is not null or phone is not null) );

10.10 Add appropriate check, unique, not null, and default constraints to the equipment table to capture the following requirements: a) A newly added item should not be missing! b) Each item must have a type c) The replacement cost can be NULL. But if provided, it must be a positive number

Solution:

You might be wondering about the check constraint for replacement_cost and why you didn’t have to explicitly permit null. It’s always important to keep in mind the ‘3 valued’ logic employed by SQL where an expression can evaluate to true, false, or unknown. On inserting a replacement cost of NULL, the comparison “Is NULL >= 0” evaluates to unknown. Check constraints reject values that evaluate to false, but unknown is fair game. Which in this case, is exactly what we want.

create table beach.equipment ( equipment_id bigserial primary key, item_type text not null, description text, replacement_cost numeric(7, 2) check (replacement_cost >= 0), missing boolean not null default false );

Challenge exercises

10.11 Search the official PostgreSQL documentation for information about GENERATED { ALWAYS | BY DEFAULT } and explain in your own words the difference between the two options.

10.12 Imagine you wanted to restrict the equipment types to just a couple of options like ‘Surfboard’, ‘Single Kayak’, ‘Double Kayak’, etc. To do this, you actually have more options at your disposal than you might think. You could add a check constraint to the column. You could create an ENUM. You could create a whole new data type called a DOMAIN. Or you could use a new table in your database to store the available types. Have a read of this article and justify which approach you would take in this case.

10.13 A customer comes in and asks whether there are any Single Kayak’s available for rent. Write a query to return a count of how many Single Kayak’s are available for rent.

Table(s) to use: beach.equipment, beach.rentals

10.14 Right now a rental consists of a customer renting a single item. If it were more common though for a customer to rent multiple items at a time and you wanted to capture these all under a single ‘rental’, how would you modify the existing tables (or create new tables) to model this.

>>> SCROLL DOWN FOR SOLUTIONS >>>

10.11 Search the official PostgreSQL documentation for information about GENERATED { ALWAYS | BY DEFAULT } and explain in your own words the difference between the two options.

Solution:

With BY DEFAULT, PostgreSQL will generate a unique value for the column on insert, but will also allow you to insert your own value if you want to manually take control. With ALWAYS, if you attempt to provide your own value for the column as part of the insert statement, you’ll get an error.

10.12 Imagine you wanted to restrict the equipment types to just a couple of options like ‘Surfboard’, ‘Single Kayak’, ‘Double Kayak’, etc. To do this, you actually have more options at your disposal than you might think. You could add a check constraint to the column. You could create an ENUM. You could create a whole new data type called a DOMAIN. Or you could use a new table in your database to store the available types. Have a read of this article and justify which approach you would take in this case.

Solution:

The approach I would take is to store the types in their own table with the type defined as the primary key. The main reason for preferring this approach is because it would be reasonable to expect new types of equipment to be added over time. When this happens, rather than having to alter constraints, drop the enum type and recreate, etc. and all the other complicated approaches, managing the types in their own table would be very easy with simple INSERT statements. We could also easily produce a list of all the different types available for rent, etc.

create table equipment_types ( type text primary key );

10.13 A customer comes in and asks whether there are any Single Kayak’s available for rent. Write a query to return a count of how many Single Kayak’s are available for rent.

Solution:

In this case, the query is written to perform a count of items with the type ‘Single Kayak’, that are not missing, and also are not actively being rented (where an active rental is defined as a rental with a NULL return date).

select count(*) from beach.equipment where item_type = ‘Single Kayak’ and missing = false and equipment_id not in (select equipment_id from beach.rentals where return_date is null);

10.14 Right now a rental consists of a customer renting a single item. If it were more common though for a customer to rent multiple items at a time and you wanted to capture these all under a single ‘rental’, how would you modify the existing tables (or create new tables) to model this.

Solution:

One approach could be to split out the equipment being rented in to its own table. So as shown below, the rental_details table would capture details about the items, and could contain multiple entries for each rental ID.

create table beach.rentals ( rental_id bigserial primary key, customer_id bigint references beach.customers (customer_id), rental_date date, return_date date ); create table beach.rental_details ( rental_id bigint references beach.rentals (rental_id), equipment_id bigint references beach.equipment (equipment_id), primary key (rental_id, equipment_id) );

Exercises - Importing and exporting data

11.1 In the last video, we imported the data for the users and notes tables from CSV files. Now also import the note tags from CSV.

Table(s) to use: playground.note_tags

11.2 Export in CSV format (with headings) the number of notes created by each user. Order the users in descending order based on how many notes they’ve created.

Table(s) to use: playground.users, playground.notes

Expected Output:

name,note_count
David Hilbert,4
Joseph Fourier,3
Bernhard Riemann,2
Leonhard Euler,1
Blaise Pascal,1

11.3 Output values (any values of your choosing) of the following types to a plain text file: a) text b) int c) numeric d) real e) boolean f) date g) timestamptz h) interval

Expected Output:

test 33 33.3 33.2999992 t 2019-05-12 2019-05-12 15:25:11.644876+10 1 day

11.1 In the last video, we imported the data for the users and notes tables from CSV files. Now also import the note tags from CSV.

Solution:

copy playground.note_tags
from ‘…note_tags.csv’
with (format csv, header true);

11.2 Export in CSV format (with headings) the number of notes created by each user. Order the users in descending order based on how many notes they’ve created.

Solution:

copy
(
select
u.first_name || ’ ’ || u.last_name as name,
count(*) as note_count
from playground.notes as n
inner join playground.users as u
on n.user_email = u.email
group by u.email, u.first_name, u.last_name
order by note_count desc
)
to ‘…note_counts.csv’
with (format csv, header true);

11.3 Output values (any values of your choosing) of the following types to a plain text file: a) text b) int c) numeric d) real e) boolean f) date g) timestamptz h) interval

Solution:

copy
(
select
‘test’::text,
33::int,
33.3::numeric,
33.3::real,
true::boolean,
current_date::date,
current_timestamp::timestamptz,
‘1 day’::interval
)
to ‘…output_types.txt’;

Exercises - Using transactions and inserting data

11.4 Write an insert statement to insert a new customer in to the customer table with any details of your choosing. Use the returning clause to return the inserted row. For this and all upcoming exercises, remember to do this inside a transaction block and rollback the change afterwards!

Table(s) to use: customer

Expected Output:

customer_id	store_id	first_name	last_name	email	address_id	activebool	create_date	last_update	active

    600|       1|John      |Henry    |     |         1|true      | 2019-05-14|2019-05-14 15:21:57|     1|

11.5 Create a new table called rental_stats with two columns, date and num_rentals, to hold the data for the number of rentals made each day. And write an insert statement to populate the table with correct data as calculated from the rental table

Table(s) to use: rental, rental_stats

11.4 Write an insert statement to insert a new customer in to the customer table with any details of your choosing. Use the returning clause to return the inserted row. For this and all upcoming exercises, remember to do this inside a transaction block and rollback the change afterwards!

Solution:

insert into customer(store_id, first_name, last_name, address_id, active)
values
(1, ‘John’, ‘Henry’, 1, 1)
returning *;

11.5 Create a new table called rental_stats with two columns, date and num_rentals, to hold the data for the number of rentals made each day. And write an insert statement to populate the table with correct data as calculated from the rental table

Solution:

We achieve this with two statements - one to create the rental_stats table (don’t worry about PKs) and an insert statement that queries the rental table, calculating the number of rentals made on each day. As an alternative, it is also possible to create a table and populate it with results from a query in a single statement using CREATE TABLE AS - I’ve included below a solution using this approach as well. With CREATE TABLE AS, the created table column names and types come directly from the query.

create table rental_stats (
date date,
num_rentals int
);

insert into rental_stats
select
rental_date::date as rental_day,
count(*)
from rental
group by rental_day
order by rental_day;

create table rental_stats as
select
rental_date::date as rental_day,
count(*)
from rental
group by rental_day
order by rental_day;

Exercises - Updating data

11.6 All customers should have an email address of the form [first_name].[last_name]@sakilacustomer.org (all in lower case). Write an update statement so that all customers have an email address in this format.

Table(s) to use: customer

11.7 Write an update statement to update the rental rate of the 20 most rented films by 10%

Table(s) to use: rental, inventory, film

11.8 Write a script to add a new column to the films table to hold the length of each film in hours (have a look at some of the examples for the ALTER TABLE command) and then populate this new column with the correct values

Table(s) to use: film

>>> SCROLL DOWN FOR SOLUTIONS >>>

11.6 All customers should have an email address of the form [first_name].[last_name]@sakilacustomer.org (all in lower case). Write an update statement so that all customers have an email address in this format.

Solution:

In this case since we can visually observe that none of the customer email addresses are in the correct format, writing a blanket statement to update all the rows is fine. In general though, you are better off limiting your update statements to only update rows that you have to for performance reasons.

update customer set email = lower(first_name || ‘.’ || last_name || ‘@sakilacustomer.org’);

11.7 Write an update statement to update the rental rate of the 20 most rented films by 10%

Solution:

For this update statement, we update the rental_rate for those films that exist in the results of a subquery. The subquery returns the 20 most rented films.

update film set rental_rate = rental_rate * 1.1 where film_id in ( select i.film_id from rental as r inner join inventory as i using (inventory_id) group by i.film_id order by count(*) desc limit 20 );

11.8 Write a script to add a new column to the films table to hold the length of each film in hours (have a look at some of the examples for the ALTER TABLE command) and then populate this new column with the correct values

Solution:

The script first creates the new column length_hrs of type numeric(2, 1) - allowing the storing of lengths like 1.2 hrs, 3.0 hrs, etc. Next the update statement populates the column with correct values, taking care to avoid integer division by dividing by 60.0.

alter table film add column length_hrs numeric(2, 1); update film set length_hrs = length / 60.0 returning *;

Exercises - Deleting data

11.9 Delete all the payments where the payment amount was zero, returning the deleted rows

Table(s) to use: payment

Expected Output:

payment_id	customer_id	staff_id	rental_id	amount	payment_date

 31918|        267|       2|    13713|  0.00|2007-05-14 13:44:29|
 31920|        269|       2|    12610|  0.00|2007-05-14 13:44:29|
 31925|        284|       2|    12959|  0.00|2007-05-14 13:44:29|
 31942|        354|       1|    11782|  0.00|2007-05-14 13:44:29|
 31946|        361|       1|    14769|  0.00|2007-05-14 13:44:29|
 31966|        448|       1|    13577|  0.00|2007-05-14 13:44:29|
 31970|        457|       2|    14516|  0.00|2007-05-14 13:44:29|
 31983|        516|       1|    12915|  0.00|2007-05-14 13:44:29|

…

24 rows

11.10 Delete all the unused languages from the language table

Table(s) to use: language

11.9 Delete all the payments where the payment amount was zero, returning the deleted rows

Solution:

delete from payment
where amount = 0
returning *;

11.10 Delete all the unused languages from the language table

Solution:

This solution makes use of a plain uncorrelated subquery to return all the languages that are in use in the film table and deletes any languages from the language table not in this list.

delete from language
where language_id not in
(select distinct language_id
from film);

Challenge exercises

11.11 Write a single update statement to update the activebool column for customers to be true if they made a rental in 2006 or later, and false otherwise.

Table(s) to use: customer, rental

11.12 Create a new table, with appropriate primary keys and foreign keys, to hold the amount of inventory of each film in each store (store_id, film_id, stock_count). In this table we want to store the stock level for every film in every store - including films that aren’t in stock. Write an “upsert” statement to populate the table with the correct values. By “upsert”, I mean insert a SQL statement that will either insert a new row in the table (ie. a new film, store, stock count) or update the stock count if the film/store attempting to be inserted already exists in the table). Research PostgreSQL’s INSERT ON CONFLICT and look at the examples for some guidance on how to do this.

Table(s) to use: film, store, inventory, inventory_stats (new)

11.13 Write a single statement to delete the first rental made by each customer and to avoid any foreign key violations you’ll also have to delete any associated payments in that same statement. You might need to do some research online to figure this one out. As a hint, you can use Common Table Expressions (CTEs) with delete statements and delete statements themselves can return results with the RETURNING clause!

Table(s) to use: rental, payment

11.14 In the films table the rating column is of type mpaa_rating, which is an ENUM. You’ve read online about the downsides of ENUMs and now want to convert your table design to instead store the different mpaa rating types in a reference table with the type as the primary key. Write a script to create the new table, populate it with data, convert the film table, and then drop the mpaa_rating type so it won’t be used ever again. You’re going to need to Google a few ideas and look up some documentation to get through this one - good luck!

Table(s) to use: film, mpaa_ratings (new)

>>> SCROLL DOWN FOR SOLUTIONS >>>

11.11 Write a single update statement to update the activebool column for customers to be true if they made a rental in 2006 or later, and false otherwise.

Solution:

In this exercise, the update statement must update the value of the activebool column conditionally. If the customer has made rentals in 2006 or later, the value must be set to true. If the customer hasn’t made any rentals in 2006 or later, the value must be set to false. Whenever you come across conditional situations like this, case expressions often come in handy. Here we use a case expression that depends on the result of a correlated subquery that evaluates whether any rentals were made for each customer in 2006 or later.

update customer set activebool = case when exists (select * from rental where rental.customer_id = customer.customer_id and rental_date >= ‘2006-01-01’) then true else false end;

11.12 Create a new table, with appropriate primary keys and foreign keys, to hold the amount of inventory of each film in each store (store_id, film_id, stock_count). In this table we want to store the stock level for every film in every store - including films that aren’t in stock. Write an “upsert” statement to populate the table with the correct values. By “upsert”, I mean insert a SQL statement that will either insert a new row in the table (ie. a new film, store, stock count) or update the stock count if the film/store attempting to be inserted already exists in the table). Research PostgreSQL’s INSERT ON CONFLICT and look at the examples for some guidance on how to do this.

Solution:

This was a quite difficult exercise so give yourself a big pat on the back if you were able to work it out. Starting with the table creation, we create a new table called inventory_stats with a composite primary key consisting of the store_id and film_id. For the INSERT statement the underlying query providing the data to be inserted (or updated) is a cross join between the film and store tables to produce all unique combinations of films and stores, and then left joined with the inventory table to tally up stock levels, including films that aren’t in stock. Finally, the ON CONFLICT clause ensures that if there is a primary key violation on insert, the stock count is still updated. Upsert statements can come in handy in many situations!

create table inventory_stats ( store_id smallint references store (store_id), film_id smallint references film (film_id), stock_count int not null, primary key (store_id, film_id) ); insert into inventory_stats(store_id, film_id, stock_count) select s.store_id, f.film_id, count(i.inventory_id) from film as f cross join store as s left join inventory as i on f.film_id = i.film_id and s.store_id = i.store_id group by f.film_id, s.store_id on conflict (store_id, film_id) do update set stock_count = excluded.stock_count;

11.13 Write a single statement to delete the first rental made by each customer and to avoid any foreign key violations you’ll also have to delete any associated payments in that same statement. You might need to do some research online to figure this one out. As a hint, you can use Common Table Expressions (CTEs) with delete statements and delete statements themselves can return results with the RETURNING clause!

Solution:

Another very tricky exercise. With this exercise, a DISTINCT ON subquery is used to obtain the first rental_id for each customer to be deleted, and the deletion from the rental table takes place within a CTE and returns the IDs that were deleted. Using the CTE results, associated payments are also deleted. Note that since both deletes occur within a single statement we avoid violating the foreign key (even though the rentals are deleted “first”).

with deleted_rentals as ( delete from rental where rental_id in (select distinct on (customer_id) rental_id from rental order by customer_id, rental_date) returning rental_id ) delete from payment where rental_id in (select rental_id from deleted_rentals);

11.14 In the films table the rating column is of type mpaa_rating, which is an ENUM. You’ve read online about the downsides of ENUMs and now want to convert your table design to instead store the different mpaa rating types in a reference table with the type as the primary key. Write a script to create the new table, populate it with data, convert the film table, and then drop the mpaa_rating type so it won’t be used ever again. You’re going to need to Google a few ideas and look up some documentation to get through this one - good luck!

Solution:

After creating the mpaa_ratings table, the different values the ENUM can take on are inserted in to the table. The query to obtain the different enum values for insertion in to the new table is quite strange, making use of several new functions you haven’t seen before…it’s just one of those things you end up Googling when infrequently needed. In the next statement we then make a number of changes to the film table - converting the rating column to type text, dropping and re-adding the default constraint (we have to do this because the column is specified to get a default value of type mpaa_rating), and then adding the foreign key constraint to reference the new table. Finally, we’re able to drop the mpaa_rating type.

create table mpaa_ratings ( rating text primary key ); insert into mpaa_ratings select unnest(enum_range(null::mpaa_rating)); alter table film alter column rating drop default, alter column rating type text, alter column rating set default ‘G’, add foreign key (rating) references mpaa_ratings(rating); drop type mpaa_rating;