Introduction to Relational Databases in SQL
1. Your first database
</> Attributes of relational databases
In the video, we talked about some basic facts about relational databases. Which of the following statements does not hold true for relational databases? Relational databases …
- … store different real-world entities in different tables.
- … allow you to establish relationships between entities.
- … are called “relational” because they store data only about people.
- … use constraints, keys and referential integrity in order to assure data quality.
</> Query information_schema with SELECT
Get information on all table names in the current database, while limiting your query to the ‘public’ table_schema.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'; -- specify the table_schema value
table_name
university_professors
Now have a look at the columns in university_professors by selecting all entries in information_schema.columns that correspond to that table.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'university_professors' AND table_schema = 'public';
column_name data_type
firstname text
lastname text
university text
university_shortname text
university_city text
function text
organization text
organization_sector text
How many columns does the table university_professors have?
- 12
- 9
- 8
- 5
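The column count can also be read from information_schema instead of being counted by eye. A minimal sketch, assuming the same public schema and table name as in the query above; the alias column_count is only illustrative:

```sql
-- Count the columns that information_schema lists for university_professors.
SELECT COUNT(*) AS column_count
FROM information_schema.columns
WHERE table_name = 'university_professors'
  AND table_schema = 'public';
```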
Finally, print the first five rows of the university_professors table.
SELECT *
FROM university_professors
LIMIT 5;
firstname | lastname | university | university_shortname | university_city | function | organization | organization_sector |
---|---|---|---|---|---|---|---|
Karl | Aberer | ETH Lausanne | EPF | Lausanne | Chairman of L3S Advisory Board | L3S Advisory Board | Education & research |
Karl | Aberer | ETH Lausanne | EPF | Lausanne | Member Conseil of Zeno-Karl Schindler Foundation | Zeno-Karl Schindler Foundation | Education & research |
Karl | Aberer | ETH Lausanne | EPF | Lausanne | Member of Conseil Fondation IDIAP | Fondation IDIAP | Education & research |
Karl | Aberer | ETH Lausanne | EPF | Lausanne | Panel Member | SNF Ambizione Program | Education & research |
Reza Shokrollah | Abhari | ETH Zürich | ETH | Zurich | Aufsichtsratsmandat | PNE Wind AG | Energy, environment & mobility |
</> CREATE your first few TABLEs
Create a table professors with two text columns: firstname and lastname.
CREATE TABLE professors (
firstname text,
lastname text
);
Create a table universities with three text columns: university_shortname, university, and university_city.
CREATE TABLE universities(
university_shortname text,
university text,
university_city text
);
</> ADD a COLUMN with ALTER TABLE
Alter professors to add the text column university_shortname.
ALTER TABLE professors
ADD COLUMN university_shortname text;
</> RENAME and DROP COLUMNs in affiliations
Rename the organisation column to organization in affiliations.
ALTER TABLE affiliations
RENAME COLUMN organisation TO organization;
Delete the university_shortname column in affiliations.
ALTER TABLE affiliations
DROP COLUMN university_shortname;
</> Migrate data with INSERT INTO SELECT DISTINCT
Insert all DISTINCT professors from university_professors into professors.
Print all the rows in professors.
INSERT INTO professors
SELECT DISTINCT firstname, lastname, university_shortname
FROM university_professors;
SELECT *
FROM professors;
firstname lastname university_shortname
Michel Rappaz EPF
Hilal Lashuel EPF
Jeffrey Huang EPF
...
[Showing 100 out of 551 rows]
Insert all DISTINCT affiliations into affiliations from university_professors.
INSERT INTO affiliations
SELECT DISTINCT firstname, lastname, function, organization
FROM university_professors;
SELECT *
FROM affiliations;
firstname lastname function organization
Dimos Poulikakos VR-Mandat Scrona AG
Francesco Stellacci Co-editor in Chief, Nanoscale Royal Chemistry Society, UK
Alexander Fust Fachexperte und Coach für Designer Startups Creative Hub
...
[Showing 100 out of 1377 rows]
</> Delete tables with DROP TABLE
Delete the university_professors table.
DROP TABLE university_professors;
2. Enforce data consistency with attribute constraints
</> Types of database constraints
Which of the following is not used to enforce a database constraint?
- Foreign keys
- SQL aggregate functions
- The BIGINT data type
- Primary keys
</> Conforming with data types
Execute the given sample code.
INSERT INTO transactions (transaction_date, amount, fee)
VALUES ('2018-24-09', 5454, '30');
SELECT *
FROM transactions;
date/time field value out of range: "2018-24-09"
LINE 3: VALUES ('2018-24-09', 5454, '30');
^
HINT: Perhaps you need a different "datestyle" setting.
As it doesn’t work, have a look at the error message and correct the statement accordingly – then execute it again.
INSERT INTO transactions (transaction_date, amount, fee)
VALUES ('2018-09-24', 5454, '30');
SELECT *
FROM transactions;
transaction_date amount fee
1999-01-08 500 20
2001-02-20 403 15
2001-03-20 3430 35
...
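If a date string really does arrive in year-day-month order, one option is to parse it with an explicit format pattern rather than rewriting the literal by hand. A small sketch using PostgreSQL's TO_DATE(); the literal is the one from the failing statement, and the output alias is only illustrative:

```sql
-- Parse a year-day-month string by stating the field order explicitly.
SELECT TO_DATE('2018-24-09', 'YYYY-DD-MM') AS transaction_date;
```

The same expression could be used inside the VALUES clause of the INSERT in place of the string literal.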
</> Type CASTs
Execute the given sample code.
SELECT transaction_date, amount + fee AS net_amount
FROM transactions;
operator does not exist: integer + text
LINE 2: SELECT transaction_date, amount + fee AS net_amount
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
As it doesn’t work, add an integer type cast at the right place and execute it again.
SELECT transaction_date, amount + CAST(fee AS integer) AS net_amount
FROM transactions;
transaction_date net_amount
1999-01-08 520
2001-02-20 418
2001-03-20 3465
...
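PostgreSQL also accepts the shorthand `::` cast operator, which is equivalent to CAST() here. A sketch of the same query in that notation; note that `::` is PostgreSQL-specific, whereas CAST() is standard SQL:

```sql
-- Same result as CAST(fee AS integer), written with the PostgreSQL shorthand.
SELECT transaction_date, amount + fee::integer AS net_amount
FROM transactions;
```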
</> Change types with ALTER COLUMN
Have a look at the distinct university_shortname values in the professors table and take note of the length of the strings.
SELECT DISTINCT university_shortname
FROM professors;
university_shortname
ETH
UNE
EPF
...
Now specify a fixed-length character type with the correct length for university_shortname.
ALTER TABLE professors
ALTER COLUMN university_shortname
TYPE char(3);
Change the type of the firstname column to varchar(64).
ALTER TABLE professors
ALTER COLUMN firstname
TYPE varchar(64);
</> Convert types USING a function
Run the sample code as is and take note of the error.
ALTER TABLE professors
ALTER COLUMN firstname
TYPE varchar(16);
value too long for type character varying(16)
Now use SUBSTRING() to reduce firstname to 16 characters so its type can be altered to varchar(16).
ALTER TABLE professors
ALTER COLUMN firstname
TYPE varchar(16)
USING SUBSTRING(firstname FROM 1 FOR 16);
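Because SUBSTRING() silently cuts off anything past 16 characters, it may be worth checking which values would be affected before running the conversion. A minimal sketch, meant to be run before the ALTER TABLE statement; the alias name_length is only illustrative:

```sql
-- List first names that are longer than the target length of 16 characters.
SELECT firstname, LENGTH(firstname) AS name_length
FROM professors
WHERE LENGTH(firstname) > 16
ORDER BY name_length DESC;
```

If this returns no rows, the USING clause changes nothing and the plain type change would have succeeded as well.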
</> Disallow NULL values with SET NOT NULL
Add a not-null constraint for the firstname column.
ALTER TABLE professors
ALTER COLUMN firstname SET NOT NULL;
Add a not-null constraint for the lastname column.
ALTER TABLE professors
ALTER COLUMN lastname SET NOT NULL;
</> What happens if you try to enter NULLs?
Execute the following statement:
INSERT INTO professors (firstname, lastname, university_shortname)
VALUES (NULL, 'Miller', 'ETH');
Why does this throw an error?
- Professors without first names do not exist.
- Because a database constraint is violated.
- Error? This works just fine.
- NULL is not put in quotes.
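To see which columns of professors currently reject NULLs, the is_nullable column of information_schema.columns can be queried. A sketch, assuming the table lives in the public schema as in the earlier exercises:

```sql
-- is_nullable is 'NO' for columns that carry a not-null constraint.
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'professors'
  AND table_schema = 'public';
```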
</> Make your columns UNIQUE with ADD CONSTRAINT
Add a unique constraint to the university_shortname column in universities. Give it the name university_shortname_unq.
ALTER TABLE universities
ADD CONSTRAINT university_shortname_unq UNIQUE(university_shortname);
Add a unique constraint to the organization column in organizations. Give it the name organization_unq.
ALTER TABLE organizations
ADD CONSTRAINT organization_unq UNIQUE(organization);
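Adding a unique constraint fails if the column already contains duplicate values. A sketch of a check that could be run beforehand, shown here for universities; the alias occurrences is only illustrative:

```sql
-- Any row returned here would block the UNIQUE constraint.
SELECT university_shortname, COUNT(*) AS occurrences
FROM universities
GROUP BY university_shortname
HAVING COUNT(*) > 1;
```

An empty result suggests the constraint can be added without cleaning the data first.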
3. Uniquely identify records with key constraints
</> Get to know SELECT COUNT DISTINCT
First, find out the number of rows in universities.
SELECT COUNT(*)
FROM universities;
count
11
Then, find out how many unique values there are in the university_city column.
SELECT COUNT(DISTINCT(university_city))
FROM universities;
count
9
</> Identify keys with SELECT COUNT DISTINCT
There’s a very basic way of finding out what qualifies for a key in an existing, populated table:
- Count the distinct records for all possible combinations of columns. If the resulting number x equals the number of all rows in the table for a combination, you have discovered a superkey.
- Then remove one column after another until you can no longer remove columns without seeing the number x decrease. If that is the case, you have discovered a (candidate) key.
The table professors has 551 rows. It has only one possible candidate key, which is a combination of two attributes. You might want to try different combinations using the “Run code” button. Once you have found the solution, you can submit your answer.
SELECT COUNT(DISTINCT(firstname,lastname,university_shortname))
FROM professors;
count
551
SELECT COUNT(DISTINCT(firstname,university_shortname))
FROM professors;
count
479
SELECT COUNT(DISTINCT(lastname,university_shortname))
FROM professors;
count
546
Using the above steps, identify the candidate key by trying out different combinations of columns.
SELECT COUNT(DISTINCT(firstname,lastname))
FROM professors;
count
551
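The individual counts above can also be gathered in one query, which makes it easier to compare each combination against the total number of rows. A sketch only; the column aliases are illustrative and not part of the exercise:

```sql
-- A combination is a superkey if its distinct count equals total_rows.
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT (firstname, lastname, university_shortname)) AS all_three,
    COUNT(DISTINCT (firstname, lastname)) AS first_last,
    COUNT(DISTINCT (firstname, university_shortname)) AS first_uni,
    COUNT(DISTINCT (lastname, university_shortname)) AS last_uni
FROM professors;
```

A combination that matches total_rows and cannot lose another column without dropping below it is a candidate key.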
</> Identify the primary key
Have a look at the example table from the previous video. As the database designer, you have to make a wise choice as to which column should be the primary key.
license_no | serial_no | make | model | year |
---|---|---|---|---|
Texas ABC-739 | A69352 | Ford | Mustang | 2 |
Florida TVP-347 | B43696 | Oldsmobile | Cutlass | 5 |
New York MPO-22 | X83554 | Oldsmobile | Delta | 1 |
California 432-TFY | C43742 | Mercedes | 190-D | 99 |
California RSK-629 | Y82935 | Toyota | Camry | 4 |
Texas RSK-629 | U028365 | Jaguar | XJS | 4 |
Which of the following column or column combinations could best serve as primary key?
- PK = {make}
- PK = {model, year}
- PK = {license_no}
- PK = {year, make}
</> ADD key CONSTRAINTs to the tables
Rename the organization column to id in organizations.
Make id a primary key and name it organization_pk.
ALTER TABLE organizations
RENAME COLUMN organization TO id;
ALTER TABLE organizations
ADD CONSTRAINT organization_pk PRIMARY KEY (id);
Rename the university_shortname column to id in universities.
Make id a primary key and name it university_pk.
ALTER TABLE universities
RENAME COLUMN university_shortname TO id;
-- Make id a primary key
ALTER TABLE universities
ADD CONSTRAINT university_pk PRIMARY KEY(id);
</> Add a SERIAL surrogate key
Add a new column id with data type serial to the professors table.
ALTER TABLE professors
ADD COLUMN id serial;
Make id a primary key and name it professors_pkey.
ALTER TABLE professors
ADD CONSTRAINT professors_pkey PRIMARY KEY (id);
Write a query that returns all the columns and 10 rows from professors.
SELECT * FROM professors LIMIT 10;
firstname lastname university_shortname id
Karl Aberer EPF 1
Reza Shokrollah Abhari ETH 2
Georges Abou Jaoudé EPF 3
...
</> CONCATenate columns to a surrogate key
Count the number of distinct rows with a combination of the make and model columns.
SELECT COUNT(DISTINCT(make, model))
FROM cars;
count
10
Add a new column id with the data type varchar(128).
ALTER TABLE cars
ADD COLUMN id varchar(128);
Concatenate make and model into id using an UPDATE table_name SET column_name = … query and the CONCAT() function.
UPDATE cars
SET id = CONCAT(make, model);
Make id a primary key and name it id_pk.
ALTER TABLE cars
ADD CONSTRAINT id_pk PRIMARY KEY(id);
SELECT * FROM cars;
make model mpg id
Subaru Forester 24 SubaruForester
Opel Astra 45 OpelAstra
Opel Vectra 40 OpelVectra
...
</> Test your knowledge before advancing
Given the above description of a student entity, create a table students with the correct column types.
Add a PRIMARY KEY for the social security number ssn.
Note that there is no formal length requirement for the integer column. The application would have to make sure it’s a correct SSN!
CREATE TABLE students (
last_name varchar(128) NOT NULL,
ssn integer PRIMARY KEY,
phone_no char(12)
);
4. Glue together tables with foreign keys
</> REFERENCE a table with a FOREIGN KEY
Rename the university_shortname column to university_id in professors.
ALTER TABLE professors
RENAME COLUMN university_shortname TO university_id;
Add a foreign key on university_id column in professors that references the id column in universities.
Name this foreign key professors_fkey.
ALTER TABLE professors
ADD CONSTRAINT professors_fkey FOREIGN KEY (university_id) REFERENCES universities (id);
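The foreign key can only be added if every university_id in professors already exists as an id in universities. A sketch of a check that could be run before adding the constraint; the table aliases p and u are only illustrative:

```sql
-- List university_id values in professors that have no match in universities.
SELECT DISTINCT p.university_id
FROM professors AS p
LEFT JOIN universities AS u
    ON p.university_id = u.id
WHERE p.university_id IS NOT NULL
  AND u.id IS NULL;
```

An empty result means no existing row would violate the constraint.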
</> Explore foreign key constraints
Run the sample code and have a look at the error message.
INSERT INTO professors (firstname, lastname, university_id)
VALUES ('Albert', 'Einstein', 'MIT');
insert or update on table "professors" violates foreign key constraint "professors_fkey"
DETAIL: Key (university_id)=(MIT) is not present in table "universities"
What’s wrong? Correct the university_id so that it actually reflects where Albert Einstein wrote his dissertation and became a professor – at the University of Zurich (UZH)!
INSERT INTO professors (firstname, lastname, university_id)
VALUES ('Albert', 'Einstein', 'UZH');
</> JOIN tables linked by a foreign key
JOIN professors with universities on professors.university_id = universities.id, i.e., retain all records where the foreign key of professors is equal to the primary key of universities.
Filter for university_city = ‘Zurich’.
SELECT professors.lastname, universities.id, universities.university_city
FROM professors
JOIN universities
ON professors.university_id = universities.id
WHERE universities.university_city = 'Zurich';
lastname id university_city
Abhari ETH Zurich
Axhausen ETH Zurich
Baschera ETH Zurich
...
</> Add foreign keys to the “affiliations” table
Add a professor_id column with integer data type to affiliations, and declare it to be a foreign key that references the id column in professors.
ALTER TABLE affiliations
ADD COLUMN professor_id integer REFERENCES professors (id);
Rename the organization column in affiliations to organization_id.
ALTER TABLE affiliations
RENAME COLUMN organization TO organization_id;
Add a foreign key constraint on organization_id so that it references the id column in organizations.
ALTER TABLE affiliations
ADD CONSTRAINT affiliations_organization_fkey FOREIGN KEY (organization_id) REFERENCES organizations (id);
</> Populate the “professor_id” column
First, have a look at the current state of affiliations by fetching 10 rows and all columns.
SELECT * FROM affiliations LIMIT 10;
firstname lastname function organization_id professor_id
Karl Aberer Chairman of L3S Advisory Board L3S Advisory Board null
Karl Aberer Member Conseil of Zeno-Karl Schindler Foundation Zeno-Karl Schindler Foundation null
Karl Aberer Member of Conseil Fondation IDIAP Fondation IDIAP null
...
Update the professor_id column with the corresponding value of the id column in professors. “Corresponding” means rows in professors where the firstname and lastname are identical to the ones in affiliations.
UPDATE affiliations
SET professor_id = professors.id
FROM professors
WHERE affiliations.firstname = professors.firstname AND affiliations.lastname = professors.lastname;
Check out the first 10 rows and all columns of affiliations again. Have the professor_ids been correctly matched?
SELECT * FROM affiliations LIMIT 10;
firstname lastname function organization_id professor_id
Peter Schneemann NA CIHA 442
Heinz Zimmermann Mitglied des Stiftungsrates Stiftung zur Förderung des Schweizerischen Wirtschaftsarchivs am WWZ der Universität Basel 539
Heinz Zimmermann Mitglied des Verwaltungsrates Remaco AG, Basel 539
...
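Looking at ten rows only gives a spot check. To confirm that every affiliation found a matching professor, the rows still lacking a professor_id can be counted. A minimal sketch; the alias unmatched is only illustrative:

```sql
-- Affiliations whose firstname/lastname matched no row in professors.
SELECT COUNT(*) AS unmatched
FROM affiliations
WHERE professor_id IS NULL;
```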
</> Drop “firstname” and “lastname”
Drop the firstname and lastname columns from the affiliations table.
ALTER TABLE affiliations
DROP COLUMN firstname;
ALTER TABLE affiliations
DROP COLUMN lastname;
</> Referential integrity violations
Given the current state of your database, what happens if you execute the following SQL statement?
DELETE FROM universities WHERE id = 'EPF';
update or delete on table "universities" violates foreign key constraint "professors_fkey" on table "professors"
DETAIL: Key (id)=(EPF) is still referenced from table "professors".
- It throws an error because the university with ID “EPF” does not exist.
- The university with ID “EPF” is deleted.
- It fails because referential integrity from universities to professors is violated.
- It fails because referential integrity from professors to universities is violated.
</> Change the referential integrity behavior of a key
Have a look at the existing foreign key constraints by querying table_constraints in information_schema.
SELECT constraint_name, table_name, constraint_type
FROM information_schema.table_constraints
WHERE constraint_type = 'FOREIGN KEY';
constraint_name table_name constraint_type
affiliations_organization_id_fkey affiliations FOREIGN KEY
affiliations_professor_id_fkey affiliations FOREIGN KEY
professors_fkey professors FOREIGN KEY
Delete the affiliations_organization_id_fkey foreign key constraint in affiliations.
ALTER TABLE affiliations
DROP CONSTRAINT affiliations_organization_id_fkey;
Add a new foreign key to affiliations that cascades deletion if a referenced record is deleted from organizations. Name it affiliations_organization_id_fkey.
ALTER TABLE affiliations
ADD CONSTRAINT affiliations_organization_id_fkey FOREIGN KEY(organization_id) REFERENCES organizations(id) ON DELETE CASCADE;
Run the DELETE and SELECT queries to double check that the deletion cascade actually works.
DELETE FROM organizations
WHERE id = 'CUREM';
SELECT * FROM affiliations
WHERE organization_id = 'CUREM';
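The delete behavior of each foreign key can also be inspected directly, without deleting anything. A sketch using the referential_constraints view of information_schema, assuming the constraints live in the public schema:

```sql
-- delete_rule shows e.g. CASCADE or NO ACTION for each foreign key.
SELECT constraint_name, delete_rule
FROM information_schema.referential_constraints
WHERE constraint_schema = 'public';
```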
</> Count affiliations per university
Count the number of total affiliations by university.
Sort the result by that count, in descending order.
SELECT COUNT(*), professors.university_id
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
GROUP BY professors.university_id
ORDER BY COUNT(*) DESC;
count university_id
579 EPF
273 USG
162 UBE
...
</> Join all the tables together
Join all tables in the database (starting with affiliations, professors, organizations, and universities) and look at the result.
SELECT *
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
JOIN organizations
ON affiliations.organization_id = organizations.id
JOIN universities
ON professors.university_id = universities.id;
function | organization_id | professor_id | id | firstname | lastname | university_id | id | organization_sector | id | university | university_city |
---|---|---|---|---|---|---|---|---|---|---|---|
NA | CIHA | 442 | 442 | Peter | Schneemann | UBE | CIHA | Not classifiable | UBE | Uni Bern | Bern |
Panel Member | SNF Ambizione Program | 1 | 1 | Karl | Aberer | EPF | SNF Ambizione Program | Education & research | EPF | ETH Lausanne | Lausanne |
Member of Conseil Fondation IDIAP | Fondation IDIAP | 1 | 1 | Karl | Aberer | EPF | Fondation IDIAP | Education & research | EPF | ETH Lausanne | Lausanne |
...
[Showing 100 out of 1377 rows]
Now group the result by organization sector, professor, and university city.
Count the resulting number of rows.
SELECT COUNT(*), organizations.organization_sector,
professors.id, universities.university_city
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
JOIN organizations
ON affiliations.organization_id = organizations.id
JOIN universities
ON professors.university_id = universities.id
GROUP BY organizations.organization_sector,
professors.id, universities.university_city;
count organization_sector id university_city
1 Not classifiable 47 Basel
2 Media & communication 361 Saint Gallen
1 Education & research 140 Zurich
...
[Showing 100 out of 929 rows]
Only retain rows with “Media & communication” as organization sector, and sort the table by count, in descending order.
SELECT COUNT(*), organizations.organization_sector,
professors.id, universities.university_city
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
JOIN organizations
ON affiliations.organization_id = organizations.id
JOIN universities
ON professors.university_id = universities.id
WHERE organizations.organization_sector = 'Media & communication'
GROUP BY organizations.organization_sector,
professors.id, universities.university_city
ORDER BY COUNT(*) DESC;
count organization_sector id university_city
4 Media & communication 538 Lausanne
3 Media & communication 365 Saint Gallen
3 Media & communication 36 Lausanne
...