Introduction to Relational Databases in SQL

1. Your first database

</> Attributes of relational databases

In the video, we talked about some basic facts about relational databases. Which of the following statements does not hold true for relational databases? Relational databases …

  • … store different real-world entities in different tables.
  • … allow you to establish relationships between entities.
  • … are called “relational” because they store data only about people. (answer)
  • … use constraints, keys, and referential integrity to ensure data quality.

</> Query information_schema with SELECT

Get information on all table names in the current database, while limiting your query to the ‘public’ table_schema.

SELECT table_name 
FROM information_schema.tables
WHERE table_schema = 'public'; -- restrict the query to the 'public' schema

table_name
university_professors

Now have a look at the columns in university_professors by selecting all entries in information_schema.columns that correspond to that table.

SELECT column_name, data_type 
FROM information_schema.columns 
WHERE table_name = 'university_professors' AND table_schema = 'public';

column_name				data_type
firstname				text
lastname				text
university				text
university_shortname	text
university_city			text
function				text
organization			text
organization_sector		text
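SQLite has no information_schema. It is still handy for trying these ideas without a PostgreSQL server, because it ships with Python's standard library. The sketch below is an assumed stand-in and does not use the PostgreSQL mechanism. It lists tables from SQLite's sqlite_master catalog and lists columns with PRAGMA table_info. The three-column table is a hypothetical, trimmed copy of university_professors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed copy of university_professors (3 of its 8 columns)
conn.execute("CREATE TABLE university_professors (firstname TEXT, lastname TEXT, university TEXT)")

# Rough analogue of: SELECT table_name FROM information_schema.tables
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Rough analogue of: SELECT column_name, data_type FROM information_schema.columns
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column
columns = [(row[1], row[2]) for row in conn.execute(
    "PRAGMA table_info(university_professors)")]

print(tables)   # ['university_professors']
print(columns)  # [('firstname', 'TEXT'), ('lastname', 'TEXT'), ('university', 'TEXT')]
```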

How many columns does the table university_professors have?

  • 12
  • 9
  • 8 (answer)
  • 5

Finally, print the first five rows of the university_professors table.

SELECT * 
FROM university_professors 
LIMIT 5;

firstname		lastname	university		university_shortname	university_city	function											organization					organization_sector
Karl			Aberer		ETH Lausanne	EPF						Lausanne		Chairman of L3S Advisory Board						L3S Advisory Board				Education & research
Karl			Aberer		ETH Lausanne	EPF						Lausanne		Member Conseil of Zeno-Karl Schindler Foundation	Zeno-Karl Schindler Foundation	Education & research
Karl			Aberer		ETH Lausanne	EPF						Lausanne		Member of Conseil Fondation IDIAP					Fondation IDIAP					Education & research
Karl			Aberer		ETH Lausanne	EPF						Lausanne		Panel Member 										SNF Ambizione Program			Education & research
Reza Shokrollah	Abhari		ETH Zürich		ETH						Zurich			Aufsichtsratsmandat									PNE Wind AG						Energy, environment & mobility

</> CREATE your first few TABLEs

Create a table professors with two text columns: firstname and lastname.

CREATE TABLE professors (
 firstname text,
 lastname text
);

Create a table universities with three text columns: university_shortname, university, and university_city.

CREATE TABLE universities(
 university_shortname text,
 university text,
 university_city text
);

</> ADD a COLUMN with ALTER TABLE

Alter professors to add the text column university_shortname.

ALTER TABLE professors
ADD COLUMN university_shortname text;

</> RENAME and DROP COLUMNs in affiliations

Rename the organisation column to organization in affiliations.

ALTER TABLE affiliations
RENAME COLUMN organisation TO organization;

Delete the university_shortname column in affiliations.

ALTER TABLE affiliations
DROP COLUMN university_shortname;

</> Migrate data with INSERT INTO SELECT DISTINCT

Insert all DISTINCT professors from university_professors into professors.

Print all the rows in professors.

INSERT INTO professors 
SELECT DISTINCT firstname, lastname, university_shortname 
FROM university_professors;

SELECT * 
FROM professors;

firstname	lastname	university_shortname
Michel		Rappaz		EPF
Hilal		Lashuel		EPF
Jeffrey		Huang		EPF
...
[Showing 100 out of 551 rows]
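A few lines of code can reproduce what DISTINCT does during the migration. This sketch uses Python's built-in sqlite3 module as a lightweight stand-in for PostgreSQL. The three sample rows come from the university_professors output above. The organization column is kept only to show why a professor appears more than once in the flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flat source table: one row per affiliation, so the same professor can repeat
conn.execute("CREATE TABLE university_professors "
             "(firstname TEXT, lastname TEXT, university_shortname TEXT, organization TEXT)")
conn.executemany(
    "INSERT INTO university_professors VALUES (?, ?, ?, ?)",
    [("Karl", "Aberer", "EPF", "L3S Advisory Board"),
     ("Karl", "Aberer", "EPF", "Fondation IDIAP"),
     ("Reza Shokrollah", "Abhari", "ETH", "PNE Wind AG")],
)
conn.execute("CREATE TABLE professors (firstname TEXT, lastname TEXT, university_shortname TEXT)")

# Same INSERT INTO ... SELECT DISTINCT pattern: the two Karl Aberer rows collapse into one
conn.execute("""
    INSERT INTO professors
    SELECT DISTINCT firstname, lastname, university_shortname
    FROM university_professors
""")
professor_count = conn.execute("SELECT COUNT(*) FROM professors").fetchone()[0]
print(professor_count)  # 2
```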

Insert all DISTINCT affiliations into affiliations from university_professors.

INSERT INTO affiliations 
SELECT DISTINCT firstname, lastname, function, organization 
FROM university_professors;

SELECT * 
FROM affiliations;

firstname	lastname	function										organization
Dimos		Poulikakos	VR-Mandat										Scrona AG
Francesco	Stellacci	Co-editor in Chief, Nanoscale					Royal Chemistry Society, UK
Alexander	Fust		Fachexperte und Coach für Designer Startups		Creative Hub
...
[Showing 100 out of 1377 rows]

</> Delete tables with DROP TABLE

Delete the university_professors table.

DROP TABLE university_professors;

2. Enforce data consistency with attribute constraints

</> Types of database constraints

Which of the following is not used to enforce a database constraint?

  • Foreign keys
  • SQL aggregate functions (answer)
  • The BIGINT data type
  • Primary keys

</> Conforming with data types

Execute the given sample code.

INSERT INTO transactions (transaction_date, amount, fee) 
VALUES ('2018-24-09', 5454, '30');

SELECT *
FROM transactions;

date/time field value out of range: "2018-24-09"
LINE 3: VALUES ('2018-24-09', 5454, '30');
                ^
HINT:  Perhaps you need a different "datestyle" setting.

The statement fails because PostgreSQL reads a year-first date as year-month-day, and '2018-24-09' has no valid month 24. Correct the date accordingly and execute the statement again.

INSERT INTO transactions (transaction_date, amount, fee) 
VALUES ('2018-09-24', 5454, '30');

SELECT *
FROM transactions;

transaction_date	amount	fee
1999-01-08			500		20
2001-02-20			403		15
2001-03-20			3430	35
...
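The rule behind the error is that a year-first date string is read as year-month-day. Python's datetime.date.fromisoformat applies the same YYYY-MM-DD reading and rejects the impossible month. This is an analogy only: it is not PostgreSQL's date parser.

```python
from datetime import date


def parse_iso_date(text):
    """Parse a 'YYYY-MM-DD' string, raising ValueError for impossible dates."""
    return date.fromisoformat(text)


try:
    parse_iso_date("2018-24-09")  # month 24 does not exist
    rejected = False
except ValueError:
    rejected = True

fixed = parse_iso_date("2018-09-24")
print(rejected, fixed)  # True 2018-09-24
```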

</> Type CASTs

Execute the given sample code.

SELECT transaction_date, amount + fee AS net_amount 
FROM transactions;

operator does not exist: integer + text
LINE 2: SELECT transaction_date, amount + fee AS net_amount 
                                        ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

This fails because fee is stored as text, and PostgreSQL has no + operator for an integer and a text value. Add an integer type cast at the right place and execute the query again.

SELECT transaction_date, amount + CAST(fee AS integer) AS net_amount 
FROM transactions;

transaction_date	net_amount
1999-01-08			520
2001-02-20			418
2001-03-20			3465
...
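The same cast can be tried in SQLite through Python's sqlite3 module. One caveat about this stand-in: SQLite would silently coerce the text fee even without CAST, and that implicit conversion is exactly what PostgreSQL refuses. The two sample rows come from the output above. The table definition is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# fee is deliberately stored as TEXT, mirroring the transactions table above
conn.execute("CREATE TABLE transactions (transaction_date TEXT, amount INTEGER, fee TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("1999-01-08", 500, "20"), ("2001-02-20", 403, "15")],
)
# Explicit cast, as in the PostgreSQL fix
rows = conn.execute(
    "SELECT transaction_date, amount + CAST(fee AS INTEGER) AS net_amount "
    "FROM transactions ORDER BY transaction_date"
).fetchall()
print(rows)  # [('1999-01-08', 520), ('2001-02-20', 418)]
```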

</> Change types with ALTER COLUMN

Have a look at the distinct university_shortname values in the professors table and take note of the length of the strings.

SELECT DISTINCT university_shortname
FROM professors;

university_shortname
ETH
UNE
EPF
...

Now specify a fixed-length character type with the correct length for university_shortname.

ALTER TABLE professors
ALTER COLUMN university_shortname
TYPE char(3);

Change the type of the firstname column to varchar(64).

ALTER TABLE professors
ALTER COLUMN firstname
TYPE varchar(64);

</> Convert types USING a function

Run the sample code as is and take note of the error.

ALTER TABLE professors 
ALTER COLUMN firstname 
TYPE varchar(16);

value too long for type character varying(16)

Now use SUBSTRING() to reduce firstname to 16 characters so its type can be altered to varchar(16).

ALTER TABLE professors 
ALTER COLUMN firstname 
TYPE varchar(16)
USING SUBSTRING(firstname FROM 1 FOR 16);
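SQLite ignores declared varchar lengths and cannot change a column type with ALTER COLUMN. This sketch therefore reproduces only the truncation step that the USING clause applies to each value. It relies on SQLite's substr(x, start, length), which counts from position 1 just like SUBSTRING(x FROM 1 FOR 16). The long first name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professors (firstname TEXT)")
# Hypothetical names: the second one is longer than 16 characters
conn.executemany("INSERT INTO professors VALUES (?)", [("Karl",), ("Maximilian-Alexander",)])

# Keep characters 1..16 of every value, as the USING expression does
conn.execute("UPDATE professors SET firstname = substr(firstname, 1, 16)")
longest = conn.execute("SELECT MAX(LENGTH(firstname)) FROM professors").fetchone()[0]
truncated = conn.execute(
    "SELECT firstname FROM professors WHERE firstname LIKE 'Max%'").fetchone()[0]
print(longest, truncated)  # 16 Maximilian-Alexa
```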

</> Disallow NULL values with SET NOT NULL

Add a not-null constraint for the firstname column.

ALTER TABLE professors 
ALTER COLUMN firstname SET NOT NULL;

Add a not-null constraint for the lastname column.

ALTER TABLE professors 
ALTER COLUMN lastname SET NOT NULL;

</> What happens if you try to enter NULLs?

Execute the following statement:

INSERT INTO professors (firstname, lastname, university_shortname)
VALUES (NULL, 'Miller', 'ETH');

Why does this throw an error?

  • Professors without first names do not exist.
  • Because a database constraint is violated. (answer)
  • Error? This works just fine.
  • NULL is not put in quotes.
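Python's sqlite3 module can reproduce the violation, serving only as a convenient stand-in for PostgreSQL. SQLite cannot add a not-null constraint later with ALTER COLUMN ... SET NOT NULL, so the constraints are declared when the table is created. The error text in the comment is SQLite's wording, not PostgreSQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE professors (
        firstname TEXT NOT NULL,
        lastname TEXT NOT NULL,
        university_shortname TEXT
    )
""")
try:
    conn.execute("INSERT INTO professors VALUES (NULL, 'Miller', 'ETH')")
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)
print(error)  # NOT NULL constraint failed: professors.firstname
```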

</> Make your columns UNIQUE with ADD CONSTRAINT

Add a unique constraint to the university_shortname column in universities. Give it the name university_shortname_unq.

ALTER TABLE universities
ADD CONSTRAINT university_shortname_unq UNIQUE(university_shortname);

Add a unique constraint to the organization column in organizations. Give it the name organization_unq.

ALTER TABLE organizations
ADD CONSTRAINT organization_unq UNIQUE(organization);
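A unique constraint rejects a second row with a value that already exists. In this sqlite3 sketch, which stands in for PostgreSQL, the named constraint is declared inside CREATE TABLE because SQLite does not support ALTER TABLE ... ADD CONSTRAINT. The university values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE universities (
        university_shortname TEXT,
        university TEXT,
        university_city TEXT,
        CONSTRAINT university_shortname_unq UNIQUE (university_shortname)
    )
""")
conn.execute("INSERT INTO universities VALUES ('ETH', 'ETH Zürich', 'Zurich')")
try:
    # Second row with the same short name violates university_shortname_unq
    conn.execute("INSERT INTO universities VALUES ('ETH', 'Duplicate', 'Zurich')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```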

3. Uniquely identify records with key constraints

</> Get to know SELECT COUNT DISTINCT

First, find out the number of rows in universities.

SELECT COUNT(DISTINCT(university_shortname, university, university_city)) 
FROM universities;

count
11

Then, find out how many unique values there are in the university_city column.

SELECT COUNT(DISTINCT(university_city)) 
FROM universities;

count
9

</> Identify keys with SELECT COUNT DISTINCT

There’s a very basic way of finding out what qualifies for a key in an existing, populated table:

  1. Count the distinct records for all possible combinations of columns. If the resulting number x equals the number of all rows in the table for a combination, you have discovered a superkey.
  2. Then remove one column after another until you can no longer remove columns without seeing the number x decrease. If that is the case, you have discovered a (candidate) key.
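The two steps can also be automated. This sketch swaps in a bottom-up variant of the procedure. Instead of removing columns from a superkey, it tries column sets from smallest to largest. It keeps each superkey that contains no smaller key already found, which yields the same minimal (candidate) keys. The helper names and the four toy rows are hypothetical. The enumeration grows exponentially with the number of columns, so it only suits small tables.

```python
from itertools import combinations


def distinct_count(rows, cols):
    """Number of distinct value combinations for the given columns (like COUNT DISTINCT)."""
    return len({tuple(row[c] for c in cols) for row in rows})


def candidate_keys(rows, columns):
    """Return every minimal column set whose values are unique across all rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            is_superkey = distinct_count(rows, combo) == len(rows)
            # Minimal only if no smaller candidate key found earlier is contained in combo
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_superkey and not contains_smaller_key:
                keys.append(combo)
    return keys


# Hypothetical toy rows, not the real professors data
rows = [
    {"firstname": "Karl", "lastname": "Aberer", "university_shortname": "EPF"},
    {"firstname": "Karl", "lastname": "Smith", "university_shortname": "EPF"},
    {"firstname": "Anna", "lastname": "Aberer", "university_shortname": "EPF"},
    {"firstname": "Anna", "lastname": "Smith", "university_shortname": "ETH"},
]
print(candidate_keys(rows, ["firstname", "lastname", "university_shortname"]))
# [('firstname', 'lastname')]
```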

The table professors has 551 rows. It has only one possible candidate key, which is a combination of two attributes. Try different combinations of columns until you find it.

SELECT COUNT(DISTINCT(firstname,lastname,university_shortname)) 
FROM professors;

count
551

SELECT COUNT(DISTINCT(firstname,university_shortname)) 
FROM professors;

count
479

SELECT COUNT(DISTINCT(lastname,university_shortname)) 
FROM professors;

count
546

Using the above steps, identify the candidate key by trying out different combinations of columns.

SELECT COUNT(DISTINCT(firstname,lastname)) 
FROM professors;

count
551

</> Identify the primary key

Have a look at the example table from the previous video. As the database designer, you have to make a wise choice as to which column should be the primary key.

license_no			serial_no	make		model		year
Texas ABC-739		A69352		Ford		Mustang		2
Florida TVP-347		B43696		Oldsmobile	Cutlass		5
New York MPO-22		X83554		Oldsmobile	Delta		1
California 432-TFY	C43742		Mercedes	190-D		99
California RSK-629	Y82935		Toyota		Camry		4
Texas RSK-629		U028365		Jaguar		XJS			4

Which of the following column or column combinations could best serve as primary key?

  • PK = {make}
  • PK = {model, year}
  • PK = {license_no} (answer)
  • PK = {year, make}

</> ADD key CONSTRAINTs to the tables

Rename the organization column to id in organizations.

Make id a primary key and name it organization_pk.

ALTER TABLE organizations
RENAME COLUMN organization TO id;

ALTER TABLE organizations
ADD CONSTRAINT organization_pk PRIMARY KEY (id);

Rename the university_shortname column to id in universities.

Make id a primary key and name it university_pk.

ALTER TABLE universities
RENAME COLUMN university_shortname TO id;

-- Make id a primary key
ALTER TABLE universities
ADD CONSTRAINT university_pk PRIMARY KEY(id);

</> Add a SERIAL surrogate key

Add a new column id with data type serial to the professors table.

ALTER TABLE professors 
ADD COLUMN id serial;

Make id a primary key and name it professors_pkey.

ALTER TABLE professors 
ADD CONSTRAINT professors_pkey PRIMARY KEY (id);

Write a query that returns all the columns and 10 rows from professors.

SELECT * FROM professors LIMIT 10;

firstname			lastname		university_shortname	id
Karl				Aberer			EPF						1
Reza Shokrollah		Abhari			ETH						2
Georges				Abou Jaoudé		EPF						3
...
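SQLite has no serial type. Its closest analogue is an INTEGER PRIMARY KEY column, which aliases the internal rowid and is filled in automatically when an INSERT omits it. This sqlite3 sketch relies on that behavior to show the surrogate-key effect. Unlike serial, this analogue is not backed by a separate sequence object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is auto-filled, playing roughly the role of PostgreSQL's serial
conn.execute("""
    CREATE TABLE professors (
        id INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname TEXT
    )
""")
conn.executemany(
    "INSERT INTO professors (firstname, lastname) VALUES (?, ?)",
    [("Karl", "Aberer"), ("Reza Shokrollah", "Abhari"), ("Georges", "Abou Jaoudé")],
)
ids = [row[0] for row in conn.execute("SELECT id FROM professors ORDER BY id")]
print(ids)  # [1, 2, 3]
```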

</> CONCATenate columns to a surrogate key

Count the number of distinct rows with a combination of the make and model columns.

SELECT COUNT(DISTINCT(make, model))
FROM cars;

count
10

Add a new column id with the data type varchar(128).

ALTER TABLE cars
ADD COLUMN id varchar(128);

Concatenate make and model into id using an UPDATE table_name SET column_name = … query and the CONCAT() function.

UPDATE cars
SET id = CONCAT(make, model);

Make id a primary key and name it id_pk.

ALTER TABLE cars
ADD CONSTRAINT id_pk PRIMARY KEY(id);

SELECT * FROM cars;

make	model		mpg		id
Subaru	Forester	24		SubaruForester
Opel	Astra		45		OpelAstra
Opel	Vectra		40		OpelVectra
...
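Python's sqlite3 module can sketch the two steps: concatenate two columns, then check that the result is unique. The SQLite build bundled with Python may lack CONCAT(). This stand-in therefore uses the standard || operator, which PostgreSQL also accepts. The sample rows come from the cars output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT, model TEXT, mpg INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("Subaru", "Forester", 24), ("Opel", "Astra", 45), ("Opel", "Vectra", 40)],
)
conn.execute("ALTER TABLE cars ADD COLUMN id TEXT")
# || concatenates strings; used here instead of CONCAT()
conn.execute("UPDATE cars SET id = make || model")

# Before declaring id a primary key, confirm it is unique across all rows
total, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT id) FROM cars").fetchone()
ids = [row[0] for row in conn.execute("SELECT id FROM cars ORDER BY id")]
print(total == distinct_ids, ids)
# True ['OpelAstra', 'OpelVectra', 'SubaruForester']
```

One difference matters if either column can be NULL. In PostgreSQL, CONCAT() treats NULL as an empty string, whereas || returns NULL.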

</> Test your knowledge before advancing

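Create a table students for a student entity with the correct column types: last_name holds up to 128 characters and must not be NULL, ssn is an integer, and phone_no holds a fixed length of 12 characters.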

Add a PRIMARY KEY for the social security number ssn.
Note that there is no formal length requirement for the integer column. The application would have to make sure it’s a correct SSN!

CREATE TABLE students (
  last_name varchar(128) NOT NULL,
  ssn integer PRIMARY KEY,
  phone_no char(12)
);

4. Glue together tables with foreign keys

</> REFERENCE a table with a FOREIGN KEY

Rename the university_shortname column to university_id in professors.

ALTER TABLE professors
RENAME COLUMN university_shortname TO university_id;

Add a foreign key on university_id column in professors that references the id column in universities.

Name this foreign key professors_fkey.

ALTER TABLE professors 
ADD CONSTRAINT professors_fkey FOREIGN KEY (university_id) REFERENCES universities (id);

</> Explore foreign key constraints

Run the sample code and have a look at the error message.

INSERT INTO professors (firstname, lastname, university_id)
VALUES ('Albert', 'Einstein', 'MIT');

insert or update on table "professors" violates foreign key constraint "professors_fkey"
DETAIL:  Key (university_id)=(MIT) is not present in table "universities"

What’s wrong? Correct the university_id so that it actually reflects where Albert Einstein wrote his dissertation and became a professor – at the University of Zurich (UZH)!

INSERT INTO professors (firstname, lastname, university_id)
VALUES ('Albert', 'Einstein', 'UZH');
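Python's sqlite3 module, used as a stand-in for PostgreSQL, can reproduce the foreign key check. One SQLite-specific detail: foreign keys are only enforced after PRAGMA foreign_keys = ON, whereas PostgreSQL always enforces them. The try_insert helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE universities (id TEXT PRIMARY KEY, university TEXT)")
conn.execute("""
    CREATE TABLE professors (
        firstname TEXT,
        lastname TEXT,
        university_id TEXT REFERENCES universities (id)
    )
""")
conn.execute("INSERT INTO universities VALUES ('UZH', 'University of Zurich')")


def try_insert(university_id):
    """Return True if the insert succeeds, False if the foreign key rejects it."""
    try:
        conn.execute("INSERT INTO professors VALUES ('Albert', 'Einstein', ?)", (university_id,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("MIT"), try_insert("UZH"))  # False True
```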

</> JOIN tables linked by a foreign key

JOIN professors with universities on professors.university_id = universities.id, i.e., retain all records where the foreign key of professors is equal to the primary key of universities.

Filter for university_city = ‘Zurich’.

SELECT professors.lastname, universities.id, universities.university_city
FROM professors
JOIN universities
ON professors.university_id = universities.id
WHERE universities.university_city = 'Zurich';

lastname	id		university_city
Abhari		ETH		Zurich
Axhausen	ETH		Zurich
Baschera	ETH		Zurich
...

</> Add foreign keys to the “affiliations” table

Add a professor_id column with integer data type to affiliations, and declare it to be a foreign key that references the id column in professors.

ALTER TABLE affiliations
ADD COLUMN professor_id integer REFERENCES professors (id);

Rename the organization column in affiliations to organization_id.

ALTER TABLE affiliations
RENAME organization TO organization_id;

Add a foreign key constraint on organization_id so that it references the id column in organizations.

ALTER TABLE affiliations
ADD CONSTRAINT affiliations_organization_id_fkey FOREIGN KEY (organization_id) REFERENCES organizations (id);

</> Populate the “professor_id” column

First, have a look at the current state of affiliations by fetching 10 rows and all columns.

SELECT * FROM affiliations LIMIT 10;

firstname	lastname	function											organization_id					professor_id
Karl		Aberer		Chairman of L3S Advisory Board						L3S Advisory Board				null
Karl		Aberer		Member Conseil of Zeno-Karl Schindler Foundation	Zeno-Karl Schindler Foundation	null
Karl		Aberer		Member of Conseil Fondation IDIAP					Fondation IDIAP					null
...

Update the professor_id column with the corresponding value of the id column in professors. “Corresponding” means rows in professors where the firstname and lastname are identical to the ones in affiliations.

UPDATE affiliations
SET professor_id = professors.id
FROM professors
WHERE affiliations.firstname = professors.firstname AND affiliations.lastname = professors.lastname;

Check out the first 10 rows and all columns of affiliations again. Have the professor_ids been correctly matched?

SELECT * FROM affiliations LIMIT 10;

firstname	lastname	function						organization_id																				professor_id
Peter		Schneemann	NA								CIHA																						442
Heinz		Zimmermann	Mitglied des Stiftungsrates		Stiftung zur Förderung des Schweizerischen Wirtschaftsarchivs am WWZ der Universität Basel	539
Heinz		Zimmermann	Mitglied des Verwaltungsrates	Remaco AG, Basel																			539
...
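Older SQLite versions lack PostgreSQL's UPDATE ... FROM. This sqlite3 sketch therefore expresses the same name-based match as a correlated subquery. It is an alternative formulation, not the statement used above. The IDs are assigned unambiguously only because (firstname, lastname) identifies a professor, as found in the candidate key exercise. The sample rows are a small hypothetical subset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professors (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.execute("CREATE TABLE affiliations "
             "(firstname TEXT, lastname TEXT, organization_id TEXT, professor_id INTEGER)")
conn.executemany("INSERT INTO professors (firstname, lastname) VALUES (?, ?)",
                 [("Karl", "Aberer"), ("Peter", "Schneemann")])
conn.executemany("INSERT INTO affiliations (firstname, lastname, organization_id) VALUES (?, ?, ?)",
                 [("Karl", "Aberer", "Fondation IDIAP"), ("Peter", "Schneemann", "CIHA")])

# For each affiliation, look up the id of the professor with the same first and last name
conn.execute("""
    UPDATE affiliations
    SET professor_id = (
        SELECT professors.id FROM professors
        WHERE professors.firstname = affiliations.firstname
          AND professors.lastname = affiliations.lastname
    )
""")
rows = conn.execute(
    "SELECT lastname, professor_id FROM affiliations ORDER BY lastname").fetchall()
print(rows)  # [('Aberer', 1), ('Schneemann', 2)]
```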

</> Drop “firstname” and “lastname”

Drop the firstname and lastname columns from the affiliations table.

ALTER TABLE affiliations
DROP COLUMN firstname;

ALTER TABLE affiliations
DROP COLUMN lastname;

</> Referential integrity violations

Given the current state of your database, what happens if you execute the following SQL statement?

DELETE FROM universities WHERE id = 'EPF';

update or delete on table "universities" violates foreign key constraint "professors_fkey" on table "professors"
DETAIL:  Key (id)=(EPF) is still referenced from table "professors".

  • It throws an error because the university with ID “EPF” does not exist.
  • The university with ID “EPF” is deleted.
  • It fails because referential integrity from universities to professors is violated.
  • It fails because referential integrity from professors to universities is violated. (answer)
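Python's sqlite3 module, serving as a stand-in, can reproduce the failing DELETE. Without an ON DELETE clause, a foreign key uses the default NO ACTION behavior. Deleting a university that a professor still references is therefore rejected, and the university row stays. PRAGMA foreign_keys = ON is an SQLite-specific requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.execute("CREATE TABLE universities (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE professors (lastname TEXT, university_id TEXT REFERENCES universities (id))")
conn.execute("INSERT INTO universities VALUES ('EPF')")
conn.execute("INSERT INTO professors VALUES ('Aberer', 'EPF')")

try:
    conn.execute("DELETE FROM universities WHERE id = 'EPF'")
    deleted = True
except sqlite3.IntegrityError:
    deleted = False

remaining = conn.execute("SELECT COUNT(*) FROM universities").fetchone()[0]
print(deleted, remaining)  # False 1
```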

</> Change the referential integrity behavior of a key

Have a look at the existing foreign key constraints by querying table_constraints in information_schema.

SELECT constraint_name, table_name, constraint_type
FROM information_schema.table_constraints
WHERE constraint_type = 'FOREIGN KEY';

constraint_name						table_name		constraint_type
affiliations_organization_id_fkey	affiliations	FOREIGN KEY
affiliations_professor_id_fkey		affiliations	FOREIGN KEY
professors_fkey						professors		FOREIGN KEY

Delete the affiliations_organization_id_fkey foreign key constraint in affiliations.

ALTER TABLE affiliations
DROP CONSTRAINT affiliations_organization_id_fkey;

Add a new foreign key to affiliations that cascades deletion if a referenced record is deleted from organizations. Name it affiliations_organization_id_fkey.

ALTER TABLE affiliations
ADD CONSTRAINT affiliations_organization_id_fkey FOREIGN KEY(organization_id) REFERENCES organizations(id) ON DELETE CASCADE;

Run the DELETE and SELECT queries to double-check that the deletion cascade works: the SELECT should return no rows.

DELETE FROM organizations 
WHERE id = 'CUREM';

SELECT * FROM affiliations
WHERE organization_id = 'CUREM';
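With ON DELETE CASCADE, deleting an organization also deletes the affiliations that reference it. This sqlite3 sketch stands in for PostgreSQL and switches foreign keys on via SQLite's PRAGMA. It declares the constraint at table creation, since SQLite cannot add one with ALTER TABLE. The function values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable foreign key enforcement
conn.execute("CREATE TABLE organizations (id TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE affiliations (
        function TEXT,
        organization_id TEXT REFERENCES organizations (id) ON DELETE CASCADE
    )
""")
conn.executemany("INSERT INTO organizations VALUES (?)", [("CUREM",), ("CIHA",)])
conn.executemany("INSERT INTO affiliations VALUES (?, ?)",
                 [("Member", "CUREM"), ("NA", "CIHA")])

# Deleting the organization cascades to its affiliations
conn.execute("DELETE FROM organizations WHERE id = 'CUREM'")
left = conn.execute("SELECT organization_id FROM affiliations").fetchall()
print(left)  # [('CIHA',)]
```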

</> Count affiliations per university

Count the number of total affiliations by university.

Sort the result by that count, in descending order.

SELECT COUNT(*), professors.university_id 
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
GROUP BY professors.university_id
ORDER BY COUNT(*) DESC;

count	university_id
579		EPF
273		USG
162		UBE
...

</> Join all the tables together

Join all tables in the database (starting with affiliations, professors, organizations, and universities) and look at the result.

SELECT *
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
JOIN organizations
ON affiliations.organization_id = organizations.id
JOIN universities
ON professors.university_id = universities.id;

function							organization_id			professor_id	id	firstname	lastname	university_id	id						organization_sector		id	university		university_city
NA									CIHA					442				442	Peter		Schneemann	UBE				CIHA					Not classifiable		UBE	Uni Bern		Bern
Panel Member 						SNF Ambizione Program	1				1	Karl		Aberer		EPF				SNF Ambizione Program	Education & research	EPF	ETH Lausanne	Lausanne
Member of Conseil Fondation IDIAP	Fondation IDIAP			1				1	Karl		Aberer		EPF				Fondation IDIAP			Education & research	EPF	ETH Lausanne	Lausanne
...
[Showing 100 out of 1377 rows]

Now group the result by organization sector, professor, and university city.

Count the resulting number of rows.

SELECT COUNT(*), organizations.organization_sector, 
professors.id, universities.university_city
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
JOIN organizations
ON affiliations.organization_id = organizations.id
JOIN universities
ON professors.university_id = universities.id
GROUP BY organizations.organization_sector, 
professors.id, universities.university_city;

count	organization_sector		id		university_city
1		Not classifiable		47		Basel
2		Media & communication	361		Saint Gallen
1		Education & research	140		Zurich
...
[Showing 100 out of 929 rows]

Only retain rows with “Media & communication” as organization sector, and sort the table by count, in descending order.

SELECT COUNT(*), organizations.organization_sector, 
professors.id, universities.university_city
FROM affiliations
JOIN professors
ON affiliations.professor_id = professors.id
JOIN organizations
ON affiliations.organization_id = organizations.id
JOIN universities
ON professors.university_id = universities.id
WHERE organizations.organization_sector = 'Media & communication'
GROUP BY organizations.organization_sector, 
professors.id, universities.university_city
ORDER BY COUNT(*) DESC;

count	organization_sector		id		university_city
4		Media & communication	538		Lausanne
3		Media & communication	365		Saint Gallen
3		Media & communication	36		Lausanne
...
Reposted from blog.csdn.net/weixin_42871941/article/details/104888221