Lab 5. Relational Databases. Introduction to SQL

Introduction to Databases

A database (DB) is a structured set of data that allows the storage of information and efficient processing by multiple users.

A database management system (DBMS) is a software product that ensures interaction with a database, allowing the definition, consultation and updating of data in the database. All requests for access to the database are handled and controlled by the DBMS.

The database schema represents a database structure described in a formal language supported by the DBMS. The term “schema” refers to the organization of data as a blueprint of how the database is constructed (divided into database tables in the case of relational databases)

Relational Databases. Introduction to SQL

Structured Query Language (SQL) is a well-established data language. It is used for managing data in a relational database management system (RDBMS) which stores data and relationships between data in the form of tables.

Inspired by Relational Algebra, the model of relational databases can be considered as a collection of two-dimensional tables, defined by the following concepts:

An entity in relational algebra becomes a table in a database
A table is made up of columns and lines, and the field is a value that is at the intersection of a column and a line
An association relationship is a link between two or more entities.
An attribute is a descriptive property of an entity or relationship.

SQL is a programming language to operate databases. It includes database creation, deletion, fetching rows, modifying rows, etc. There are multiple SQL dialects such as MySQL, Oracle, Microsoft SQL Server, PostgreSQL, SQLite, of which MySQL is the most used by companies, being also open-source and free. You can find more information about SQL at SQL Tutorial - w3resource. Now we will go through the basics.

SQL is used to communicate to a database in a human readable form. How does it look like?

SELECT *
FROM employees AS e
LEFT JOIN salaries AS s
ON e.emp_no = s.emp_no
WHERE e.emp_no = 10004;

Modeling Relational Databases

In a Relational Database, data is represented as tables, defined by rows and columns.
The data or information for the database is stored in these tables.
Columns are defined by the column name, data type and other attributes such as constraints.
Rows contain the records or data that are stored into the table.
A database can have multiple tables for storing different data.
Relationships can exist between tables such as: one-to-one, one-to-many, many-to-many and self-referencing.

SQL Commands

The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT, UPDATE, DELETE and DROP. These commands can be classified into the following groups:

DDL - Data Definition Language

CREATE: create a new table*
ALTER: modify an existing table*
DROP: delete an entire table*

*table, view or other object in the database

DML - Data Manipulation Language

SELECT: retrieves records
INSERT: creates a record
UPDATE: modify records
DELETE: delete records

DCL - Data Control Language

GRANT: gives a privilege to user
REVOKE: revokes a privilege from user

MySQL Setup

For installing MySQL on your computer (task) you can use this tutorial: MySQL Local Setup Tutorial

After completing the steps, you will have MySQL Server and MySQL Workbench GUI installed and ready to use.

Data Definition Language (DDL)

In this section we will focus on working with MySQL Workbench GUI to make things easier when creating and editing databases and tables (DDL). This tutorial will show you a step-by-step guide for creating the database in the figure: MySQL Workbench Tutorial

You may use MySQL Workbench GUI for DDL operations, and keep the commands presented in this section as reference.

CREATE DATABASE (creating a database)

The command for creating a database is CREATE DATABASE (equivalent in MySQL: CREATE SCHEMA):

CREATE DATABASE schema;
CREATE SCHEMA schema;

Syntax description:

schema - the name of the database/schema to be created

CREATE TABLE (creating a table)

A table can be created for an existing database. The command for creating a table is CREATE TABLE:

CREATE TABLE [ schema. ] TABLE_NAME
(
    column_name_1 datatype [DEFAULT VALUE|expression] [inline_constraints] ,
    column_name_2 datatype [DEFAULT VALUE|expression] [inline_constraints] , ....
)

Syntax description:

schema - the name of the database/schema where the table will be created
table_name - the name of the table
column_name_k - the name of the column k
datatype - the data type for a column
DEFAULT VALUE|expression - specifies if a column will have a default value
inline_constraints - a column can have constraints such as: UNIQUE, PRIMARY KEY, NOT NULL

Examples

The following example shows how you can create the company database schema using DDL commands:

CREATE DATABASE company;
USE company;
 
CREATE TABLE department (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(50) DEFAULT NULL,
  location VARCHAR(100) DEFAULT NULL,
  PRIMARY KEY (id)
);
 
CREATE TABLE POSITION (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(50) DEFAULT NULL,
  description VARCHAR(200) DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY name_UNIQUE (name)
);
 
CREATE TABLE employee (
  id INT NOT NULL AUTO_INCREMENT,
  department_id INT DEFAULT NULL,
  position_id INT DEFAULT NULL,
  manager_id INT DEFAULT NULL,
  name VARCHAR(50) DEFAULT NULL,
  salary DECIMAL(10,2) DEFAULT NULL,
  hire_date datetime DEFAULT NULL,
  PRIMARY KEY (id),
  KEY fk_employee_department_idx (department_id),
  KEY fk_employee_position_idx (position_id),
  KEY fk_employee_manager_idx (manager_id),
  CONSTRAINT fk_employee_department FOREIGN KEY (department_id) REFERENCES department (id) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT fk_employee_manager FOREIGN KEY (manager_id) REFERENCES employee (id) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT fk_employee_position FOREIGN KEY (position_id) REFERENCES POSITION (id) ON DELETE CASCADE ON UPDATE CASCADE
);

Constraints

Constraints are column properties that have to be achieved by the data at any given moment. The most common constraints are:

NOT NULL - the value of a column must not be NULL
UNIQUE - the value of a column must be unique for every row in the table
PRIMARY KEY - the unique identifier for a row in the table; must not be NULL and must be UNIQUE for every row in the table
FOREIGN KEY - the value of a column is related to another table by referring to the UNIQUE/PRIMARY KEY of that table; makes possible the associations between tables

You can add constraints to an existing table using the ALTER TABLE command:

-- Primary Keys --
ALTER TABLE department ADD CONSTRAINT pk_department PRIMARY KEY (id);
 
-- Foreign Keys --
ALTER TABLE employee ADD CONSTRAINT fk_employee_department FOREIGN KEY (department_id);
 
-- Unique --
ALTER TABLE POSITION ADD CONSTRAINT uq_name UNIQUE (name);

Note: You cannot create a FOREIGN KEY if the referenced table does not exist. You have to create the referenced table first.

Data Manipulation Language (DML)

In SQL, the DML is used to modify data but not the schema or database objects. The operations that are part of DML are similar to CRUD operations (create, read, update, delete) with more advanced queries that can be written in simple language.

INSERT (inserting data)

Data can be inserted into an existing table. The command for inserting data into a table is INSERT:

-- With column specification; Required: columns with constraints (PRIMARY KEY, NOT NULL) and no DEFAULT values --
INSERT INTO TABLE_NAME(column_name_1, column_name_2, ...) VALUES(value_1, value_2, ...)
 
-- Without column specification; Required: all columns, in the same order they have been declared --
INSERT INTO TABLE_NAME VALUES(value_column_1, value_column_2, ...)

Syntax description:

table_name - the name of the table in which to insert data
column_name_k - the name of the column k
value_column_k - the value to be inserted for column k

Examples

The following examples present some INSERT queries that you can run to add data to the company database.

INSERT INTO department VALUES (NULL,'finance','New York'),(NULL,'PR','Paris'),(NULL,'marketing','Geneva'),(NULL,'development','Bucharest');
 
INSERT INTO department(name,location) VALUES ('finance','New York'),('PR','Paris'),('marketing','Geneva'),('development','Bucharest');

SELECT (retrieving data)

The command for retrieving data from a database is SELECT. The data can be requested from an existing table and is retrieved as rows. SELECT is used to specify what to retrieve from the database, while the DBMS handles the execution of the query.

SELECT [* | {column_name_1, column_name_2, ...}]
FROM  TABLE_NAME
[WHERE conditions]

Syntax description:

table_name - the name of the table from which to retrieve data
* - select all columns from a table
column_name_k - the name of the column k
conditions - some additional conditions for data to be retrieved

Examples

The following examples present some SELECT queries that you can run to get data from the company database.

To run the examples, you should add some data into the database. You can use the following script: Company Database File to make it easier to get started.

-- List all employees from the department with id 1 --
SELECT * FROM employee
WHERE department_id = 1;

-- List the employees name and salary. Show only those having the salary between 5000 and 7000 EUR and order them descending by salary --
SELECT name, salary FROM employee
WHERE salary BETWEEN 5000 AND 7000
ORDER BY salary DESC;

The data can also be retrieved from one or more tables and there can be more complex clauses.

Click to learn more

GROUP BY - groups rows sharing a property so that an aggregate function can be applied to each group
HAVING - selects from the groups defined by the GROUP BY clause
ORDER BY - specifies the order in which to return the rows
AS - provides an alias which can be used for tables or columns

SELECT [* | {column_name_1, column_name_2, ..., aggregate_function(expression)}]
FROM  TABLE_NAME AS table_alias
[WHERE conditions]
[GROUP BY column_name_1]
[ORDER BY column_name_1 [ASC | DESC]]

Example:

-- List the employees name and the name of their department --
SELECT emp.name AS employee, dept.name AS department FROM employee AS emp
INNER JOIN department AS dept
ON dept.id = emp.department_id;
 
-- Get the number of analysts for each department --
SELECT department_id, position_id, COUNT(id) AS employee_count FROM employee
WHERE position_id = 3
GROUP BY department_id;

UPDATE (modifying data)

The command for modifying data is UPDATE:

UPDATE TABLE_NAME
SET column_name_1 = new_value_1, column_name_2 = new_value_2, ... 
[WHERE conditions]

Syntax description:

table_name - the name of the table from which to modify data
column_name_k - the name of the column k
new_value_k - the value to be set (updated) for column k
conditions - some additional conditions for data to be updated

Examples

-- Change the salary for an employee by its name --
SET SQL_SAFE_UPDATES = 0;
 
UPDATE employee
SET salary = 8000
WHERE name = "Larry Code";

DELETE (removing data)

The command for removing data is DELETE:

DELETE FROM TABLE_NAME [WHERE conditions];

Syntax description:

table_name - the name of the table from which to remove data
conditions - some additional conditions for data to be removed

Examples

-- Delete the "marketing" department --
DELETE FROM department
WHERE name = "marketing";

Because we declared the FOREIGN KEY constraints with the CASCADE option for UPDATE and DELETE, the employees will also be removed when removing the department. This is useful to keep the data consistent. Take for example the case where we remove a position (from the positions table). It doesn't make sense to keep the employees for the position that we want to remove, so this would be a good use case for the CASCADE constraint, as we don't have to worry about inconsistencies in the database. The other options are: RESTRICT, SET NULL, NO ACTION.

Case Study - SoundCloud

Here is an example database schema that describes a song database with songs, albums, bands and artists. We are interested in a discussion about the way it is designed and the relationships between tables, to better understand the design concepts.

Each song may be a part of an album (one-to-many relationship between songs and albums), and may belong to a category (one-to-many relationship between songs and categories)
Each artist may have multiple songs, while a song can have multiple artists (many-to-many relationship between songs and artists with link table composers)
Each artist can be in a single band (one-to-many relationship between artists and bands), bands can have multiple albums (one-to-many relationship between bands and albums), and albums can have multiple songs (one-to-many relationship between albums and songs).

When designing a database, it's important to make sure that the relationships are well defined for the use case it's being designed for. Modifying the database schema can be a difficult task when it already contains large amounts of data, so it's important to get it right from the beginning.

NOTE: In this example, the one-to-many relationship between artists and band can be an incomplete definition as in reality an artist can be in multiple bands. The same can be an issue with the one-to-many relationship between songs and albums, as a song may sometimes belong to multiple albums.

Task

T0 (1p) Evaluate the example database (case study) and describe the relationships between the following tables: albums, songs, artists, categories.

T1 (5p) Create the example database in MySQL Workbench. See MySQL Local Setup Tutorial for installing MySQL on your computer and MySQL Workbench Tutorial for working with MySQL Workbench and creating the example database (company database). You can check the example presented in the DDL - CREATE TABLE section. Take a screenshot to validate this step.

T2 (2p) Run the SQL script (Company Database File) to add data into the database. Take a screenshot to validate this step.

T3 (2p) Run 5 of the SQL queries (your choice) from the MySQL Workbench Tutorial (as shown below) and save the results for each query as screenshots or CSV files.

Click to view

-- Q1. Write a query in SQL to display all the information of the employees --
SELECT * FROM employee;
 
-- Q2. List the employees name and salary --
-- Show only those having the salary between 5000 and 7000 EUR and order them descending by salary --
SELECT name, salary FROM employee
WHERE salary BETWEEN 5000 AND 7000
ORDER BY salary DESC;
 
-- Q3. List the employees name and the name of their department --
SELECT emp.name AS employee, dept.name AS department FROM employee AS emp
INNER JOIN department AS dept
ON dept.id = emp.department_id;
 
-- Q4. Get the number of employees in the company --
SELECT COUNT(id) AS employee_count FROM employee;
 
-- Q5. Get the number of employees FROM department id 4 --
SELECT department_id, COUNT(id) AS employee_count FROM employee
WHERE department_id = 4
GROUP BY department_id;
 
-- Q6. Get the number of employees for each department --
SELECT department_id, COUNT(id) AS employee_count FROM employee
GROUP BY department_id;
 
-- Q7. Get the average salary for each department --
SELECT department_id, AVG(salary) AS average_salary FROM employee
GROUP BY department_id;
 
-- Q8. Get the number of analysts FROM department id 3 --
SELECT department_id, position_id, COUNT(id) AS employee_count FROM employee
WHERE department_id = 3 AND position_id = 3;
 
-- Q9. Get the number of analysts for each department --
SELECT department_id, position_id, COUNT(id) AS employee_count FROM employee
WHERE position_id = 3
GROUP BY department_id;
 
-- Q10. List the employees name and the name of their manager --
SELECT manager.name AS manager_name, emp.name AS employee_name FROM employee AS emp
INNER JOIN employee AS manager ON emp.manager_id = manager.id
ORDER BY manager_name ASC;
 
-- Q11. List the employees name, salary, the name of their position --
SELECT emp.name AS employee, pos.name AS POSITION, emp.salary FROM employee AS emp
INNER JOIN POSITION AS pos
ON pos.id = emp.position_id;
 
-- Q12. List the employees name, salary, the name of their department and position --
SELECT emp.name AS employee, dept.name AS department, pos.name AS POSITION, emp.salary 
FROM employee AS emp
INNER JOIN department AS dept
ON dept.id = emp.department_id
INNER JOIN POSITION AS pos
ON pos.id = emp.position_id;