This lecture provides a rapid introduction to database design, database management systems and a database query languages. Two standard references are:
www.sqlite.org)
db.apache.org/derby/)
www.mysql.com)
www.postgres.org)
There is an emerging body of opinion that relational databases are no longer suitable for high-transaction access to very large volumes of data by Web applications, but at present the vast majority of data used by Web applications is still stored in relational databases.
The relational database model described here was most strongly advocated and studied by E.J. Codd of IBM in the early 1970s, for which achievement he was given the Turing award.
Integrity constraints attempt to ensure that data is both internally consistent and consistent with the real world being modelled. A DBMS should automatically enforce such constraints.
Domain constaints. Each attribute can only take values from its specified domain.
Entity integrity constraints. No attribute in a key of a relation can be null.
Key constraints. No two tuples in a relation can have the same value for their keys. I.e., every key for a relation uniquely identifies a tuple in the relation.
Referential integrity constraints. Informally, you can't refer to something that doesn't exist. So if an attribute A of a tuple in a relation R refers to a key B of a relation S, there must exist a tuple in relation S with that key.
More formally, if attribute A in relation R has the same domain as attribute B in relation S, and B is a key (or component of a key) for S, then whenever t1 is a tuple in R with t[A] = x, then there must exist a tuple t2 in S with t2[B] = x.
The attribute A of R that refers to the key B of S is called a foreign key.
Domain-specific constraints.
Suppose we wish to store information about student enrolments in courses as follows:
Enrolments(CourseCode, CourseName, StudentNumber, StudentName, StudentAddress, Year, Grade)Here, we store all information about courses and students and enrolments in a single table. For each year (the course is offered) there is one tuple for each course-student pair.
This design has the following problems:
There must be a better way. To find it, requires a deeper understanding of database integrity constraints.
A superkey of a table R is a superset of the attributes in a key of R.
A table R is in Boyce-Codd normal form (BCNF) if the following condition holds:
If {A1,...,An} -> {B1,...,Bm} is a functional dependency in R, then either {B1,...,Bm} is a subset of {A1,...,An} or {A1,...,An} is a superkey for R.
I.e., every nontrivial determinant in R is a superkey for R.
In practice, BCNF is the most important normal form and all tables used should be in BCNF (unless you're an expert and can demonstrate experimentally that another design is more efficient.)
Transforming a table that is not in BCNF to one that is in BCNF usually requires the introduction of additional tables, as in the following example.
Consider again the relational scheme from Example 1:
Enrolments(CourseCode, CourseName, StudentNumber, StudentName, StudentAddress, Year, Grade)
The most likely primary key of table Enrolments is
{CourseCode, StudentNumber, Year} (because a
student may enroll in the same course in different years). But this table
is not in BCNF because {CourseCode} determines {CourseName} where
{CourseCode} is not a superkey. Similarly, {StudentNumber}
determines {StudentName, StudentAddress} where {StudentNumber} is not a
superkey.
The correct database scheme requires separate tables for courses, students and enrolments. Here, we underline the primary key of each table:
Courses(CourseCode, CourseName) Students(StudentNumber, StudentName, StudentAddress) Enrolments(CourseCode, StudentNumber, Year, Grade)
You can check for yourself that each table in this design is in BCNF and that the design does not have any of the problems of the original design.
Note that attribute CourseCode in table Enrolments is a foreign key
reference to key CourseCode in table Courses, and that attribute
StudentNumberin table Enrolments is similarly a foreign key reference to
key StudentNumber in table Students.
One-one relationships. If there is a one-one relationship between entities in tables R and S, then either table R must contain a foreign key reference to table S, or vice versa.
One-many relationships. If there is a one-many relationship between entities in tables R and S (i.e., for every entity in table R there may be many entities in table S), then table S must contain a foreign key reference to table R. E.g., if a blog article has many comments, then there must be an attribute A in each comment to the article to which it refers. Here, A is a foreign key reference to the article key. (It is a serious error to have an attribute in each article to its list of comments; you just can't do this in the relational model!)
Many-many relationships. If there is a many-many relationship between entities in tables R and S, e.g., between courses and students, then you must introduce a third (relationship) table with foreign key references to both tables R and S.
Suppose you want to model a stock warehouse enterprise involving stock
(items in the warehouse), customers (who may want to order items) and
orders (of items by customers). Clearly, there is a many-many
relationship between stock items and customers, so a separate table to
model this relationship (Orders) is required.
It's good practice, especially on computers, to introduce a separate
(auto-incremented) integer identifier field (Id) for each table.
The resulting BCNF database design, with primary keys underlined, is the following:
Stock(Id, Name, Quantity, Price, Description) Customers(Id, Name, Address, Email) Orders(Id, ItemId, CustId, Date, Quantity)
Here, attribute ItemId in table Orders is a foreign key
reference to key id in table Stock, and attribute
CustId in table Orders is a foreign key reference to to
key id in table Customers. Note that {ItemId,
CustId, Date} is also a key for table Orders
(provided a customer can only order the same item once per day).
Different implementations of SQL provide subtly different sets of types.
Integer, non-null, primary key fields are auto-incremented.
int, not int(n).
Here is an SQL file for creating the tables for Example 2 above.
CREATE TABLE Stock (
Id INT NOT NULL auto_increment,
Name VARCHAR(20) DEFAULT '' NOT NULL UNIQUE,
Quantity INT DEFAULT '0' NOT NULL,
Price DECIMAL(8,2) NOT NULL,
Description TEXT,
PRIMARY KEY (Id));
CREATE TABLE Customers (
Id INT NOT NULL AUTO_INCREMENT,
Name VARCHAR(20) DEFAULT '' NOT NULL UNIQUE,
Address VARCHAR(80),
Email VARCHAR(30),
PRIMARY KEY (Id));
CREATE TABLE Orders (
Id INT NOT NULL AUTO_INCREMENT,
ItemId INT NOT NULL,
CustId INT NOT NULL,
OrderDate DATE,
Quantity INT DEFAULT '0',
PRIMARY KEY (Id),
FOREIGN KEY (ItemId)
REFERENCES Stock(Id),
FOREIGN KEY (CustId)
REFERENCES Customer(Id));
Be consistent in your table and attribute naming conventions! (It's
conventional to use Id (or id) for internal table
keys.)
To create tables using the InnoDB storage engine:
CREATE TABLE Stock (
...
ENGINE = InnoDB);
In any single application, all tables should use the same storage engine.
.help .databases .tables .schema tablename .indices tablename .dump tablename ... .show .read filename SQL statement; .quit
SQL statements must be terminated by a semi-colon.
help show databases; show tables; (in a selected database) describe tablename; show index from tablename; status use database source filename SQL statement; quit
SQL statements must be terminated by a semi-colon.
To dump the structure and contents of one or more database tables into a
text file (cf. .dump in SQLite), use the Linux command
mysqldump as follows:
$ mysqldump --skip-opt -h host -u user -ppassword -d database [tablename ...] > filename
Note that some constraint information may be lost when you do this.
http://dwarf.ict.griffith.edu.au/phpMyAdmin/
Never type SQL "create table" statements directly into a command-line interpreter. Never use the Web-based interface to create tables.
Always type SQL "create table" statements into an SQL definition file using a text editor and source or import the resulting file into the database.
Usually type SQL "insert into" statements into the same definitions file.
Single-tuple insertion:
INSERT INTO Stock(Id, Name, Quantity, Price, Description) VALUES (NULL, "Marcel's Morsels", 1500, 1.25, "Delectable, delicious delicacies");
Id is given the next available integer value.
Id.
Name, already exists.
Id is auto-incremented, the attribute
name Id and the attribute value NULL may be
omitted from the above statement.
Multiple-tuple insertion:
INSERT INTO Stock(Id, Name, Quantity, Price, Description) VALUES (NULL, "Marcel's Morsels", 1500, 1.25,"Delectable delicious delicacies"), (NULL, "Fred's Fries", 1000, 0.75, "Fred's Fabulous French Fries");
General tuple insertion:
insert into Stock(Id, Name, Quantity, Price, Description) select * from OldStock where Price < 100.0;
Stock may be inserted into the table.
Missing attributes have NULL or default values. See below for descriptions
of SQL queries.
The standard query form in SQL is as follows:
select values from tables where conditions
Addition clauses such as group by or order by may follow
the where clause.
We illustrate the possibilities by example. You can try out these examples with sample data provided in either SQLite or MySQL.
1. Find the name and email address of all customers whose address contains the string "Nathan".
select Name, Email from Customers where Address like "%Nathan%";
If we wanted the less useful query "... whose address is Nathan", we would write:
where Address = "Nathan"
If we wanted to return all attributes, we would write:
select * from Customers
2. Find the item id, customer id and quantity of all orders for items whose price is less than $10.
select ord.ItemId, ord.CustId, ord.Quantity from Orders ord, Stock item where ord.ItemId = item.Id and item.Price < 10;
ord and item are variables whose values
range over the tuples in tables Orders and Stock,
respectively.
order because
that's an SQL reserved word. (SQL has a lot of reserved words.)
ord.ItemId with the key item.ID to access the item's
price.
An important property of SQL is compositionality (though it is not well designed in SQL). This is the property that the result of one query may be used as input to another query. So we can also express this query as follows.
select ord.ItemId, ord.CustId, ord.Quantity from Orders ord where ord.ItemId in (select item.id from Stock item where item.price < 10);
3. Find the names and addresses of customers who have ordered items whose price is less than $10, ordered by customer name.
This more realistic query asks for customers' names and addresses which are meaningful to users instead of customers' ids which are only meaningful to computers.
select cust.Name, cust.Address from Customers cust, Orders ord, Stock item where cust.Id = ord.CustId and ord.ItemId = item.Id and item.Price < 10.0 order by cust.Name;
4. Find the names and item quantities of the 10 customers who have ordered the greatest quantities of items, ordered by item quantity.
select cust.Name, sum(ord.Quantity) from Customers cust, Orders ord where cust.Id = ord.CustId group by cust.Id order by sum(ord.quantity) desc limit 0, 10;
limit 0, 10 clause is a MySQL extension: 0 is the
offset from the start of result, 10 is the (maximum) number of tuples
of the result to be returned.
Exercise. Rewrite Queries 3 and 4 using nested "select" statements.
1. Delete the Orders table from the database.
drop table Orders;
2. Delete all tuples from the Orders table (and retain the now empty table).
delete from Orders;
3. Delete all customers whose name is "John Smith".
delete from Customers where Name = "John Smith";
4. Double the quantity of all orders for customers whose id is greater than 100.
update Orders set Quantity = 2*Quantity where CustId > 100;
5. Change the name and address of the customer with id 15.
update Customers set Name = "John", Address = "Logan" where Id = 15;
(Normally queries shouldn't use explicit values for internal
Id attributes.)
These queries just scratch the surface of what's possible. Fortunately, only simple queries are required in most Web applications.
When testing your application, it's useful (and important) to test your SQL queries from the command-line (or Web-based) interface.
In production, queries are sent to the database from the server-side (PHP) program using the DMS's API.
This is a very complex issue. Some possible steps are:
varchar(12), text and blob.
where conditions.
Additional factors arise in tuning performance for a particular DBMS such as MySQL.
But remember, in this course, clarity and good design is more important than (micro-)efficiency!