2503ICT: Data modelling and data management


Under construction!

Database management systems

This lecture provides a rapid introduction to database design, database management systems and a database query languages. Two standard references are:

Why use database management systems

Examples of (relational) database management systems (RDBMS)

Open soure or free

Commercial

Non-relational systems

There is an emerging body of opinion that relational databases are no longer suitable for high-transaction access to very large volumes of data by Web applications, but at present the vast majority of data used by Web applications is still stored in relational databases.

(Relational) database design

The relational database model described here was most strongly advocated and studied by E.J. Codd of IBM in the early 1970s, for which achievement he was given the Turing award.

Basic concepts

Integrity constraints

Integrity constraints attempt to ensure that data is both internally consistent and consistent with the real world being modelled. A DBMS should automatically enforce such constraints.

Example 1

Suppose we wish to store information about student enrolments in courses as follows:

Enrolments(CourseCode, CourseName, StudentNumber, StudentName, StudentAddress, Year, Grade)
Here, we store all information about courses and students and enrolments in a single table. For each year (the course is offered) there is one tuple for each course-student pair.

This design has the following problems:

There must be a better way. To find it, requires a deeper understanding of database integrity constraints.

Boyce-Codd normal form

In practice, BCNF is the most important normal form and all tables used should be in BCNF (unless you're an expert and can demonstrate experimentally that another design is more efficient.)

Transforming a table that is not in BCNF to one that is in BCNF usually requires the introduction of additional tables, as in the following example.

Example 1 (continued)

Consider again the relational scheme from Example 1:

Enrolments(CourseCode, CourseName, StudentNumber, StudentName, StudentAddress, Year, Grade)

The most likely primary key of table Enrolments is {CourseCode, StudentNumber, Year} (because a student may enroll in the same course in different years). But this table is not in BCNF because {CourseCode} determines {CourseName} where {CourseCode} is not a superkey. Similarly, {StudentNumber} determines {StudentName, StudentAddress} where {StudentNumber} is not a superkey.

The correct database scheme requires separate tables for courses, students and enrolments. Here, we underline the primary key of each table:

Courses(CourseCode, CourseName)
Students(StudentNumber, StudentName, StudentAddress)
Enrolments(CourseCode, StudentNumber, Year, Grade)

You can check for yourself that each table in this design is in BCNF and that the design does not have any of the problems of the original design.

Note that attribute CourseCode in table Enrolments is a foreign key reference to key CourseCode in table Courses, and that attribute StudentNumberin table Enrolments is similarly a foreign key reference to key StudentNumber in table Students.

Modelling one-one, one-many and many-many relationships

Example 2

Suppose you want to model a stock warehouse enterprise involving stock (items in the warehouse), customers (who may want to order items) and orders (of items by customers). Clearly, there is a many-many relationship between stock items and customers, so a separate table to model this relationship (Orders) is required.

It's good practice, especially on computers, to introduce a separate (auto-incremented) integer identifier field (Id) for each table.

The resulting BCNF database design, with primary keys underlined, is the following:

Stock(Id, Name, Quantity, Price, Description)
Customers(Id, Name, Address, Email)
Orders(Id, ItemId, CustId, Date, Quantity)

Here, attribute ItemId in table Orders is a foreign key reference to key id in table Stock, and attribute CustId in table Orders is a foreign key reference to to key id in table Customers. Note that {ItemId, CustId, Date} is also a key for table Orders (provided a customer can only order the same item once per day).

Object-oriented database design

Almost everything described above also applies to object-oriented database design!

In object-oriented databse design, we uses classes instead of tables and class instances (or objects) instead of table rows. Each class should describe a set of similar entities or a set of similar relationships.

Although the references to objects in a class could serve as a primary key, it is conventional to always use an integer id field to serve as the primary key of the class.

Exactly the same kinds of constraints that apply to relational tables (domain constraints, entity integrity constraints, key constraints, referential integrity constraints) must also hold in objection-oriented classes.

Object-oriented classes should also, normally, be in Boyce-Codd Normal Form. If they are not, additional classes (with foreign keys) need to be introduced.

One-one, one-many and many-many relationships are also modelled identically. In particular, many-many relationships between two classes require a third class, with foreign keys to the first two classes, to be represented.

We will see many examples of these principles in Ruby on Rails applications!

Relational data management (in SQL)

History

Types (domains)

Different implementations of SQL provide subtly different sets of types.

Common SQLite types

Integer, non-null, primary key fields are auto-incremented.

Common MySQL types

Defining attributes in MySQL (and SQLite)

Creating tables in MySQL

Here is an SQL file for creating the tables for Example 2 above.

CREATE TABLE Stock (
    Id INT NOT NULL auto_increment,
    Name VARCHAR(20) DEFAULT '' NOT NULL UNIQUE,
    Quantity INT DEFAULT '0' NOT NULL,
    Price DECIMAL(8,2) NOT NULL,
    Description TEXT,
    PRIMARY KEY (Id));

CREATE TABLE Customers (
    Id INT NOT NULL AUTO_INCREMENT,
    Name VARCHAR(20) DEFAULT '' NOT NULL UNIQUE,
    Address VARCHAR(80),
    Email VARCHAR(30),
    PRIMARY KEY (Id));

CREATE TABLE Orders (
    Id INT NOT NULL AUTO_INCREMENT,
    ItemId INT NOT NULL, 
    CustId INT NOT NULL,
    OrderDate DATE,
    Quantity INT DEFAULT '0',
    PRIMARY KEY (Id),
    FOREIGN KEY (ItemId) 
        REFERENCES Stock(Id),
    FOREIGN KEY (CustId) 
        REFERENCES Customer(Id));

Be consistent in your table and attribute naming conventions! (It's conventional to use Id (or id) for internal table keys.)

To create tables using the InnoDB storage engine:

CREATE TABLE Stock (
    ...
    ENGINE = InnoDB);

In any single application, all tables should use the same storage engine.

User interfaces

SQLite

.help
.databases
.tables
.schema tablename 
.indices tablename
.dump tablename ...
.show
.read filename
SQL statement;
.quit

SQL statements must be terminated by a semi-colon.

MySQL command-line interface

help
show databases;
show tables; (in a selected database)
describe tablename;
show index from tablename;
status
use database
source filename
SQL statement;
quit

SQL statements must be terminated by a semi-colon.

To dump the structure and contents of one or more database tables into a text file (cf. .dump in SQLite), use the Linux command mysqldump as follows:

$ mysqldump --skip-opt -h host -u user -ppassword -d database [tablename ...] > filename

Note that some constraint information may be lost when you do this.

MySQL Web-based interface

http://dwarf.ict.griffith.edu.au/phpMyAdmin/

SQL definition files

Never type SQL "create table" statements directly into a command-line interpreter. Never use the Web-based interface to create tables.

Always type SQL "create table" statements into an SQL definition file using a text editor and source or import the resulting file into the database.

Usually type SQL "insert into" statements into the same definitions file.

Inserting tuples into tables in MySQL

Single-tuple insertion:

INSERT INTO Stock(Id, Name, Quantity, Price, Description)
VALUES (NULL, "Marcel's Morsels", 1500, 1.25, "Delectable, delicious delicacies");

Multiple-tuple insertion:

INSERT INTO Stock(Id, Name, Quantity, Price, Description)
VALUES 
(NULL, "Marcel's Morsels", 1500, 1.25,"Delectable delicious delicacies"),
(NULL, "Fred's Fries", 1000, 0.75, "Fred's Fabulous French Fries");

General tuple insertion:

insert into Stock(Id, Name, Quantity, Price, Description)
  select * 
  from OldStock 
  where Price < 100.0;

Querying data in MySQL

The standard query form in SQL is as follows:

select values
from tables
where conditions

Addition clauses such as group by or order by may follow the where clause.

We illustrate the possibilities by example. You can try out these examples with sample data provided in either SQLite or MySQL.

1. Find the name and email address of all customers whose address contains the string "Nathan".

select Name, Email from Customers 
where Address like "%Nathan%";

If we wanted the less useful query "... whose address is Nathan", we would write:

where Address = "Nathan"

If we wanted to return all attributes, we would write:

select * from Customers

2. Find the item id, customer id and quantity of all orders for items whose price is less than $10.

select ord.ItemId, ord.CustId, ord.Quantity 
from Orders ord, Stock item
where ord.ItemId = item.Id
and item.Price < 10;

An important property of SQL is compositionality (though it is not well designed in SQL). This is the property that the result of one query may be used as input to another query. So we can also express this query as follows.

select ord.ItemId, ord.CustId, ord.Quantity
from Orders ord 
where ord.ItemId in (select item.id from Stock item where item.price < 10);

3. Find the names and addresses of customers who have ordered items whose price is less than $10, ordered by customer name.

This more realistic query asks for customers' names and addresses which are meaningful to users instead of customers' ids which are only meaningful to computers.

select cust.Name, cust.Address 
from Customers cust, Orders ord, Stock item
where cust.Id = ord.CustId and ord.ItemId = item.Id and item.Price < 10.0
order by cust.Name;

4. Find the names and item quantities of the 10 customers who have ordered the greatest quantities of items, ordered by item quantity.

select cust.Name, sum(ord.Quantity) 
from Customers cust, Orders ord
where cust.Id = ord.CustId
group by cust.Id
order by sum(ord.quantity) desc
limit 0, 10;

Exercise. Rewrite Queries 3 and 4 using nested "select" statements.

Deleting and updating data in MySQL

1. Delete the Orders table from the database.

drop table Orders;

2. Delete all tuples from the Orders table (and retain the now empty table).

delete from Orders;

3. Delete all customers whose name is "John Smith".

delete from Customers
where Name = "John Smith";

4. Double the quantity of all orders for customers whose id is greater than 100.

update Orders
set Quantity = 2*Quantity
where CustId > 100;

5. Change the name and address of the customer with id 15.

update Customers
set Name = "John", Address = "Logan"
where Id = 15;

(Normally queries shouldn't use explicit values for internal Id attributes.)

These queries just scratch the surface of what's possible. Fortunately, only simple queries are required in most Web applications.

When testing your application, it's useful (and important) to test your SQL queries from the command-line (or Web-based) interface.

In production, queries are sent to the database from the server-side (PHP) program using the DMS's API.

Object-oriented data management

Unlike the relational database world, there is no widely accepted standard for managing object-oriented data. There have been proposals for object-oriented query languages, but none have come into widespread use.

In practice, most object-oriented data is accesed by a functional interface. These differ from system to system. For example, in Ruby on Rails, one can define classes using the command rails generate model ... and can create, retrieve, update and delete instaces of a model using commands such as the following.

user = User.new(name: "Rodney", email: "rwt@gu.edu.au").save
user = User.create(name: "Rodney", email: "rwt@gu.edu.au")
user = User.find(1) # finds user with id=1
user = User.find_by(name: "Rodney")
users = User.all
users = User.where("name = ?", "Rodney")
user.update(name: "Dave")
user.destroy

Object-relational mapping

Sometimes, as in Ruby on Rails, object-oriented data (at the programming level) is represented in relational database (at the physical level). Similarly, the functional interface to object-oriented data is implemented by translating each function call into an equivalent SQL statement.

This whole process of mapping object-oriented classes into equivalent relational database tables and mapping object-oriented method calls into equivalent SQL statements is called object-relational mapping.

Most modern Web application frameworks, which use object-oriented programming languages to implement the application logic, use object-relational mappings for data management.

Tuning database performance

This is a very complex issue. Some possible steps are:

Additional factors arise in tuning performance for a particular DBMS such as MySQL.

But remember, in this course, clarity and good design is always more important than (micro-)efficiency!

Database transactions

This is a huge topic.

See "Active Record Transactions" in the Ruby on Rails documentation.

To be completed...