Database Primer – Tutorius

A database is a structured collection of information.

This is not a database;

Adam Thomson, 555-6335
Anita McColl, 108 Verdun Ave., 555-5249

But this is;

Adam Thomson,                , 555-6335
Anita McColl, 108 Verdun Ave., 555-5249

The difference? We don't have Adam's address right now, so in the second example we've left room for it. The information (data) is structured in the second example, but not in the first.

Purists will have far more complicated explanations, and so do I, but at the heart of it, that's what it comes down to.

Today, when anyone talks about databases, they're almost certain to be talking about a relational database, often referred to as a Relational Database Management System (DBMS or RDBMS.) We'll cover what makes it relational shortly.

Information in a database is stored, conceptually, in tables of rows and columns, just like a spreadsheet. One row, often called a record, is maintained for each thing or entity we're keeping information about, for instance an employee or a book in a library.

Each record is divided into fields, often called attributes, of information which relates to that entity, such as first name, last name and phone number. It is this rigid structure that makes a database work.

We'll build a concept of a database as we go here, so let's start with what we've covered and imagine that we're going to build a database to run a small hotel.

With what we've seen we can imagine that we have a database containing a single table of records, one for each guest. This includes those who have merely made reservations, those who are in the hotel now, and those who have checked out. This single table is sometimes referred to as a flat-file database.

At this point we'll imagine that each record in our database is keeping all the information about the guest (name and address etc.) as well as information about the visit (arrival and departure dates, room number, etc.)

What happens when a guest comes back for another visit? We'll have to add another record to the database for the second visit.

So far so good. Now let's say that our guest has something from the minibar. Now we have to add that to the database so we can charge him for it. If we add another record to our table, we'll be keeping information about the guest, about the visit and about the item purchased (what it was, how much is cost, etc.)

You can see that our database has quickly become unmanageable. It would be completely impractical to try to maintain the information like this. Nor only would the database size spiral out of control, if the guest changed his room number or home address, every record in the database relating to that guest would have to be changed! There's just no way that could be done without introducing errors and inconsistency over time.

This is where the Relational in Relational Database comes in.

If we want to bring our database under control we need to break it into multiple tables (database people call this decomposing the data) and relate the tables to each other.

To start with, we will create a Guest table for the basic information about the guest. Name, address, room preferences, etc. Now we'll create a Visit table with information about indivudual visit; arrival date, departure date, room number, etc. And of course we'll create a Purchases table tracking the sales outlet (such as the minibar, restaurant or gift shop), the item, the price, etc.

Now we can add only the information we need to each table without duplicating data among them, which is one of the cardinal sins of database management.

The next thing we need to do it find a way to relate records in the three tables to each other (there's that relational thing again.) Before we can do that we need to ensure that as we build these relationships we always know exactly what record we're dealing with. That brings us to a critical piece of database theory (and jargon) called the Primary Key.

Let's say we have Adam Thomson staying with us. Since it's his birthday his son Adam has come to share the occasion with him. Now we have a problem. Once the second Adam Thompson checks in, how are we going to tell them apart? They could live at the same address and have the same phone number.

We resolve this by introducing a Primary Key, a new attribute which is guaranteed to be unique in the table. There are generally two ways to go about this. First, we can find a unique number that is natural to the situation, like a Social Security number in an employee table. The other approach is simpler. Use a consecutive number for each record as it is added. Relational database systems are set up to handle this automatically. The point is that every record absolutely must be unique. By using a Primary Key we can guarantee that. Every table in a relational database has a Primary Key. That much is carved in stone.

Now that we've decomposed out data and implemented our primary keys, it's time to relate our tables.

You'll recall that we saw that it was impractical to keep the information about the guest and the information about the visit in the same table. Our solution was to break the data into separate tables.

Now we'll establish a relationship between the guest and visit tables by adding a new attribute to the visit table. This attribute is called a Foreign Key because it matches the primary key in the Guest table. The DBMS is responsible for ensuring that each Foreign Key entry in the Visit table matches the Primary Key of an entry in the Guest table. With that we've established a tight relationship – there's that word again – between the Guest and Visit tables. In another lesson we'd explore the means by which the DBMS maintains the integrity of that relationship in the event that a record in the guest table is deleted.

By splitting the data out like this, we've reduced the space required to store our database. But there's another significant benefit. If our guest changes his address, we only have to make the change in one place – the corresponding record in the guest table. And that means that there's no chance of contradictory data in the database. This is why duplication of data is considered one of the cardinal sins of database design.

In the Structured Query Language (SQL) tutorial we'll look at how we can query the two tables using the primary and foreign keys to obtain a list of all guests with their corresponding visits.

Normalization

mysql
sql
normalization