A brief Introduction to Database Theory ver1 4

Version 1.4
A brief introduction
to
database theory
Written by Keld Marple
Page 1 Keld Marple, February 2010
Version 1.4
Table of Contents
Introduction..........................................................................................................................................3
A database is a program...................................................................................................................3
A database is a concept....................................................................................................................3
Data..................................................................................................................................................4
What do you use them for?..............................................................................................................4
The relational database.........................................................................................................................5
Uniqueness...........................................................................................................................................5
The primary key..........................................................................................................................5
On choosing a good primary key................................................................................................6
What's in a database?............................................................................................................................6
Entity................................................................................................................................................6
Attribute...........................................................................................................................................6
Tuples...............................................................................................................................................7
Values..............................................................................................................................................7
Data types........................................................................................................................................7
Number datatypes.......................................................................................................................7
Integer.....................................................................................................................................7
Float and double.....................................................................................................................8
Text datatypes (strings)...............................................................................................................8
Char........................................................................................................................................8
Varchar...................................................................................................................................8
Text.........................................................................................................................................8
Date, time, year...........................................................................................................................8
NULL..........................................................................................................................................9
Relationships........................................................................................................................................9
Foreign keys.....................................................................................................................................9
The one-to-many relationship.....................................................................................................9
The many-to-many relationship................................................................................................10
Design issues......................................................................................................................................12
The data model ..............................................................................................................................12
E/R-diagram (logical)....................................................................................................................13
Cardinality.................................................................................................................................13
Physical E/R diagram.....................................................................................................................16
Normalisation.................................................................................................................................16
1. normal form...........................................................................................................................17
2. normal form...........................................................................................................................17
3. normal form...........................................................................................................................17
Redundancy...................................................................................................................................18
Data dictionary...............................................................................................................................18
Page 2 Keld Marple, February 2010
Version 1.4
Introduction
A digital database is a system in which data can be stored and retrieved.
In most cases where you want a system to remember something you will need a database. If you
create a webpage with a member registration, you will need to store some information about the
registered members. These information must be stored somewhere and the best solution by far is a
database.
There are different database systems (DBMS) available. One of the biggest systems is Oracle,
which is an expensive high-end DBMS. Another one is MySQL, which is an open source DBMS
and free of charge for the user. MySQL is very popular as the DBMS used by dynamic web pages.
It is a very efficient database and it works well with the scripting language php.
When a system which needs a database has to be created, it is crucial that the database model is well
designed. The database model is the layout of the tables, and therefore all the tables should be
included in the model. A model is a small part of the real world. The model describes that part
sufficiently. The model describes nothing but the things relevant to one particular system: The
system you're currently designing. If a system needs to store some information about a member, the
table could have a username, an email and a password. In reality these three information surely does
not descibe a person satisfactory, but in this particular system it suffices.
The database model must be well structured for several reasons. The DBMS guarantees one thing: It
will store your data and find them again - IF you follow the specified rules for database design.
Another good reason is to avoid redundancy (= dual representation of data).
A database is a program
Just like anything else on a computer the database is software: A program that you install on your
hard disk and that has to be running for you to use it. I doesn't matter if the database program has
been installed on your own computer or another computer located far away as long as you have the
appropriate network software to connect to the database.
A database is a concept
The idea of storing data is not a new one. Neither is the idea of a database. People have figured
ways of storing items, information and even knowledge in many different ways throughout the
human history. But storing information isn't enough. Not if there is a lot of it. If you need only to
handle the names of your kids or your siblings, it is (for most people) sufficient to use the memory
conveniently located in the brain, but when it comes to all the people in a country (and their phone
numbers), we need some storing device, that we can use to find the information we want. Telephone
books are used for that. But the printed telephone book has one big disadvantage: You cannot list
all the entries alphabetically and by number at the same time. So the phone book is good if you
know the name and want to find the phone number, but it is utterly useless if you want to find a
name from a number. To solve this problem we could print another version of the same data, this
time listed in numerical order. But what if you want to search for people with certain professions,
people from the same street or the same part of town? And what if all you know is the first name of
that person?
This is where the digital database comes to the rescue. When using a digital database you are able
Page 3 Keld Marple, February 2010
Version 1.4
to store your data in such a way that every piece of data is written once, but can be used in many
different ways.
This paper is only concerning the digital database.
Data
As suggested above, many things can be considered data. Any information you would like to store
for later use can be data. If you think, that you might one day want to be able to make a log of your
customers' whereabouts, you would have to store their actions somehow you would need to store
some data.
If you sell used cars, and you want to keep track of your business, you could create a database and
type in everything you want to store about your cars and your business. These information are data.
Knowledge can also be thought of as data, but it is harder to define what knowledge is. Often
people prefer to think of knowledge as the thing you achive from combining information. In that
respect, we only store information in our digital database. It is when we use it, we change it into
knowledge.
What do you use them for?
Now for the big question: Is it really necessary if you are creating a small website? To be honest:
Yes it is. If you're doing anything but an online business card, you will almost certainly enhance the
flexibility and efficiency of the site and you will cut down the hours spend on updating.
Here's a few examples:
Website Without database With database
An internet page about your hobby You could have a site showing The same as without, but now you
could be stamps. pictures of all your stamps put into a could include different or even
category like nationality. Whenever multiple categories (like year printed,
you get a new stamp you'll upload the condition, motive). When you acquire
picture and edit the html-page for that a new stamp, you upload the picture
country to include the new image. and include it in your database. It
instantly works on all pages.
A weblog. You write your thoughts every day in You write your daily comments
a text editor of your choice. Then you directly into a web form. People can
upload that file using a predefined both browse and search your texts.
file-name. Remember to either delete You can include search criteria like
or rename any old files. The website headlines, subjects etc. You can also
shows the newest one (e.i. The one see how many times your article have
with the right filename). People can been read.
browse your daily comments if you
provide link to older pages.
A small business website: A used cars You have big lumps of information on You can include user comments and
salesman. the website including pictures of cars. auctions. And naturally uploading and
New cars come in daily, and you maintaining the horde of information
update by uploading pages and images gets much easier
in new pages using templates.
Page 4 Keld Marple, February 2010
Version 1.4
The relational database
The most common type of database is the relational database. The database is called relational
because data is ordered in relations (that is: tables). Each table is meant to describe one thing or
category, like a car for instance. Everything you need to know in relation to a car will be part of the
car-table. So the model, car-colour, number-of-doors, motor-volume etc. are all properties related to
a car. Even though different cars can have different colours, each car will have one colour.
Put in a schematic way, you could write it like this:
MODEL: Opel Kadett MODEL: Ford Mondeo MODEL: Nissan Primera
COLOUR: Red COLOUR: Blue COLOUR: Blue
DOORS: 2 DOORS 5 DOORS: 4
VOLUME: 1400 cm3 VOLUME: 1600 cm3 VOLUME: 1400 cm3
As you see, we have 3 cars registered in this list. They are different in model and they all have the
properties listed. The actual value of the property it could be the value blue of the property
colour does not make sense on its own. None of the properties do. To use the information
provided it is necessary to relate the values with the actual car. The relation between the actual car
and the properties is what have given name to the relational database.
NOTE: You might have heard of relations between tables and then you might think that it is these
relations that gave name to the relational database. This is a common mistake. To be precise, we
call this type of relations for relationships.
We will discuss the relationships between tables later on.
Uniqueness
Now let's say we add one car to the garage. A blue Ford Mondeo with 5 doors and a 1600 cm3
motor. Then the list looks like this:
MODEL: Opel Kadett MODEL: Ford Mondeo MODEL: Nissan Primera MODEL: Ford Mondeo
COLOUR: Red COLOUR: Blue COLOUR: Blue COLOUR: Blue
DOORS: 2 DOORS 5 DOORS: 4 DOORS 5
VOLUME: 1400 cm3 VOLUME: 1600 cm3 VOLUME: 1400 cm3 VOLUME: 1600 cm3
As you might have noticed, there are two identical cars in the list. It is no strange matter in the
garage, but it's bad news in a database. One of the things to achieve when using databases is the
uniqueness of every single item present.
T h e p r i m a r y k e y
This problem leads us to a very important concept: The primary key. The primary key is one
information or attribute, that must be guaranteed unique. In this way every new entry will be unique
- even if several of the values are similar to those of other entries. Whenever you are searching for a
specific row, you are sure that you can always address it precisely using it's primary key.
Page 5 Keld Marple, February 2010
Version 1.4
O n c h o o s i n g a g o o d p r i m a r y k e y
There are at least three different approaches when choosing a primary key.
The first and perhaps most obvious way, is to look through the attributes and find one which is
some sort of unique value already. The personal registration number is such a value. No two
persons can have the same personal registration number. The ISBN for books is also a unique value,
that only applies to one title. So unless you are doing a registration of specific volumes, the ISBN
will be a suitable primary key.
Sometimes you don't have such a unique attribute at hand. A movie database can suffer from this, if
all you have is a title, a plot description, production year and so on. In such cases an identifying
attribute is created. The only purpose of this attribute is to act as primary key. One can even ask the
database system to take care of the numbering of the primary key. Each time someone adds a row to
the database, the database system will make sure that the next number in line will be added as the
primary key of this row.
The third way one can create a primary key is to choose two or more attributes that together create a
guaranteed unique combination.
When an attribute is defined as a primary key, the database will make sure, that nobody insert more
rows with the same primary key value.
What's in a database?
To make sure that everybody knows what everybody's talking about, certain words have come into
play during the years. To be completely honest, I might add, that not everybody uses all the words
in exactly the same way, and they don't always use them consistently and unambiguously. But one
has to start somewhere when you know your way around the database theory, you will not get
confused because someone uses a different word. Here is some of the important words and their
meaning.
Entity
The first thing we create, when we develop a database model is the entities. Entities are things or
ideas from the physical world, that we need in our model. A database with members could choose
person as an entity. Or even member . The entity is meant as an abstraction of what it
represents. Is is naturally not the thing itself. It is only a representation of the thing. And it is
only meant to represent the thing or idea in one particular model. Therefore we might include things
like name , address and telephone number in our entity called member , even though another
database designer doing another database might include login , password and email in his
member-entity.
Often you will see the words table and relation as more or less synonymous with the word
entity. Developers use the word entity when designing the model and the word table when referring
to the actual tables that have been created in the database.
Attribute
When we have found an entity for our database model, we must describe it in detail. This is done by
specifying exactly what type of information the entity can hold. It has already been mentioned as
name , address etc. and these are the attributes of the entity. The attributes are the properties of
Page 6 Keld Marple, February 2010
Version 1.4
the entity. If an entity was a chest of drawers the attributes would be the drawers. All the attributes
together describes the entity. (At least technically - we might still need a description in a humanoid
language somewhere).
Finding the right attributes for the entities (and the right entities for the model) is a major part of the
database design process. We will get back to this issue later.
When you move from the model design level to the implementation level, attribute becomes
column or field .
Tuples
When the database is finished it needs to be populated e.g. filled with data. Whenever you enter a
new member into the member entity, you will have to enter one specific value that applies to this
particular member for each attribute in the entity. After you have done that, you have a collection of
information about one particular member. Or you could say: You have one instance of the entity.
This collection is called a tuple.
Other words for typle that you might encounter are row or record .
Values
When we get to actually inserting data into the database, we talk about values. A value is something
concrete in opposition to an attribute, which is an abstraction. Each new tuple in a table is to be
thought of as one specific instance of what the entity represents. A physical representation. If it is a
car-entity, one tuple would represent one physical car. This car is described in this particular tuple
and nowhere else. It might use stuff from other entities and refer to this using a foreign key. For
example a car could have airbags installed as extra equipment, but that equipment itself is described
elsewhere (in the equipment-entity) and is as such not part of the description of the car.
One value exists in one cell. Remember that each cell can contain one value no more. They are
said to be atomic, which means they cannot be split up into several values.
Data types
When storing data in a database it is important to know what you are actually storing. If you store a
name, you know by instinct, that it is not a number, and if you're storing a number, you know you
can't use letters. But what about phone numbers, dates or even binary files?
N u m b e r d a t a t y p e s
Numbers are special in that the computer uses the bits directly as numbers, and uses mathematics to
change it into decimal numbers. Numbers are quick to use one could say, that they are the native
language of the computer.
Integer
One of the most common types of data is the integer. That is whole numbers like 1, 2, 3, 4 and so
on. Integers can also have a negative value (unless you specify otherwise). The integer is usually a
32 bit number, which means that 32 bits are used to store that value. This gives you the ability to
store 232 (= 4 294 967 296) different values.
Page 7 Keld Marple, February 2010
Version 1.4
Float and double
If you need to store a number's decimal fractions, you cannot use the integer datatype. You'll have
to use one of the floating point types: Float which is 32 bit and double which is (surprisingly) 64
bit.
T e x t d a t a t y p e s ( s t r i n g s )
Using the binary system to represent text is fundamentally different from the representation of
numbers. While the base10 numbers can be calculated from their binary representation, letters and
signs have to be matched with some bit combination from a list. That list could be the ascii
character list in which the number 97 represents the letter a . To represent one letter 8 or 7 bits are
usually needed. With 8 bits you are able to represent 256 different values and therefore you could
have a list of 256 different characters.
Char
Char means characters . Chose the char datatype if your field contains a fixed length string. If you
for example want to store a phone number using the +XX XX XX XX XX format, you cannot use a
numeric datatype (as the + sign and the white spaces are not numbers), but you do know exactly
how many characters you'll need as every phone number has the same format (assuming you're
storing phone numbers from one country only). The number of characters needed in this example is
15: Ten numbers, one plus sign and four white spaces.
One could also imagine that you'd want to store an encrypted password. If you've used the MD5
encryption algorithm, you'll know for sure, that your encrypted result contains 32 characters. You'd
chose the char datatype, set to 32 characters.
Varchar
Another common type of text is the varchar datatype. The word varchar is an abbreviation of
variable character , meaning a string of characters of variable length. The varchar is special in that
the database will cut off unused characters. So if you define a field to be a varchar of 50 characters
and you only store the word ice cream in it, the database will only use as much space, as the
number of characters in the word plus a byte for the length of the field. Use that varchar field for
names, emails, adresses and any other type of field, where you dont know the length of the values
stored in it.
Text
The text datatype is another string-based datatype. It can hold very big amounts of characters. A
text field can hold 65,535 characters, but like the varchar it will cut of empty spaces, though not as
effectively as the varchar.
D a t e , t i m e , y e a r
These datatypes are actually string types as well. The difference lies in the ways they can be used.
Most database systems offers quite a few date and time handling functionalities.
Page 8 Keld Marple, February 2010
Version 1.4
N U L L
One special kind of data is the value of nothing. How can you represent nothing in registers that can
only contain ones or zeros? A memory register is never empty, so to represent emptiness you need a
special value. This value can only be used for one purpose: The representation of nothing. NULL is
that value.
Relationships
There are two fundamental organisational features of the relational database. One is, that you can
order entities in relations (tables) with their attributes. This one has been dealt with already. The
other main feature is, that you can create relationships between these entities. A relationship exists
between to entities.
Foreign keys
When a relationship is established between two tables it is done using special values called foreign
keys (FK). A foreign key is nothing but an extra attribute in one of the tables containing a value that
is actually identical to the primary key of one particular row in the other table. It is by comparing
these values that the relationship is realised. You can also think of the foreign key as a reference or
a link.
T h e o n e - t o - m a n y r e l a t i o n s h i p
There are three different types of relationships: The one-to-one relationship, which is the type of
relationship that exists between the entity and its attributes.
The second type of relationship is the one-to-many relationship. This type of relationship is actually
the one described above. It exists between to entities and you only need two tables to create it
physically.
Let's say we expand the car database commenced earlier. Instead of having the colours of the cars
written out directly in the car-table, we'll create a separate table for colours. This way the value
blue will only be represented one time in the entire database even if a thousand cars are blue.
The colour table could look like this:
COLOUR: (1)Blue
COLOUR: (2)Red
COLOUR: (3)White
COLOUR: (4)Black
COLOUR: (5)Green
COLOUR: (6)Yellow
So now, if you need to store a red car in the database, you could just refer to this table, stating that
the car has colour number 2. For all the cars in the table you'd change the content of the colour
attribute from the actual value ( blue , red etc.) to the number, that the colour has in the colour
table. All the colour attributes of the blue cars will contain the number 1 , because this is the value
of the blue colour. It is the primary key of blue.
The colour attribute of the car-entity has changed into a foreign key.
Page 9 Keld Marple, February 2010
Version 1.4
The car table from before, now looks like this:
MODEL: Opel Kadett MODEL: Ford Mondeo MODEL: Nissan Primera MODEL: Ford Mondeo
COLOUR: 2 COLOUR: 1 COLOUR: 1 COLOUR: 1
DOORS: 2 DOORS 5 DOORS: 4 DOORS 5
VOLUME: 1400 cm3 VOLUME: 1600 cm3 VOLUME: 1400 cm3 VOLUME: 1600 cm3
Note that the colours exists only as numbers references to the colour table: As foreign keys.
Let's examine this relationship further to find out what it means, that it is a one-to-many
relationship.
Take one car: The Nissan for example. This particular car can only have one colour: Blue. The Ford
Mondeo (both of them) can also (each of them) have one colour: Blue. There are three cars which
are blue, but each of the cars only has one colour. One could say that: One car can have only one
colour, but one colour can be applied to many cars. The one-to-many relationship.
Note that one of the tables in a one-to-many relationship contains a foreign key. The other doesn't.
The colour table doesn't have any keys that points to the car-table, and as a result one cannot say
anything about cars by looking at the colour table. You can't even figure out how many cars are
blue, because it is not part of the description of a colour to know how many cars exist in that colour.
But it is part of the description of the car to say which colour it is. Therefore the foreign key exists
in the car table.
T h e m a n y - t o - m a n y r e l a t i o n s h i p
The many-to-many relationship is a bit more complicated than the one-to-many. It turns out, that it
is impossible to create a many-to-many relationship using only two tables. Even though we might
insist, that a many-to-many relationship exists between two entities, we cannot implement it
physically with just two tables. An extra table has to be created.
The example here will be the relationship between a car and extra equipment. Here is a list of
equipment:
1 Car radio dkr. 699,-
2 Battery charger dkr. 469,-
3 GPS-system dkr. 2499,-
4 Spoilers dkr. 789,-
The people who make car radios want to sell as many radios as possible. Therefore they have made
the radio to fit into most cars. On the other hand: Most cars can actually have more than one piece
of extra equipment. You can install a radio in a car, even though you put on the spoilers only last
week. The relationship between cars and extra equipment described is the many-to-many
relationship.
The question is now how to establish this relationship. Neither in the car table or the extra
equipment table is there anything that indicates a connection between the two. This is not a
coincidence.
To make the relationship an extra table must be created. This new tables has one purpose of
existence: To hold two foreign keys one for each table.
Let's try it to see how it works:
Page 10 Keld Marple, February 2010
Version 1.4
The car_extra_equipment_table
Car Equipment
Ford Mondeo (1) Car radio
Ford Mondeo (1) GPS-system
Nissan Primera Car radio
Nissan Primera Spiolers
Ford Mondeo (2) Car radio
Opel Kadett Car radio
Open Kadett Battery Charger
Open Kadett Spoilers
As you can see any car can be represented more than once in this table. The Opel Kadett for
instance is represented three times. On the other hand, any piece of equipment can also be
represented more than once. All the cars have a car radio, so that piece is represented four times. So
any car and any equipment can exist as many times as is needed what happened to uniqueness of a
row? Actually it exist in the combination of the two columns. The Ford Mondeo (1) only exist one
time combined with the car radio. The car itself is represented two times, but with two different
pieces of equipment.
The car radio exists four times, but in combination with four different cars.
With this table the many-to-many relationship comes to life: One car can have many pieces of
equipment (and it will appear in the table exactly that many times), and any particular piece of
equipment can be applied to many cars (and will appear in the table exactly as many times, as it is
applied).
This new table that is created only consists of two foreign keys. It is only there to make the many-
to-many relationship between two other tables possible. To indicate, that it doesn't contain any
attributes that are not keys, it is called a weak entity.
The primary key of the weak entity could very well be the two foreign keys combined. This is
called a composite primary key and is very common.
When this table is implemented it will contain only numbers. Not words like the example above.
The physical table in the database would look like this:
car_equipment
car equipment
2 1
2 3
3 1
3 4
4 1
1 1
1 2
1 4
Page 11 Keld Marple, February 2010
Version 1.4
Design issues
Any database that's created, is created to serve a purpose. When you decide, that your project needs
a database, it it because you have figured out that either the project cannot be done without a
database or maybe that if you add the database to the project, you can make it much more efficient.
Whatever the reason, you want your database to be reliable. You want to take advantage of the
guaranties given by the DBMS, that if you follow the rules of database design for relational
databases, the DBMS will guaranty that your data wont get messed up.
Before you go on implementing your database, you must go through the database design phase.
That is the phase where you figure out what you need to solve your problem.
The data model
The first thing you must do is to make a model of your system. That is a model of what you need to
be able to store. In the car database used as the example of this paper, you would have to be able to
store cars and extra equipment. You know this because you have tried to describe the system to
yourself (and preferably others) in normal words. Something like:
The application is a website, where the users can search for different used cars and what extra equipment fits the car. The user will
also be able to sign up for a newsmail, whenever a new (used) car comes into stock.
Reading this paragraph carefully you find several important words: users, cars, extra equipment,
newsmail and stock. These words are interesting because they are words closely connected to the
requirements of the system. If the system cannot do, what's in the description it is not the right
system (but some other system), and the system development has failed.
Any system is a little world in itself. It cannot deal with anything that it is not created to deal with,
but it must be able to handle the things for which it has been created. Reading the description this
system must be made to handle users, cars, extra equipment, newsmails and the stock. The rule of
thumb says that the nouns of a description makes good candidates for entities. It might not be all of
the nouns that actually end up as tables, and new ones might show up in the process, but we need a
place to start, and this is a good one.
We start therefore with the following entities:
user
car
extra equipment
newsmail
stock
Page 12 Keld Marple, February 2010
Version 1.4
E/R-diagram (logical)
With the entities found it is possible to make the E/R-diagram. E/R stand for entity/relationship ,
so this is a diagram that shows us which entities we have in the system, and what relationships exist
between them.
We have already found one of them: The many-to-many relationship between the car-entity and the
extra equipment-entity.
When doing the E/R-diagram the entities are usually represented as boxed, and the relationships are
represented with lines between the boxes. The lines are drawn with a diamond shaped box on top of
it.
Like this:
car equipment
C a r d i n a l i t y
To make this diagram really useful one extra information is included in the diagram: The cardinality
of the relationship. Between the car-entity and the equipment-entity a many-to-many relationship
exist. So at each end of the relationship-line a number is written to signify many . We don't have
any restriction on that number, so we just call it many. It could be one, two, fifty or even none at all.
The symbol for many in database modelling is the letter n (lower case).
Now we need a many symbol at each end of the relationship, but to prevent anyone from
misunderstanding our model and assume that the two n's actually means the same number, we use
the letter 'm' for the second part of the many-to-many relationship.
This is how it looks with cardinalities:
n m
car equipment
That part of the E/R-diagram is now finished. The only thing missing is the rest of the entities.
Here is a suggestion of the five entities:
stock
1
n
n m
car equipment
n m
user newsmail
As you might have noticed, not all entities are currently connected. There is no rule saying that all
tables must be connected. In fact in this model, there is one entity, which might end up totally
Page 13 Keld Marple, February 2010
Version 1.4
isolated. Take the newsmail entity. When you think about it, no natural relationship between a
user and the news exists. The newsmail entity may contain the actual newsmails that the company
sends out, and the user might have an attribute stating whether they are interested in the news or
not. But no relationship. When new news are written, they are stored in the newsmail table and the
system will run through all the users to see who want a newsmail. It will then send the mail. The
relationship between user and newsmail must be removed.
On the other hand. If you're thinking ahead, it could very well be of some interest to the car
salesman to be able to find out who bought which car six months ago. So maybe there should have
been a relationship between car and user.
The E/R-diagram changes into this:
stock
1
n
m
n
car equipment
n
newsmail
m
user
Notice that it is often difficult to create the perfect E/R diagram in one attempt. But after you have
made a diagram, you should go through it while thinking about all relations: Why are they there?
What does it mean, that there's a relationship between these two entities. Sometimes it becomes
clear, that the relationship should be removed or that the cardinality should be changed. And
sometimes you'd find, that you have missed an important relationship between two entities.
This work is on the logical level. We are at this time thinking about what our model represents.
There's a logical relationship between car and user: A car can be bought by many users and a user
can buy many cars. As you'll see shortly, this is not a precise physical representation of the
database. For this reason, we call this level of the E/R diagram the Logical E/R diagram.
Page 14 Keld Marple, February 2010
Version 1.4
You can add attributes to the E/R-diagram to get an idea of what your entities contains.
model
colour
stock
doors
1 volume
type
price
n
delivery
m
n
car equipment
n
newsmail
m
user
name
email
password
want_news
Doing this might reveal attributes, that are actually entities. Like the colour-attribute. This would be
much better as a separate entity. The final E/R-diagram then looks like this:
stock
1
n
m
n
colour car equipment
n
1
n
newsmail
m
user
Page 15 Keld Marple, February 2010
Version 1.4
Physical E/R diagram
After finishing the logical E/R-diagram, it is time to move on to the physical E/R diagram. The
physical E/R diagram is meant to be an exact resemblence of the database. Seven boxes in the
diagram, means seven tables in the database. This is different from the logical E/R-diagram.
The physical E/R diagram for the used cars company would look like this:
stock
colour
car car_equip equipment
car_user
newsmail
user
The real big difference between the logical and the physical E/R-diagram is that you can't have
many-to-many relationships in the physical E/R diagram. All many-to-many diagrams will have to
be split up into two one-to-many relationships, because this is how a many-to-many relationship is
actually (physically) build. So in the physical E/R diagram you'd add all the weak entities so that
the diagram shows the exact amount of tables in the database.
Normalisation
It is time to get hold of the attributes of the entities. Entities can be written as a list like this:
car colour stock user equipment car_user
car_id colour_id stock_id user_id equip_id car_id
model colour street name type user_id
colour number street price buying_date
doors house no. delivery year_of_car
volume postal no.
district
To make sure a database is correctly constructed, a few rules have been developed which can help
clean up inconsistent design. They are called the rules of normalisation and there are six of them.
We only concern ourselves with the first three rules, as they are the ones with practical significance.
The rules are called forms and we will now go through normal form 1, 2 and 3.
Page 16 Keld Marple, February 2010
Version 1.4
1 . n o r m a l f o r m
For a relational database to be in 1. normal form one must assure that all attributes are dependant on
the primary key. That means that you can't have an attribute where the content does not describe the
instance represented in the tuple, but actually could be described better on its own. The colour
attribute in the car-entity is an example. Before it was separated into its own table, the colour was
written directly in the car-entity, but we don't want to have to go to the car-entity to learn about a
colour. We want to be able to find everything regarding the colour (like it's name) in a colour entity.
Imagine that you had to find a certain car to find out what orange was like. Better to go to the tuple
"orange" in the colour table.
The 1. normal form also says, that the entities must be free of repeating fields. Repeating fields
means numerous columns describing the same type of information. Imagine that the car salesman
wanted to store a list of former owners of his car. One could then be tempted to include 3 or 4 extra
columns in the car-table called owner 1 , owner 2 , owner 3 and owner 4 . But this is not
good. If a car were to have more than 4 owners, there would not be enough place in the database.
That's messy design, and it has severe structural limitations. One obvious limitation is, that a car can
have a maximum of four owners.
It is much better to have a table called user (as in the model above) and then construct a relationship
between the two.
2 . n o r m a l f o r m
The second normal form says that if you have tables with composite primary keys, then all other
attributes must be equally dependant on both of them. A violation of the 2. normal form can be seen
in the car_user table above. The attribute called year_of_car has been wrongly placed in this
entity for several reasons. One of them is, that the attribute was thought of while creating the entity,
and it seemed perfectly sound to include an attribute about the age of the car when it is bought. But
this is not the right place to store the value. If the car is bought back by the used car company, and
then sold again, the value will be stored again. And even worse: The age of the car can only be
stored when a car is sold. This is what the 2. normal form is meant to prevent. The year-of-car
attribute is more dependant on the car_id foreign key than the user_id foreign key, for which it
is totally irrelevant. As the two foreign keys together form a composite primary key, the
year_of_car attribute can not be said to be equally dependant on both part of the primary key. The
attribute must be moved to the entity where it belong: The car entity.
3 . n o r m a l f o r m
The 3. normal form says that no attribute must be more dependant on another attribute than the
primary key. There is an example of that in the tables above as well. In the user-table the attribute
called district is part of the users address and for that reason one might think that it should be
placed inside the user-table like the rest of the address. But this is a mistake. Because the district is
actually not directly bound to the user as much as it is bound to the postal number. There is never a
case where the same postal number is combined with two different districts. The district follow the
number in all cases. If a user moves the address changes. The district also changes, but it does so
mainly because the postal code is changed.
For this reason the postal number and the district make up their own table, where the postal number
is the primary key and the district is the only attribute. That way potential errors are avoided.
Page 17 Keld Marple, February 2010
Version 1.4
Redundancy
Generally speaking the normalisation phase is about avoiding errors. The worst error in a database
is repeating data. The technical term for this is redundancy. The reason why this has to be avoided,
is that it is the biggest source of serious errors. If data which means the same (like a postal district)
is repeated many times throughout the database, it can be difficult to guaranty consistence. If for
some reason the district is spelled wrongly in one instance, that error will be very hard to find.
Data dictionary
When the database model is finished, you can move on to the data dictionary. The data dictionary is
an important document for the programmer who's implementing the database. It is also an
invaluable document when it comes to maintaining the database. If you for some reason want to
change the database months after the launch of the application, you will need a document that tells
you exactly what's what. Don't assume that you'll be able to remember anything in half a year. You
wont. Databases quickly get very big, and they are structured in a way not necessarily suited for the
human mind. Documentation is the answer, and the data dictionary is the most detailed document
when it comes to attributes.
Here is an example of a data dictionary for one of the entities in the used car data model:
car Datatype Range Default NULL unique indexed Foreign Auto-
value Key (FK) increment
car_id int No Yes
model varchar 30 No
colour int No To colour
doors tinyint 4 No
volume int No
year year Yes
A documentation like this tells the technician that implements the database how to program the
database. You should not expect the technicians to figure out for themselves the datatypes, range,
default values etc. for the attributes. Not because technicians are less intelligent, but because
nobody can maintain a clear picture of the whole system while typing at the keyboard. How should
a technician know if you've decided your varchars to be 30 or 40 characters in length. How would
you know, if you took over from somebody else?
Page 18 Keld Marple, February 2010

Wyszukiwarka