A brief Introduction to Database Theory ver1 4

background image

Version 1.4

A brief introduction

to

database theory

Written by Keld Marple

Page 1

Keld Marple, February 2010

background image

Version 1.4

Table of Contents

Introduction..........................................................................................................................................3

A database is a program...................................................................................................................3
A database is a concept....................................................................................................................3
Data..................................................................................................................................................4
What do you use them for?..............................................................................................................4

The relational database.........................................................................................................................5
Uniqueness...........................................................................................................................................5

The primary key..........................................................................................................................5
On choosing a good primary key................................................................................................6

What's in a database?............................................................................................................................6

Entity................................................................................................................................................6
Attribute...........................................................................................................................................6
Tuples...............................................................................................................................................7
Values..............................................................................................................................................7
Data types........................................................................................................................................7

Number datatypes.......................................................................................................................7

Integer.....................................................................................................................................7
Float and double.....................................................................................................................8

Text datatypes (strings)...............................................................................................................8

Char........................................................................................................................................8
Varchar...................................................................................................................................8
Text.........................................................................................................................................8

Date, time, year...........................................................................................................................8
NULL..........................................................................................................................................9

Relationships........................................................................................................................................9

Foreign keys.....................................................................................................................................9

The one-to-many relationship.....................................................................................................9
The many-to-many relationship................................................................................................10

Design issues......................................................................................................................................12

The data model ..............................................................................................................................12
E/R-diagram (logical)....................................................................................................................13

Cardinality.................................................................................................................................13

Physical E/R diagram.....................................................................................................................16
Normalisation.................................................................................................................................16

1. normal form...........................................................................................................................17
2. normal form...........................................................................................................................17
3. normal form...........................................................................................................................17

Redundancy...................................................................................................................................18
Data dictionary...............................................................................................................................18

Page 2

Keld Marple, February 2010

background image

Version 1.4

Introduction

A digital database is a system in which data can be stored and retrieved.
In most cases where you want a system to remember something you will need a database. If you
create a webpage with a member registration, you will need to store some information about the
registered members. These information must be stored somewhere and the best solution by far is a
database.
There are different database systems (DBMS) available. One of the biggest systems is Oracle,
which is an expensive high-end DBMS. Another one is MySQL, which is an open source DBMS
and free of charge for the user. MySQL is very popular as the DBMS used by dynamic web pages.
It is a very efficient database and it works well with the scripting language php.
When a system which needs a database has to be created, it is crucial that the database model is well
designed. The database model is the layout of the tables, and therefore all the tables should be
included in the model. A model is a small part of the real world. The model describes that part
sufficiently. The model describes nothing but the things relevant to one particular system: The
system you're currently designing. If a system needs to store some information about a member, the
table could have a username, an email and a password. In reality these three information surely does
not descibe a person satisfactory, but in this particular system it suffices.
The database model must be well structured for several reasons. The DBMS guarantees one thing: It
will store your data and find them again - IF you follow the specified rules for database design.
Another good reason is to avoid redundancy (= dual representation of data).

A database is a program

Just like anything else on a computer the database is software: A program that you install on your
hard disk and that has to be running for you to use it. I doesn't matter if the database program has
been installed on your own computer or another computer located far away as long as you have the
appropriate network software to connect to the database.

A database is a concept

The idea of storing data is not a new one. Neither is the idea of a database. People have figured
ways of storing items, information and even knowledge in many different ways throughout the
human history. But storing information isn't enough. Not if there is a lot of it. If you need only to
handle the names of your kids or your siblings, it is (for most people) sufficient to use the memory
conveniently located in the brain, but when it comes to all the people in a country (and their phone
numbers), we need some storing device, that we can use to find the information we want. Telephone
books are used for that. But the printed telephone book has one big disadvantage: You cannot list
all the entries alphabetically and by number at the same time. So the phone book is good if you
know the name and want to find the phone number, but it is utterly useless if you want to find a
name from a number. To solve this problem we could print another version of the same data, this
time listed in numerical order. But what if you want to search for people with certain professions,
people from the same street or the same part of town? And what if all you know is the first name of
that person?
This is where the digital database comes to the rescue. When using a digital database you are able

Page 3

Keld Marple, February 2010

background image

Version 1.4

to store your data in such a way that every piece of data is written once, but can be used in many
different ways.
This paper is only concerning the digital database.

Data

As suggested above, many things can be considered data. Any information you would like to store
for later use can be data. If you think, that you might one day want to be able to make a log of your
customers' whereabouts, you would have to store their actions somehow – you would need to store
some data.
If you sell used cars, and you want to keep track of your business, you could create a database and
type in everything you want to store about your cars and your business. These information are data.
Knowledge can also be thought of as data, but it is harder to define what knowledge is. Often
people prefer to think of knowledge as the thing you achive from combining information. In that
respect, we only store information in our digital database. It is when we use it, we change it into
knowledge.

What do you use them for?

Now for the big question: Is it really necessary if you are creating a small website? To be honest:
Yes it is. If you're doing anything but an online business card, you will almost certainly enhance the
flexibility and efficiency of the site and you will cut down the hours spend on updating.
Here's a few examples:

Website

Without database

With database

An internet page about your hobby –
could be stamps.

You could have a site showing
pictures of all your stamps put into a
category like nationality. Whenever
you get a new stamp you'll upload the
picture and edit the html-page for that
country to include the new image.

The same as without, but now you
could include different or even
multiple categories (like year printed,
condition, motive). When you acquire
a new stamp, you upload the picture
and include it in your database. It
instantly works on all pages.

A weblog.

You write your thoughts every day in
a text editor of your choice. Then you
upload that file using a predefined
file-name. Remember to either delete
or rename any old files. The website
shows the newest one (e.i. The one
with the right filename). People can
browse your daily comments – if you
provide link to older pages.

You write your daily comments
directly into a web form. People can
both browse and search your texts.
You can include search criteria like
headlines, subjects etc. You can also
see how many times your article have
been read.

A small business website: A used cars
salesman.

You have big lumps of information on
the website including pictures of cars.
New cars come in daily, and you
update by uploading pages and images
in new pages using templates.

You can include user comments and
auctions. And naturally uploading and
maintaining the horde of information
gets much easier

Page 4

Keld Marple, February 2010

background image

Version 1.4

The relational database

The most common type of database is the relational database. The database is called relational
because data is ordered in relations (that is: tables). Each table is meant to describe one thing or
category, like a car for instance. Everything you need to know in relation to a car will be part of the
car-table. So the model, car-colour, number-of-doors, motor-volume etc. are all properties related to
a car. Even though different cars can have different colours, each car will have one colour.
Put in a schematic way, you could write it like this:

MODEL:

Opel Kadett

MODEL:

Ford Mondeo

MODEL:

Nissan Primera

COLOUR:

Red

COLOUR:

Blue

COLOUR:

Blue

DOORS:

2

DOORS

5

DOORS:

4

VOLUME:

1400 cm3

VOLUME:

1600 cm3

VOLUME:

1400 cm3

As you see, we have 3 cars registered in this list. They are different in model and they all have the
properties listed. The actual value of the property – it could be the value “blue” of the property
“colour” does not make sense on its own. None of the properties do. To use the information
provided it is necessary to relate the values with the actual car. The relation between the actual car
and the properties is what have given name to the relational database.

NOTE: You might have heard of relations between tables and then you might think that it is these
relations that gave name to the relational database. This is a common mistake. To be precise, we
call this type of relations for “relationships.”
We will discuss the relationships between tables later on.

Uniqueness

Now let's say we add one car to the garage. A blue Ford Mondeo with 5 doors and a 1600 cm3
motor. Then the list looks like this:

MODEL: Opel Kadett

MODEL:

Ford Mondeo

MODEL:

Nissan Primera

MODEL:

Ford Mondeo

COLOUR: Red

COLOUR:

Blue

COLOUR:

Blue

COLOUR:

Blue

DOORS: 2

DOORS

5

DOORS:

4

DOORS

5

VOLUME: 1400 cm3

VOLUME:

1600 cm3

VOLUME:

1400 cm3

VOLUME:

1600 cm3

As you might have noticed, there are two identical cars in the list. It is no strange matter in the
garage, but it's bad news in a database. One of the things to achieve when using databases is the
uniqueness of every single item present.

T h e p r i m a r y k e y
This problem leads us to a very important concept: The primary key. The primary key is one
information or attribute, that must be guaranteed unique. In this way every new entry will be unique
- even if several of the values are similar to those of other entries. Whenever you are searching for a
specific row, you are sure that you can always address it precisely using it's primary key.

Page 5

Keld Marple, February 2010

background image

Version 1.4

O n c h o o s i n g a g o o d p r i m a r y k e y
There are at least three different approaches when choosing a primary key.
The first and perhaps most obvious way, is to look through the attributes and find one which is
some sort of unique value already. The personal registration number is such a value. No two
persons can have the same personal registration number. The ISBN for books is also a unique value,
that only applies to one title. So unless you are doing a registration of specific volumes, the ISBN
will be a suitable primary key.
Sometimes you don't have such a unique attribute at hand. A movie database can suffer from this, if
all you have is a title, a plot description, production year and so on. In such cases an identifying
attribute is created. The only purpose of this attribute is to act as primary key. One can even ask the
database system to take care of the numbering of the primary key. Each time someone adds a row to
the database, the database system will make sure that the next number in line will be added as the
primary key of this row.
The third way one can create a primary key is to choose two or more attributes that together create a
guaranteed unique combination.
When an attribute is defined as a primary key, the database will make sure, that nobody insert more
rows with the same primary key value.

What's in a database?

To make sure that everybody knows what everybody's talking about, certain words have come into
play during the years. To be completely honest, I might add, that not everybody uses all the words
in exactly the same way, and they don't always use them consistently and unambiguously. But one
has to start somewhere – when you know your way around the database theory, you will not get
confused because someone uses a different word. Here is some of the important words and their
meaning.

Entity

The first thing we create, when we develop a database model is the entities. Entities are things or
ideas from the physical world, that we need in our model. A database with members could choose
“person” as an entity. Or even “member”. The entity is meant as an abstraction of what it
represents. Is is – naturally – not the thing itself. It is only a representation of the thing. And it is
only meant to represent the thing or idea in one particular model. Therefore we might include things
like “name”, “address” and “telephone number” in our entity called “member”, even though another
database designer doing another database might include “login”, “password” and “email” in his
member-entity.
Often you will see the words “table” and “relation” as more or less synonymous with the word
entity. Developers use the word entity when designing the model and the word table when referring
to the actual tables that have been created in the database.

Attribute

When we have found an entity for our database model, we must describe it in detail. This is done by
specifying exactly what type of information the entity can hold. It has already been mentioned as
“name”, “address” etc. and these are the attributes of the entity. The attributes are the properties of

Page 6

Keld Marple, February 2010

background image

Version 1.4

the entity. If an entity was a chest of drawers the attributes would be the drawers. All the attributes
together describes the entity. (At least technically - we might still need a description in a humanoid
language somewhere).
Finding the right attributes for the entities (and the right entities for the model) is a major part of the
database design process. We will get back to this issue later.
When you move from the model design level to the implementation level, attribute becomes
“column” or “field”.

Tuples

When the database is finished it needs to be populated – e.g. filled with data. Whenever you enter a
new member into the member entity, you will have to enter one specific value that applies to this
particular member for each attribute in the entity. After you have done that, you have a collection of
information about one particular member. Or you could say: You have one instance of the entity.
This collection is called a tuple.
Other words for typle that you might encounter are “row” or “record”.

Values

When we get to actually inserting data into the database, we talk about values. A value is something
concrete in opposition to an attribute, which is an abstraction. Each new tuple in a table is to be
thought of as one specific instance of what the entity represents. A physical representation. If it is a
car-entity, one tuple would represent one physical car. This car is described in this particular tuple –
and nowhere else. It might use stuff from other entities and refer to this using a foreign key. For
example a car could have airbags installed as extra equipment, but that equipment itself is described
elsewhere (in the equipment-entity) and is as such not part of the description of the car.
One value exists in one cell. Remember that each cell can contain one value – no more. They are
said to be atomic, which means they cannot be split up into several values.

Data types

When storing data in a database it is important to know what you are actually storing. If you store a
name, you know by instinct, that it is not a number, and if you're storing a number, you know you
can't use letters. But what about phone numbers, dates or even binary files?

N u m b e r d a t a t y p e s
Numbers are special in that the computer uses the bits directly as numbers, and uses mathematics to
change it into decimal numbers. Numbers are quick to use – one could say, that they are the native
language of the computer.

Integer

One of the most common types of data is the integer. That is whole numbers like 1, 2, 3, 4 and so
on. Integers can also have a negative value (unless you specify otherwise). The integer is usually a
32 bit number, which means that 32 bits are used to store that value. This gives you the ability to
store 2

32

(= 4 294 967 296) different values.

Page 7

Keld Marple, February 2010

background image

Version 1.4

Float and double

If you need to store a number's decimal fractions, you cannot use the integer datatype. You'll have
to use one of the floating point types: Float which is 32 bit and double which is (surprisingly) 64
bit.

T e x t d a t a t y p e s ( s t r i n g s )
Using the binary system to represent text is fundamentally different from the representation of
numbers. While the base10 numbers can be calculated from their binary representation, letters and
signs have to be matched with some bit combination from a list. That list could be the ascii
character list in which the number 97 represents the letter ”a”. To represent one letter 8 or 7 bits are
usually needed. With 8 bits you are able to represent 256 different values and therefore you could
have a list of 256 different characters.

Char

Char means “characters”. Chose the char datatype if your field contains a fixed length string. If you
for example want to store a phone number using the +XX XX XX XX XX format, you cannot use a
numeric datatype (as the + sign and the white spaces are not numbers), but you do know exactly
how many characters you'll need as every phone number has the same format (assuming you're
storing phone numbers from one country only). The number of characters needed in this example is
15: Ten numbers, one plus sign and four white spaces.
One could also imagine that you'd want to store an encrypted password. If you've used the MD5
encryption algorithm, you'll know for sure, that your encrypted result contains 32 characters. You'd
chose the char datatype, set to 32 characters.

Varchar

Another common type of text is the varchar datatype. The word ”varchar” is an abbreviation of
”variable character”, meaning a string of characters of variable length. The varchar is special in that
the database will cut off unused characters. So if you define a field to be a varchar of 50 characters
and you only store the word ”ice cream” in it, the database will only use as much space, as the
number of characters in the word plus a byte for the length of the field. Use that varchar field for
names, emails, adresses and any other type of field, where you dont know the length of the values
stored in it.

Text

The text datatype is another string-based datatype. It can hold very big amounts of characters. A
text field can hold 65,535 characters, but like the varchar it will cut of empty spaces, though not as
effectively as the varchar.

D a t e , t i m e , y e a r
These datatypes are actually string types as well. The difference lies in the ways they can be used.
Most database systems offers quite a few date and time handling functionalities.

Page 8

Keld Marple, February 2010

background image

Version 1.4

N U L L
One special kind of data is the value of nothing. How can you represent nothing in registers that can
only contain ones or zeros? A memory register is never empty, so to represent emptiness you need a
special value. This value can only be used for one purpose: The representation of nothing. NULL is
that value.

Relationships

There are two fundamental organisational features of the relational database. One is, that you can
order entities in relations (tables) with their attributes. This one has been dealt with already. The
other main feature is, that you can create relationships between these entities. A relationship exists
between to entities.

Foreign keys

When a relationship is established between two tables it is done using special values called foreign
keys (FK). A foreign key is nothing but an extra attribute in one of the tables containing a value that
is actually identical to the primary key of one particular row in the other table. It is by comparing
these values that the relationship is realised. You can also think of the foreign key as a reference or
a link.

T h e o n e - t o - m a n y r e l a t i o n s h i p
There are three different types of relationships: The one-to-one relationship, which is the type of
relationship that exists between the entity and its attributes.
The second type of relationship is the one-to-many relationship. This type of relationship is actually
the one described above. It exists between to entities and you only need two tables to create it
physically.
Let's say we expand the car database commenced earlier. Instead of having the colours of the cars
written out directly in the car-table, we'll create a separate table for colours. This way the value
“blue” will only be represented one time in the entire database even if a thousand cars are blue.
The colour table could look like this:

COLOUR: (1) Blue
COLOUR: (2) Red

COLOUR: (3) White
COLOUR: (4) Black

COLOUR: (5) Green
COLOUR: (6) Yellow

So now, if you need to store a red car in the database, you could just refer to this table, stating that
the car has colour number 2. For all the cars in the table you'd change the content of the colour
attribute from the actual value (“blue”, “red” etc.) to the number, that the colour has in the colour
table. All the colour attributes of the blue cars will contain the number “1”, because this is the value
of the blue colour. It is the primary key of blue.
The colour attribute of the car-entity has changed into a foreign key.

Page 9

Keld Marple, February 2010

background image

Version 1.4

The car table from before, now looks like this:

MODEL: Opel Kadett

MODEL:

Ford Mondeo

MODEL:

Nissan Primera

MODEL:

Ford Mondeo

COLOUR: 2

COLOUR:

1

COLOUR:

1

COLOUR:

1

DOORS: 2

DOORS

5

DOORS:

4

DOORS

5

VOLUME: 1400 cm3

VOLUME:

1600 cm3

VOLUME:

1400 cm3

VOLUME:

1600 cm3

Note that the colours exists only as numbers – references to the colour table: As foreign keys.
Let's examine this relationship further to find out what it means, that it is a one-to-many
relationship.
Take one car: The Nissan for example. This particular car can only have one colour: Blue. The Ford
Mondeo (both of them) can also (each of them) have one colour: Blue. There are three cars which
are blue, but each of the cars only has one colour. One could say that: “One car can have only one
colour, but one colour can be applied to many cars.” The one-to-many relationship.
Note that one of the tables in a one-to-many relationship contains a foreign key. The other doesn't.
The colour table doesn't have any keys that points to the car-table, and as a result one cannot say
anything about cars by looking at the colour table. You can't even figure out how many cars are
blue, because it is not part of the description of a colour to know how many cars exist in that colour.
But it is part of the description of the car to say which colour it is. Therefore the foreign key exists
in the car table.

T h e m a n y - t o - m a n y r e l a t i o n s h i p
The many-to-many relationship is a bit more complicated than the one-to-many. It turns out, that it
is impossible to create a many-to-many relationship using only two tables. Even though we might
insist, that a many-to-many relationship exists between two entities, we cannot implement it
physically with just two tables. An extra table has to be created.
The example here will be the relationship between a car and extra equipment. Here is a list of
equipment:

1

Car radio

dkr. 699,-

2

Battery charger dkr. 469,-

3

GPS-system

dkr. 2499,-

4

Spoilers

dkr. 789,-

The people who make car radios want to sell as many radios as possible. Therefore they have made
the radio to fit into most cars. On the other hand: Most cars can actually have more than one piece
of extra equipment. You can install a radio in a car, even though you put on the spoilers only last
week. The relationship between cars and extra equipment described is the many-to-many
relationship.
The question is now how to establish this relationship. Neither in the car table or the extra
equipment table is there anything that indicates a connection between the two. This is not a
coincidence.
To make the relationship an extra table must be created. This new tables has one purpose of
existence: To hold two foreign keys – one for each table.
Let's try it to see how it works:

Page 10

Keld Marple, February 2010

background image

Version 1.4

The car_extra_equipment_table

Car

Equipment

Ford Mondeo (1)

Car radio

Ford Mondeo (1)

GPS-system

Nissan Primera

Car radio

Nissan Primera

Spiolers

Ford Mondeo (2)

Car radio

Opel Kadett

Car radio

Open Kadett

Battery Charger

Open Kadett

Spoilers

As you can see any car can be represented more than once in this table. The Opel Kadett for
instance is represented three times. On the other hand, any piece of equipment can also be
represented more than once. All the cars have a car radio, so that piece is represented four times. So
any car and any equipment can exist as many times as is needed – what happened to uniqueness of a
row? Actually it exist in the combination of the two columns. The Ford Mondeo (1) only exist one
time combined with the car radio. The car itself is represented two times, but with two different
pieces of equipment.
The car radio exists four times, but in combination with four different cars.
With this table the many-to-many relationship comes to life: One car can have many pieces of
equipment (and it will appear in the table exactly that many times), and any particular piece of
equipment can be applied to many cars (and will appear in the table exactly as many times, as it is
applied).
This new table that is created only consists of two foreign keys. It is only there to make the many-
to-many relationship between two other tables possible. To indicate, that it doesn't contain any
attributes that are not keys, it is called a weak entity.
The primary key of the weak entity could very well be the two foreign keys combined. This is
called a composite primary key and is very common.
When this table is implemented it will contain only numbers. Not words like the example above.
The physical table in the database would look like this:

car_equipment

car

equipment

2

1

2

3

3

1

3

4

4

1

1

1

1

2

1

4

Page 11

Keld Marple, February 2010

background image

Version 1.4

Design issues

Any database that's created, is created to serve a purpose. When you decide, that your project needs
a database, it it because you have figured out that either the project cannot be done without a
database or maybe that if you add the database to the project, you can make it much more efficient.
Whatever the reason, you want your database to be reliable. You want to take advantage of the
guaranties given by the DBMS, that if you follow the rules of database design for relational
databases, the DBMS will guaranty that your data wont get messed up.
Before you go on implementing your database, you must go through the database design phase.
That is the phase where you figure out what you need to solve your problem.

The data model

The first thing you must do is to make a model of your system. That is a model of what you need to
be able to store. In the car database used as the example of this paper, you would have to be able to
store cars and extra equipment. You know this because you have tried to describe the system to
yourself (and preferably others) in normal words. Something like:

“The application is a website, where the users can search for different used cars and what extra equipment fits the car. The user will
also be able to sign up for a newsmail, whenever a new (used) car comes into stock.”

Reading this paragraph carefully you find several important words: users, cars, extra equipment,
newsmail and stock. These words are interesting because they are words closely connected to the
requirements of the system. If the system cannot do, what's in the description – it is not the right
system (but some other system), and the system development has failed.
Any system is a little world in itself. It cannot deal with anything that it is not created to deal with,
but it must be able to handle the things for which it has been created. Reading the description this
system must be made to handle users, cars, extra equipment, newsmails and the stock. The rule of
thumb says that the nouns of a description makes good candidates for entities. It might not be all of
the nouns that actually end up as tables, and new ones might show up in the process, but we need a
place to start, and this is a good one.
We start therefore with the following entities:

user

car

extra equipment

newsmail

stock

Page 12

Keld Marple, February 2010

background image

Version 1.4

E/R-diagram (logical)

With the entities found it is possible to make the E/R-diagram. E/R stand for “entity/relationship”,
so this is a diagram that shows us which entities we have in the system, and what relationships exist
between them.
We have already found one of them: The many-to-many relationship between the car-entity and the
extra equipment-entity.
When doing the E/R-diagram the entities are usually represented as boxed, and the relationships are
represented with lines between the boxes. The lines are drawn with a diamond shaped box on top of
it.
Like this:

C a r d i n a l i t y
To make this diagram really useful one extra information is included in the diagram: The cardinality
of the relationship. Between the car-entity and the equipment-entity a many-to-many relationship
exist. So at each end of the relationship-line a number is written to signify ”many”. We don't have
any restriction on that number, so we just call it many. It could be one, two, fifty or even none at all.
The symbol for many in database modelling is the letter ”n” (lower case).
Now we need a many symbol at each end of the relationship, but to prevent anyone from
misunderstanding our model and assume that the two n's actually means the same number, we use
the letter 'm' for the second part of the many-to-many relationship.
This is how it looks with cardinalities:

That part of the E/R-diagram is now finished. The only thing missing is the rest of the entities.
Here is a suggestion of the five entities:

As you might have noticed, not all entities are currently connected. There is no rule saying that all
tables must be connected. In fact in this model, there is one entity, which might end up totally

Page 13

Keld Marple, February 2010

car

equipment

car

equipment

n

m

car

equipment

n

m

stock

user

newsmail

n

1

n

m

background image

Version 1.4

isolated. Take the “newsmail” entity. When you think about it, no natural relationship between a
user and the news exists. The “newsmail” entity may contain the actual newsmails that the company
sends out, and the user might have an attribute stating whether they are interested in the news or
not. But no relationship. When new news are written, they are stored in the newsmail table and the
system will run through all the users to see who want a newsmail. It will then send the mail. The
relationship between user and newsmail must be removed.
On the other hand. If you're thinking ahead, it could very well be of some interest to the car
salesman to be able to find out who bought which car six months ago. So maybe there should have
been a relationship between car and user.
The E/R-diagram changes into this:

Notice that it is often difficult to create the perfect E/R diagram in one attempt. But after you have
made a diagram, you should go through it while thinking about all relations: Why are they there?
What does it mean, that there's a relationship between these two entities. Sometimes it becomes
clear, that the relationship should be removed or that the cardinality should be changed. And
sometimes you'd find, that you have missed an important relationship between two entities.
This work is on the logical level. We are at this time thinking about what our model represents.
There's a logical relationship between car and user: A car can be bought by many users and a user
can buy many cars. As you'll see shortly, this is not a precise physical representation of the
database. For this reason, we call this level of the E/R diagram the Logical E/R diagram.

Page 14

Keld Marple, February 2010

car

equipment

stock

user

newsmail

n

n

m

1

n

m

background image

Version 1.4

You can add attributes to the E/R-diagram to get an idea of what your entities contains.

Doing this might reveal attributes, that are actually entities. Like the colour-attribute. This would be
much better as a separate entity. The final E/R-diagram then looks like this:

Page 15

Keld Marple, February 2010

car

equipment

stock

user

newsmail

n

n

m

1

n

m

model

colour

doors

volume

name

email

password

want_news

type

price

delivery

car

equipment

stock

user

newsmail

n

n

m

1

n

m

colour

n

1

background image

Version 1.4

Physical E/R diagram

After finishing the logical E/R-diagram, it is time to move on to the physical E/R diagram. The
physical E/R diagram is meant to be an exact resemblence of the database. Seven boxes in the
diagram, means seven tables in the database. This is different from the logical E/R-diagram.
The physical E/R diagram for the used cars company would look like this:

The real big difference between the logical and the physical E/R-diagram is that you can't have
many-to-many relationships in the physical E/R diagram. All many-to-many diagrams will have to
be split up into two one-to-many relationships, because this is how a many-to-many relationship is
actually (physically) build. So in the physical E/R diagram you'd add all the weak entities so that
the diagram shows the exact amount of tables in the database.

Normalisation

It is time to get hold of the attributes of the entities. Entities can be written as a list like this:

To make sure a database is correctly constructed, a few rules have been developed which can help
clean up inconsistent design. They are called the rules of normalisation and there are six of them.
We only concern ourselves with the first three rules, as they are the ones with practical significance.
The rules are called forms and we will now go through normal form 1, 2 and 3.

Page 16

Keld Marple, February 2010

car

equipment

stock

user

newsmail

car_user

car_equip

colour

car
car_id
model
colour
doors
volume

colour
colour_id
colour

stock
stock_id
street
number

user
user_id
name
street
house no.
postal no.
district

equipment
equip_id
type
price
delivery

car_user
car_id
user_id
buying_date
year_of_car

background image

Version 1.4

1 . n o r m a l f o r m
For a relational database to be in 1. normal form one must assure that all attributes are dependant on
the primary key. That means that you can't have an attribute where the content does not describe the
instance represented in the tuple, but actually could be described better on its own. The colour
attribute in the car-entity is an example. Before it was separated into its own table, the colour was
written directly in the car-entity, but we don't want to have to go to the car-entity to learn about a
colour. We want to be able to find everything regarding the colour (like it's name) in a colour entity.
Imagine that you had to find a certain car to find out what orange was like. Better to go to the tuple
"orange" in the colour table.
The 1. normal form also says, that the entities must be free of repeating fields. Repeating fields
means numerous columns describing the same type of information. Imagine that the car salesman
wanted to store a list of former owners of his car. One could then be tempted to include 3 or 4 extra
columns in the car-table called “owner 1”, “owner 2”, “owner 3” and “owner 4”. But this is not
good. If a car were to have more than 4 owners, there would not be enough place in the database.
That's messy design, and it has severe structural limitations. One obvious limitation is, that a car can
have a maximum of four owners.
It is much better to have a table called user (as in the model above) and then construct a relationship
between the two.

2 . n o r m a l f o r m
The second normal form says that if you have tables with composite primary keys, then all other
attributes must be equally dependant on both of them. A violation of the 2. normal form can be seen
in the car_user table above. The attribute called “year_of_car” has been wrongly placed in this
entity for several reasons. One of them is, that the attribute was thought of while creating the entity,
and it seemed perfectly sound to include an attribute about the age of the car when it is bought. But
this is not the right place to store the value. If the car is bought back by the used car company, and
then sold again, the value will be stored again. And even worse: The age of the car can only be
stored when a car is sold. This is what the 2. normal form is meant to prevent. The “year-of-car”
attribute is more dependant on the “car_id” foreign key than the “user_id” foreign key, for which it
is totally irrelevant. As the two foreign keys together form a composite primary key, the
“year_of_car” attribute can not be said to be equally dependant on both part of the primary key. The
attribute must be moved to the entity where it belong: The car entity.

3 . n o r m a l f o r m
The 3. normal form says that no attribute must be more dependant on another attribute than the
primary key. There is an example of that in the tables above as well. In the user-table the attribute
called “district” is part of the users address and for that reason one might think that it should be
placed inside the user-table like the rest of the address. But this is a mistake. Because the district is
actually not directly bound to the user as much as it is bound to the postal number. There is never a
case where the same postal number is combined with two different districts. The district follow the
number in all cases. If a user moves the address changes. The district also changes, but it does so
mainly because the postal code is changed.
For this reason the postal number and the district make up their own table, where the postal number
is the primary key and the district is the only attribute. That way potential errors are avoided.

Page 17

Keld Marple, February 2010

background image

Version 1.4

Redundancy

Generally speaking the normalisation phase is about avoiding errors. The worst error in a database
is repeating data. The technical term for this is redundancy. The reason why this has to be avoided,
is that it is the biggest source of serious errors. If data which means the same (like a postal district)
is repeated many times throughout the database, it can be difficult to guaranty consistence. If for
some reason the district is spelled wrongly in one instance, that error will be very hard to find.

Data dictionary

When the database model is finished, you can move on to the data dictionary. The data dictionary is
an important document for the programmer who's implementing the database. It is also an
invaluable document when it comes to maintaining the database. If you for some reason want to
change the database months after the launch of the application, you will need a document that tells
you exactly what's what. Don't assume that you'll be able to remember anything in half a year. You
wont. Databases quickly get very big, and they are structured in a way not necessarily suited for the
human mind. Documentation is the answer, and the data dictionary is the most detailed document
when it comes to attributes.
Here is an example of a data dictionary for one of the entities in the used car data model:

car

Datatype

Range

Default
value

NULL

unique indexed Foreign

Key (FK)

Auto-
increment

car_id

int

No

Yes

model

varchar

30

No

colour

int

No

To colour

doors

tinyint

4

No

volume int

No

year

year

Yes

A documentation like this tells the technician that implements the database how to program the
database. You should not expect the technicians to figure out for themselves the datatypes, range,
default values etc. for the attributes. Not because technicians are less intelligent, but because
nobody can maintain a clear picture of the whole system while typing at the keyboard. How should
a technician know if you've decided your varchars to be 30 or 40 characters in length. How would
you know, if you took over from somebody else?

Page 18

Keld Marple, February 2010


Document Outline


Wyszukiwarka

Podobne podstrony:
Brief Introduction to Hatha Yoga
An Introduction to Database Systems, 8th Edition, C J Date
A BRIEF INTRODUCTION TO MAT
Geiss An Introduction to Probability Theory
Heathen Gods and Rites A brief introduction to our ways of worship and main deities
Fokkinga A Gentle Introduction to Category Theory the calculational approach (1994) [sharethefiles
Physics Introduction to Superstring Theory (Schwarz)
An Introduction to Database Systems, 8th Edition, C J Date
Dahl M A brief introduction to Finsler geometry (web draft, 2006)(39s) MDdg (1)
An Introduction to the Theory of Numbers L Moser (1957) WW
An Introduction to Yang Mills Theory
INTRODUCTION TO THE LITERARY THEORY 14
Introduction to the Runes brief background information on runes, with table of Elder Futhark rune m
Jonathan Jacobs Dimensions of Moral Theory An Introduction to Metaethics and Moral Psychology 2002
An Introduction to Conformal Field Theory [jnl article] M Gaberdiel (1999) WW

więcej podobnych podstron