Chapter 17
What Is Data Normalization?
A Typical Database Before Normalization
Rule 1: Eliminate Repeating Groups
Rule 2: Eliminate Redundant Data
Update Integrity Problems
Delete Integrity Problems
The Normalization Solution
Rule 3: Eliminate Columns Not Dependent on the Primary Key
Do Not Store Calculated Data in Your Tables
Rule 4: Isolate Independent Multiple Relationships
Rule 5: Isolate Related Multiple Relationships
Summary
Quiz
Exercises
Day 17
Database Normalization
Now that you understand the Data Definition Language (DDL) portion of SQL, it's time to apply that new knowledge to a lesson on database theory. Today you'll learn about the concept of data normalization. You'll develop a working definition of data
normalization and learn about the advantages of normalizing your databases. You'll also explore each of the five rules of data normalization, including reasons for applying these rules. When you have completed today's lesson, you will be able to identify
ways to use data normalization to improve database integrity and performance.
Throughout today's lesson, you will be normalizing a real database using the data definition SQL statements you learned about on Day 15 and Day 16, and by using Visual Basic's Visdata application that you
learned about last week (see Day 8, "Using Visdata").
The topic of data normalization could easily take up an entire bookand there are several excellent books on the subject. This lesson approaches data normalization from a practical standpoint rather than a theoretical standpoint. Here you'll focus
on two particular questions: What are the rules? How can these rules help me improve my Visual Basic database applications? To start out, let's develop a working definition of data normalization and talk about why it can improve your Visual Basic
applications.
What Is Data Normalization?
Data normalization is a process of refining database structures to improve the speed at which data can be accessed and to increase the database integrity. This is not as easy as it might seem. Very often, optimizing a table for speed is not the same as
optimizing for integrity. Putting together a database is a process of discovering the data elements you need and then creating a set of tables to hold those elements. The tables and fields you define make up the structure of the database. The database
structure you decide upon will affect the performance of your database programs. Some database layouts can improve access speed. For example, placing all related information in a single table allows your programs to locate all needed data by looking in one
place. On the other hand, you can lay out your database in a way that improves data integrity. For example, placing all the invoice line item data in one table and the invoice address information in another table prevents users from deleting complete
addresses when they remove invoice line items from the database. Well-normalized databases strike a balance between speed and integrity.
High-speed tables have few index constraints and can have several, sometimes repetitive, fields in a single record. The few constraints make updates, insertions, and deletes faster. The repetitive fields make it easier to load up large amounts of data
in a single SQL statement instead of finding additional, related data in subsidiary tables linked via those slower index constraints.
Databases built for maximum integrity have many small data tables. Each of these tables can have several indexesmostly foreign keys referencing other tables in the database. If a table is built with high integrity in mind, it is difficult to add
invalid data to the database without firing off database error messages. Of course, all that integrity checking eats precious ticks off the microchip clock.
Good data normalization results in data tables that make sense in a fundamental way. Well-normalized tables are easy to understand when you look at them. It is easy to see what kind of data they are storing and what types of updates need to be
performed. Usually, it is rather easy to create data entry routines and simple reports directly from well-normalized tables. In fact, the rule of thumb is this: If it's hard to work with a data table, it probably needs more normalization work.
For the rest of this lesson, you will be using the Visdata application to build data tables. If you have not already compiled the Visdata application, see the lesson on Day 8 for information on how to load, compile, and save the application. Day 8 also
covers how to use Visdata to maintain relational databases.
A Typical Database Before Normalization
To illustrate the process of normalization, let's start with an existing database table. The database NORMDAT1.MDB can be found in the CHAP17 directory. Load this into the Visdata application and open the Table1 data table. Your screen should look
something like the one in Figure 17.1.
Figure 17.1
Displaying Table1 before normalization.
This data table holds information about employees of a small company. The table contains fields for the employee ID and employee name, and the ID, name, and location of the department to which this employee is currently assigned. It also includes fields
for tracking the employee's job skills, including the skill code, the name, the department in which the skill was learned, and the ability level that the employee has attained for the designated skill. Up to three different skills can be maintained for
each employee.
This table is rather typical of those you will find in existing record-oriented databases. It is designed to quickly give users all the available information on a single employee. It is also a fairly simple task to build a data entry form for this data
table. The single form can contain the employee fields and the department fields at the top of the form and the three skill field sets towards the bottom of the form. Figure 17.2 shows a simple data form for this table
generated by Visdata.
Figure 17.2
The data entry form for Table1.
Access to the information in the table is fast and the creation of a data entry screen is easy. So this is a well-normalized table, right? Wrong. Three of the five rules of normalization that you will learn in the rest of this lesson are broken, and the
other two are in jeopardy! Some of the problems are obvious, some are not. Let's go through each of the five rules of normalization and see how applying these rules can improve the data table.
Rule 1: Eliminate Repeating Groups
The first area in which Table1 needs some work is in the repeating skill fields. Why include columns in the data table called SkillCode1, SkillCode2, SkillCode3 or SkillName1, SkillName2, SkillName3, and so forth? You want to be able to store more than
one set of skills for an employee, right? But what if you want to store data on more than three skills acquired by a single employee? What if most of the employees only have one or two skills, and very few have three skills? Why waste the blank space for
the third skill? Even more vexing is how easy will it be to locate all employees in the data table that have a particular skill?
NOTE:The first rule of data normalization states that you should make a separate table for each set of related columns and give each table a primary key. Databases that adhere to this first rule of normalization are in the First Normal
Form.
The first rule of data normalization is to eliminate repeating groups of data in a data table. Repeating groups of data, such as the skill fields (SkillCodeX, SkillNameX, SkillDeptIDX, and SkillLevelX), usually indicates the need for an additional
table. Creating the related table will greatly improve the readability of your tables and allow you to keep as few or as many skill sets for each employee as you need without wasting storage space.
The fields that relate to the employee skills need to be separated from the others in the table. You don't need to put all 12 skill fields in the new table, though. You only need one of each of the unique data fields. The new database now has not one,
but two data tables. One, called Skills, contains only the skill fields. The other table, called Employees, contains the rest of the fields. Table 17.1 shows how the two new tables look.
Table 17.1. Eliminating repeating data.
Skills Employees
EmpID EmpID
SkillCode EmpName
SkillName DeptID
SkillDeptID DeptName
SkillLevel DeptLocation
Notice that the first field in both tables is the EmpID field. This field is used to relate the two tables. Each record in the Skill table contains the employee ID and all pertinent data on a single job skill (code, name, department learned, and ability
level). If a single employee has several skills, there will be a single record in the Skill table for each job skill acquired by an employee. For example, if a single employee has acquired five skills, there will be five records with the same employee ID
in the Skills table.
Each record in the Skills table must contain a valid value in the EmpID field or it should be rejected. In other words, each time a record is added to the Skills table, the value in the EmpID field should be checked against values in the EmpID field of
the Employees table. If no match is found, the Skills record must be corrected before it is written to the database. You remember from the discussion of SQL Data Definition Language statements on Day 15 that this is a FOREIGN KEY
CONSTRAINT. The field EmpID in the Skills table is a foreign key that references the field EmpID in the Employees table. Also, the EmpID field in the Employees table should be a primary field to make sure that each record in the Employee table has a
unique EmpID value.
Now that you know the fields and index constraints you need, you can use SQL DDL to create two new tables. If you have not already done so, start the Visdata application and open the NORMDAT1.MDB database. Now you'll create two new tables that bring the
database into compliance with the first rule of data normalization.
First, create the table that holds all the basic employee data. This table has all the fields that were in the Table1 table, minus the skill fields. Using the information in Table 17.1 as a guide, enter an SQL DDL statement in the SQL window of Visdata
that creates the Employees data table. Your SQL statement should resemble Listing 17.1.
Listing 17.1. Creating the Employees table.
CREATE TABLE Employees
(EmpID TEXT(5),
EmpName TEXT(30),
DeptID TEXT(5),
DeptName TEXT(20),
DeptLocation TEXT(20),
CONSTRAINT PKEmpID PRIMARY KEY (EmpID));
Notice that the EmpID field has been designated as a primary key field. This guarantees that no two records in the Employees data table can have the same EmpID value. You can use the EmpID field in the next table you create (the Skills table) as the
reference field that links the two tables. Because you are using the EmpID field as a link, it must be a unique value in the Employees table in order maintain database integrity. What you are doing here is setting up a one-to-many relationship between the
Employee table (the one side) and the Skills table (the many side). Any time you establish a one-to-many relationship, you must make sure that the reference field (in this case, the EmpID field) is unique on the one side of the relationship.
Now that you have built the Employees table, you can create the table that holds all the skills data. Use the information in Table 17.1 to write a SQL DDL statement that creates a table called Skills. Make sure the new table has the field EmpID and that
the EmpID field is built with the correct index constraint to enforce one-to-many database integrity. Your SQL statement should look like the one in Listing 17.2.
Listing 17.2. Creating the Skills table.
CREATE TABLE Skills
(EmpID TEXT(5),
SkillCode TEXT(5),
SkillName TEXT(20),
SkillDeptID TEXT(5),
SkillLevel INTEGER,
CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID),
CONSTRAINT FKEmpID FOREIGN KEY (EmpID) REFERENCES Employees(EmpID));
You can see in Listing 17.2 that you have used the FOREIGN KEY...REFERENCES syntax to establish and maintain the table relationship. As you remember from the SQL lessons on Day 15 and Day 16, the FOREIGN KEY...REFERENCES syntax makes sure
that any entry in the Skills.EmpID field can be found in the related field Emplolyees.EmpID. If users enter a value in the Skills.EmpID field that cannot be found in any Employees.EmpID field, Visual Basic automatically issues a
database error message. This message is generated by Visual Basic, not by your program.
That's how you build tables that adhere to the first rule of data normalization. To see how these tables look when they have live data in them, use Visdata to load the CHAP17\NORMDAT2.MDB database. This database contains the Employees and Skills tables
with data already loaded into them. Figure 17.3 shows how Visdata displays the two new tables that have live data.
Figure 17.3
The new Employees and Skills tables from NORMDAT2.MDB.
NOTE:Before continuing with today's lesson, load the NORMDAT2.MDB database into Visdata.
Rule 2: Eliminate Redundant Data
Another aspect to the Skills table also needs attention. Although moving the repeating skills fields into a separate table improves the database, you still have work to do. The Skills table contains redundant data. That is, data is stored in
several places in the database. Redundant data in your database can lead to serious database integrity problems. It's best to eliminate as many occurrences of redundant data as possible.
NOTE:The second rule of data normalization states that if a column depends only on part of a multivalued key, you remove it to a separate table. In other words, if you need to fill in two fields in order to truly identify the record (JobID
and JobName), but only one of those fields is needed to perform a lookup in the table, you need a new table. Databases that conform to this rule are in the Second Normal Form.
For example, the Skills table includes a field called SkillCode. This field contains a code that identifies the specific skill (or skills) each employee has acquired. If two employees have gained the same skill, that skill appears twice in the Skills
file. The same table also includes a field called SkillName. This field contains a meaningful name for the skill represented by the value in the SkillCode field. This name is much more readable and informative than the SkillCode value. In essence, these
two fields contain the same data, represented slightly differently. This is the dreaded redundant data you have to eliminate!
Before you jump into fixing things, first review the details regarding redundant data and how it can adversely affect the integrity of your database.
Update Integrity Problems
When you keep copies of data elements in several rows in the same table or in several different tables (such as job names to go with the job ID codes), you'll have a lot of work ahead of you when you want to modify the copied data. If you fail to update
one or more of these copies, you can ruin the integrity of your database. Redundant data can lead to what are known as update integrity problems.
Imagine that you have built a huge database of employee skills using the tables you built in the preceding section. All is going great when, suddenly, the Human Resources Department informs you that it has designed a new set of names for the existing
skill codes. You now have to go through the entire database and update all the records in the Skill table, searching out the old skill name and updating the SkillName field with the new skill name. Because this is an update for the entire data table, you
have to shut down the database until the job is complete in order to make sure no one is editing records while you're performing this update. Also, you probably have to change some Visual Basic code that you built to verify the data entry. All in all, it's
a nasty job. If that isn't enough, how about a little power outage in the middle of your update run? Now you have some records that have the old names, and some have the new names. Things are really messed up now!
Delete Integrity Problems
Although the update integrity problem is annoying, you can suffer through most of those problems. In fact, almost all database programmers have had to face similar problems before. The more troublesome integrity problem resulting from redundant data
comes not during updates, but during deletes. Let's assume you have properly handled the mass update required by the Human Resources Department. Now you discover that there is only one employee in the entire database that has on file the SkillCode
S099 (the Advanced Customer Service course). No other employee has attained this high level of training. Now, that employee leaves the organization. When you delete the employee record from the file, you would delete the only reference to the
Advanced Customer Service course! There is no record of the existence of the Advanced Customer Service course in your entire database, which is a real problem.
The Normalization Solution
The way to reduce these kinds of data integrity problems is to pull out the redundant data and place it in a separate table. You need a single table, called SkillMaster, that contains only the SkillCode and the SkillName data fields. This table is
linked to the Skills table via the SkillCode field. Now, when the HR department changes the skill names, you only need to update a single recordthe one in the SkillMaster table. Because the Skills table is linked to the SkillMaster table, when you
delete that last employee with the certification for SkillCode S099, you won't be deleting the last reference to the skill. It's still in the SkillMaster table.
TIP:Another plus to this type of table separation is in speeding data entry. With only one field to enter, and especially a brief code, data entry operators can more quickly fill in fields on the table's form.
Also, you now have a single table that has a list of all the unique skills that can be acquired by your employees. You can now produce a Skills list for employees and managers to review. If you add fields that group the skills by department, you can
even produce a report that shows all the skills by department. This would be very difficult if you were stuck with the file structure you developed in the previous section.
So now let's redefine the Skills table and the SkillMaster table to conform to the second rule of data normalization. Table 17.2 shows the fields you need for the two tables.
Table 17.2. The field list for the Skills and SkillMaster tables.
EmpSkills Table SkillMaster Table
EmpID SkillCode
SkillCode SkillName
SkillDeptID
SkillLevel
You can see that you have renamed the Skills table to EmpSkills to better reflect its contents. You have also moved the SkillName field out of the EmpSkills table and created SkillMaster, a small table that contains a list of all the valid skills and
their descriptive names. Now you have the added bonus of being able to add a FOREIGN KEY constraint to the EmpSkills table. This will improve database integrity without adding any additional programming code!
Listing 17.3 shows the two SQL DDL statements that create the EmpSkills and the SkillMaster data tables. Note the use of FOREIGN KEY constraints in the EmpSkills table.
Listing 17.3. Creating the SkillMaster and EmpSkills tables.
CREATE TABLE SkillMaster
(SkillCode TEXT(5),
SkillName TEXT(20),
CONSTRAINT PKSkillMaster PRIMARY KEY (SkillCode))
CREATE TABLE EmpSkills
(EmpID TEXT(5),
SkillCode TEXT(5),
SkillDeptID TEXT(5),
SkillLevel INTEGER,
CONSTRAINT PKSkills PRIMARY KEY (SkillCode,EmpID),
CONSTRAINT FKEmpID2 FOREIGN KEY (EmpID)
REFERENCES Employees(EmpID),
CONSTRAINT FKSkillCode FOREIGN KEY (SkillCode)
REFERENCES SkillMaster(SkillCode));
Use Visdata to add these two new tables to the NORMDAT2.MDB database. The database NORMDAT3.MDB contains a complete database with the data tables Employees, EmpSkills, and SkillMaster fully populated with data.
Figure 17.4
The New Employees, EmpSkills, and SkillMaster tables.
You now have a database that conforms to the first two rules of data normalization. You have eliminated repeating data and redundant data. You have one more type of data to eliminate from your tables. You'll handle that in the following section.
NOTE:Before continuing with the lesson, load the NORMDAT3.MDB database into Visdata.
Rule 3: Eliminate Columns Not Dependent on the Primary Key
By now, you're probably getting the idea. You are looking for hints in the table structure that lead you into traps further down the road. Will this table be easy to update? What happens if you delete records from this table? Is it easy to get a
comprehensive list of all the unique records in this table? Asking questions like these can uncover problems that are not so apparent when you first build a table.
When you are building a data table, you should also be concerned about whether a field describes additional information about the key field? In other words, is the field you are about to add to this table truly related to the key field? If not, the
field in question should not be added to the table. It probably needs to be in its own table. This process of removing fields that do not describe the key field is how you make your data tables conform to the third rule of data normalizationeliminate
columns not dependent on keys.
NOTE:The third rule of data normalization states that if a column does not fully describe the index key, that column should be moved to a separate table. In other words, if the columns in your table don't really need to be in this table,
they probably need to be somewhere else. Databases that follow this rule are known to be in the Third Normal Form.
In these database examples, you have data describing the various departments in the company stored in the Employees table. Although the DeptID field is important to the Employees description (it describes the department to which the employee
belongs), the department-specific data should not be stored with the employee data. Yes, you need another table. This table should contain only department-specific data and be linked to the Employees table via the DeptID field. Table 17.3 lists the
modified Employees table and the new Department table.
Table 17.3. The modified Employees table and the new Department table.
Employees Department
EmpID DeptID
EmpName DeptName
DeptID DeptLocation
Notice that the Employees table is much simpler now that you have eliminated all unrelated fields. Use Visdata to construct SQL DDL statements that create the new Department table and then modify the Employees table and the EmpSkills table to increase
database integrity (yes, more foreign keys!). First, use the SQL DDL in Listing 17.4 to create the Departments table.
Listing 17.4. Creating the Departments table.
CREATE TABLE Departments
(DeptID TEXT(5),
DeptName TEXT(20),
DeptLocation TEXT(20),
CONSTRAINT PKDeptID PRIMARY KEY (DeptID))
Now alter the Employees table. You need to do two things:
Remove the DeptName column from the table.
Add a FOREIGN KEY constraint to enforce referential integrity on the Employees.DeptID field.
Listing 17.5 contains the SQL DDL statements to create the modified Employees table.
Listing 17.5. Creating the new Employees table.
CREATE TABLE Employees
(EmpID TEXT(5),
EmpName TEXT(30),
DeptID TEXT(5),
CONSTRAINT PKEmpID PRIMARY KEY (EmpID),
CONSTRAINT FKEmpDept FOREIGN KEY (DeptID)
REFERENCES Departments(DeptID))
Now you need to modify the EmpSkills table to add the referential integrity check on the EmpSkills.SkillDeptID field. The new SQL DDL should look like Listing 17.6.
Listing 17.6. Creating the new EmpSkills table.
CREATE TABLE EmpSkills2
(EmpID TEXT(5),
SkillCode TEXT(5),
SkillDeptID TEXT(5),
SkillLevel INTEGER,
CONSTRAINT PKEmpSkill2 PRIMARY KEY (SkillCode,EmpID),
CONSTRAINT FKSkillMast FOREIGNKEY (SkillCode)
REFERENCES SkillMaster(SkillCode),
CONSTRAINT FKSkillDept FOREIGN KEY (SkillDeptID)
REFERENCES Departments(DeptID));
Figure 17.5
The Department table added to NORMDAT4.MDB.
The database NORMDAT4.MDB contains a complete set of tables that conform to the third rule of data normalization. Use Visdata to load NORMDAT4.MDB and review the data tables. Attempt to add some data that does not follow the integrity rules. Try
deleting records. This shows you how Visual Basic issues database error messages when you try to save a record that breaks the referential integrity rules.
The first three rules of data normalization involve the elimination of repeating, redundant, or unrelated data fields. The last two rules involve isolating multiple relationships to improve overall database integrity. The first three rules are usually
all that you need to produce well-designed databases. However, there are times when additional normalization can improve the quality of your database design. In the next two sections, you will learn rules 4 and 5 of data normalization.
Do Not Store Calculated Data in Your Tables
It is important to note here that one of the results of the third rule of data normalization is that you should not store calculated fields in a data table. Calculated fields are fields that contain derived data such as year-to-date totals, a line in
the invoice table that contains the totals of several other rows in the invoice table, and so forth. Calculated fields do not describe the primary key. Calculated fields are derived data. It is a bad practice to store derived data in live data
tables.
Derived data can easily fall out of sync with the individual rows that make up the total data. What happens if the individual rows that add up to the total are altered or deleted? How do you make sure the row that holds the total is updated each time
any line item row is changed? Storing derived data might seem to be faster, but it is not easier. And dealing with derived data opens your database to possible update and delete integrity problems each time a user touches either the prime data rows or the
total data rows. Calculated data should not be stored. It should always be computed using the prime data at the time it is needed.
NOTE:Before continuing with this lesson, load the NORMDAT4.MDB database into Visdata.
Rule 4: Isolate Independent Multiple Relationships
The fourth rule of data normalization concerns the handling of independent multiple relationships. This rule is applied whenever you have more than one one-to-many relationship on the same data table. The relationship between the Employees table and the
EmpSkills table is a one-to-many relationship. There can be many EmpSkills records related to one Employee record. Let's add an additional attribute of employees to create a database that has more than a single one-to-many relationship.
Assume that the Human Resources Department has decided it needs more than just the skill names and skill levels attained for each employee. They also want to add the level of education attained by the employee for that skill. For example, if the
employee has an accounting skill and has an Associates degree in bookkeeping, Human Resources wants to store the degree, too. If an employee has been certified as an electrician and works in the Maintenance Department, the HR group wants to know that.
The first thing you might want to do is add a new column to the EmpSkills tablemaybe a field called Degree, maybe even a field for YearCompleted. This makes sense because each skill might have an associated education component. It makes sense, but
it is not a good idea. What about the employee who is currently working in the Customer Service Department but has an accounting degree? Just because the employee has a degree does not mean that employee has the skills to perform a particular job or is
working in a position directly related to his or her degree. The degree and the job skills are independent of each other. Therefore, even though the skills data and the degree data are related, they should be isolated in separate tables and linked via a
foreign key relationship.
NOTE:The fourth rule of data normalization dictates that no table can contain two or more one-to-many or many-to-many relationships that are not directly related. In other words, if the data element is important (the college degree) but not
directly related to other elements in the record (the customer service rep with an accounting degree), you need to move the college degree element to a new table. Databases that follow this rule are in the Fourth Normal Form.
Table 17.4 shows a sample Training table that can be used to hold the education information for each employee. Now the HR department can keep track of education achievements independent of acquired job skills. Note that the EmpID directly connects the
two relationships. If the Training table has only one entry per employee, the two relationships are a one-to-one relationship between the Employees table and the Training table, and a one-to-many relationship between the Employees table and the EmpSkills
table. Of course, if any employee has more than one degree, both relationships become one-to-many.
Table 17.4. The sample Training data table.
EmpID
Degree
YearCompleted
InstitutionName
Listing 17.7 is a sample SQL DDL statement that creates the Training data table with the proper relationship constraint. Enter this statement in the SQL Window of Visdata while you have the NORMDAT4.MDB database open.
Listing 17.7. Creating the Training table.
CREATE TABLE Training
(EmpID TEXT(5),
Degree TEXT(20),
YearCompleted INTEGER,
InstitutionName TEXT(30),
CONSTRAINT PKTraining PRIMARY KEY (EmpID,Degree),
CONSTRAINT FKEmpTrn FOREIGN KEY (EmpID)
REFERENCES Employees (EmpID))
Figure 17.6
The Training table shows the degree achievements for the Employees table.
The database NORMDAT5.MDB contains a complete version of the database normalized up to the fourth rule of data normalization. Use Visdata to open the database and review the table structure.
NOTE:Before continuing with the lesson, load the NORMDAT5.MDB database into Visdata.
Rule 5: Isolate Related Multiple Relationships
The last remaining rule of data normalization covers the handling of related multiple relationships in a database. Unlike the fourth rule, which deals with independent, one-to-many, multiple relationships, the fifth rule is used to normalize related,
many-to-many multiple relationships. Related, many-to-many multiple relationships do not occur frequently in databases. However, when they do come up, these types of data relations can cause a great deal of confusion and hassle when you're normalizing your
database. You won't invoke this rule often, but when you do it will pay off!
Imagine that the Maintenance Department decides it wants to keep track of all the large equipment used on the shop floor by various departments. It will use this data to keep track of where the equipment is located. The Maintenance Department also wants
to keep a list of suppliers for the equipment in cases of repair or replacement. When you were a novice, you might have decided to design a single table that held the department ID, equipment name, and supplier name. But, as I'm sure you have guessed by
now, that is not the correct response. What if the Maintenance Department has more than one supplier for the same type of equipment? What if a single supplier provides more than one of the types of equipment used in the plant? What if some departments are
restricted in the suppliers they can use to repair or replace their equipment?
NOTE:The fifth rule of data normalization dictates that you should isolate related multiple relationships within a database. In other words, if several complex relationships exist in your database, separate each of the relationships into
its own table. Databases that adhere to this rule are known to be in the Fifth Normal Form.
The following list shows the relationships that have been exposed in this example:
Each department can have several pieces of equipment.
Each piece of equipment can have more than one supplier.
Each supplier can provide a variety of pieces of equipment.
Each department can have a restricted list of suppliers.
Although each of the preceding business rules are simple, putting them all together in the database design is tough. It's the last item that really complicates things. There is more than one way to solve this kind of puzzle. The one suggested here is
just one of the many possibilities.
First, you need to expose all the tables that you need to contain the data. The preceding list describes two one-to-many relationships (department to equipment and department to supplier, with restrictions) and one many-to-many relationship (equipment
to supplier, supplier to equipment). Each of those relationships can be expressed in simple tables. Two additional tables not mentioned, but certainly needed, are a table of all the equipment in the building (regardless of its location) and a table of all
the suppliers (regardless of their department affiliation). Table 17.5 shows sample field layouts for the required tables. The Equipment and Suppliers tables are shortened in this example. If you were designing these tables for a real database project, you
would add several other fields.
Table 17.5. The Fifth Rule sample data tables.
Equipment Table Suppliers Table
EquipID SupplierID
EquipName SupplierName
DatePurchased SupplierAddress
Listing 17.8 contains the SQL DDL statements to create these tables.
Listing 17.8. Creating the Equipment and the Supplier tables.
CREATE TABLE Equipment
(EquipID TEXT (10),
EquipName TEXT(30),
DatePurchased DATE,
CONSTRAINT PKEquipID PRIMARY KEY (EquipID))
CREATE TABLE Supplier
(SupplierID TEXT (10),
SupplierName TEXT(30),
SupplierAddress MEMO,
CONSTRAINT PKSupplier PRIMARY KEY (SupplierID))
Figure 17.7
Supplier and Equipment tables in NORMDAT6.MDB.
The next two data tables describe the relationships between Suppliers and Equipment and between Suppliers and Departments. You remember that departments can be restricted to certain suppliers when repairing or replacing equipment. By setting up a table
such as the DeptSupplier table described next, you can easily maintain a list of valid suppliers for each department. Similarly, as new suppliers are discovered for equipment, they can be added to the EquipSupplier table. Refer to Table 17.6 for a sample
list of fields.
Table 17.6. EquipSupplier and DeptSupplier tables.
EquipSupplier DeptSupplier
EquipID DeptID
SupplierID SupplierID
These two tables are short because they are only needed to enforce expressed simple relationships between existing data tables. Creating small tables such as these is a handy way to reduce complex relationships to more straightforward ones. It is easier
to create meaningful CONSTRAINT clauses when the tables are kept simple, too. The SQL DDL statements for these two tables are in Listing 17.9.
Listing 17.9. Creating the EquipSupplier and DeptSupplier tables.
CREATE TABLE EquipSupplier
(EquipID TEXT(10),
SupplierID TEXT(10),
CONSTRAINT PKEqSpl PRIMARY KEY (EquipID,SupplierID),
CONSTRAINT FKEqSplEquip FOREIGN KEY (EquipID)
REFERENCES Equipment(EquipID),
CONSTRAINT FKEqSplSupplier FOREIGN KEY (SupplierID)
REFERENCES Supplier(SupplierID))
CREATE TABLE DeptSupplier
(DeptID TEXT(5),
SupplierID TEXT(10),
CONSTRAINT PKDeptSpl PRIMARY KEY (DeptID,SupplierID),
CONSTRAINT FKDptSplDept FOREIGN KEY (DeptID)
REFERENCES Departments(DeptID),
CONSTRAINT FKDptSplSupplier FOREIGN KEY (SupplierID)
REFERENCES Supplier(SupplierID))
Figure 17.8
EquipSupplier and DeptSupplier tables.
Notice that, in these two tables, the CONSTRAINT definitions are longer than the field definitions. This is common when you begin to use the power database integrity aspects of SQL databases.
Finally, you need a single table that expresses the Equipment-Supplier-Department relationship. This table will show which department has which equipment supplied by which supplier. More importantly, you can build this final table with tight constraints
that will enforce all these business rules. Both the Department-Supplier relationship and the Equipment-Supplier relationship are validated before the record is saved to the database. This is a powerful data validation toolall without writing any
Visual Basic code! Table 17.7 and the SQL DDL statement in Listing 17.10 show how this table can be constructed.
Table 17.7. The Department-Equipment-Supplier data table.
DeptID
EquipID
SupplierID
Listing 17.10. Creating the DeptEqpSulpr table.
CREATE TABLE DeptEqpSuplr
(DeptID TEXT(5),
EquipID TEXT(10),
SupplierID TEXT(10),
CONSTRAINT PFDeptEq PRIMARY KEY (DeptID, EquipID),
CONSTRAINT FKEqSupl FOREIGN KEY (EquipID,SupplierID)
REFERENCES EquipSupplier(EquipID,SupplierID),
CONSTRAINT FKDeptSupl FOREIGN KEY (DeptID,SupplierID)
REFERENCES DeptSupplier(DeptID,SupplierID))
Figure 17.9
The EquipSupplier, DeptSupplier, and DeptEqpSuplr tables.
The Microsoft Access database NORMDAT6.MDB contains a set of live data for the tables described in this section. Use Visdata to open the database and review the table structure. Try adding or deleting records in ways that would break integrity rules.
You'll notice that the last three tables defined (EquipSupplier, DeptSupplier, and DeptEqpSuplr) all do not allow edits on any existing record. This is because you defined the primary key as having all the fields in a record. Because you cannot edit a
primary key value, you must first delete the record, and then add the modified version to the data table.
Summary
In today's lesson, you learned how to improve database integrity and access speed using the five rules of data normalization. You learned the following five rules:
Rule 1: Eliminate Repeating Groups. If you have a set of fields that have the same name followed by a number (Skill1, Skill2, Skill3, and so forth), remove these repeating groups, create a new table for the repeating data, and relate it to the
key field in the first table.
Rule 2: Eliminate Redundant Data. Don't store the same data in two different locations. This can lead to update and delete errors. If equivalent data elements are entered in two fields, remove the second data element, create a new master table
with the element and its partner as a key field, and then place the key field as a relationship in the locations that formerly held both data elements.
Rule 3: Eliminate Columns Not Dependent on Keys. If you have data elements that are not directly related to the primary key of the table, these elements should be removed to their own data table. Only store data elements that are directly
related to the primary key of the table. This particularly includes derived data or other calculations.
Rule 4: Isolate Independent Multiple Relationships. Use this rule to improve database design when you are dealing with more than one one-to-many relationship in the database. Before you add a new field to a table, ask yourself whether this field
is really dependent upon the other fields in the table. If not, create a new table with the independent data.
Rule 5: Isolate Related Multiple Relationships. Use this rule to improve database design when you are dealing with more than one many-to-many relationship in the database. If you have database rules that require multiple references to the same
field or sets of fields, isolate the fields into smaller tables and construct one or more link tables that contain the required constraints that will enforce database integrity.
Quiz
Is it a good idea to optimize your database strictly for speed?
What is meant by the term First Normal Form?
Explain how the second rule of data normalization differs from the first rule of normalization.
Should you include fields in a data table that are the calculated results of other fields in the same table?
When would you invoke the fourth rule of data normalization?
When would you invoke the fifth rule of data normalization?
Exercises
As a computer consultant, you have landed a contract to build a customer tracking system for your local garage. After several days of interviews with the owner, mechanics, and staff members, you have determined that the following data fields should be
included in your database. Many of the customers of this garage have more than one automobile. Therefore, you are requested to leave room for tracking two cars per customer.
Use these fields: CustomerID, CustomerName, Address, City, State, Zip, Phone, SerialNumber, License, VehicleType1, Make1, Model1, Color1, Odometer1, VehicleType2, Make2, Model2, Color2, Odometer2.
Optimize this data into tables using the rules of data normalization discussed in today's lesson. Identify all primary and foreign keys.
Write the SQL statements that create the tables you designed in Exercise 1.
Wyszukiwarka
Podobne podstrony:
ch17 (2)ch17 (24)ch17#ch17&ch17ch17CH17 (15)ch17ch17ch17 (9)ch17ch17ch17CH17ch17ch17(więcej podobnych podstron