ThePlace

Home ] Search ] Resources ] Site Map ] Contact Me ]
Dave's Information Technology Resource

Up ]

What are Databases? ] Relational Databases ] Data Modeling ] [ Designing and Implementing Databases ] Creating Database Applications ]

--- Designing and Implementing Databases ---

 

This section outlines a general strategy for designing and implementing databases.

Define Entities in a Database

Determine the entities that make up the database.  This means you must decide what the database will contain.  For example, a student tracking database would probably include the following entities:

bullet

 Students – who is the database keeping track of

bullet

 Courses – which courses are available

bullet

 Classes – which classes are available

bullet

 Instructors – who is teaching the courses

bullet

 Schedules – putting students into classes

Define Attributes for the Entities

Once you know what entities you are dealing with, start defining the attributes that make up the entities.  For example, the students would include:

bullet

 Student ID

bullet

 Last name

bullet

 First name

bullet

 Address

bullet

 City

bullet

 State

bullet

 Zip

 

Remember that you need to define what information is relevant to the student entity.  This process is repeated for all entities.

Determine Relationships Between Entities

At this point, the relationships can be developed between entities by looking at common data.  Relationships fall into three basic categories:

  1. One to one

  2. One to many

  3. Many to many

 

After you've set up different entities for each subject in your database, you need a way of telling the database how to bring that information back together again. The first step in this process is to define relationships between your entities.

 

How do relationships work?

A relationship works by matching data in key attributes.  In most cases, matching attributes are the primary key from one table, which provides a unique identifier for each record, and a foreign key in the other table. For example, key attributes such as the student ID, course ID, and class ID can  relate student, class, and course entities.

 

A one-to-many relationship

A one-to-many relationship is the most common type of relationship. In a one-to-many relationship, an entity in Table A can have many matching entities in Table B, but a entity in Table B has only one matching entity in Table A.

A many-to-many relationship

In a many-to-many relationship, an entity in Table A can have many matching entities in Table B, and a record in Table B can have many matching entities in Table A.  This type of relationship is only possible by defining a third entity (called a junction) whose primary key consists of two attributes including the foreign keys from both Tables A and B. A many-to-many relationship is really two one-to-many relationships with a third entity.

 

 

A one-to-one relationship

In a one-to-one relationship, each record in Table A can have only one matching entity in Table B, and each record in Table B can have only one matching entity in Table A.  This type of relationship is not common, because most information related in this way would be in one entity.  You might use a one-to-one relationship to divide a table with many attributes, to isolate part of a table for security reasons, or to store information that applies only to a subset of the main entity.

 

Establishing and Maintaining Referential Integrity

When using foreign keys to establish relationships, you must be careful to make sure that keys are maintained between entities.  For example, only a student in the student table can be in a class.  The need to ensure that each value in the foreign key exists in the appropriate entity is called referential constraint.  Furthermore, ensuring that there are no dangling references is called maintaining referential integrity.

Enabling Cascaded Updates and Deletions

Many database tools allow you to automatically update and or delete entities as necessary to maintain referential integrity.

Building Tables

Building tables is simple:

  1.  Establish a table for each entity.

  2.  Assemble the fields for each attribute.  Assign the appropriate data types as necessary for each field.

  3.  Establish the relationship mechanisms (one to many, etc.) for all tables.

  4.  Add data.

Using Indexes

Indexes are used to provide random access to data in a database.  Indexes may be used to speed up accesses to a single table based on a single or multiple fields.  Indexes consume additional space in the database but offer significant increases in speed and performance.

Using Null Values

Null values are empty data that are used for two reasons:

  1. To indicate that data is missing or unknown.

  2. To indicate that a value is not applicable in the current context.

 

Generally speaking, null values are never used in key fields for obvious reasons (remember, key fields are used to determine uniqueness for the record/entity).

Managing Redundancy

Managing redundancy is the process of ensuring that no data is repeated in a database.  It is based on the fact that attributes are used for:

bullet

 Information

bullet

 Identification

bullet

 Both identification and information.

 

Generally speaking, most attributes provide information, e.g., names, addresses, and phone numbers are all useful information that you find in a database.

An example of an attribute for identification would be the following table scheme:

(PubID, PubName, PubAddress, YearFounded)

In this example, PubID is used to provide a unique “identifier” or key for each record.  The actual value of PUBID has little informational value except to ensure that each record is unique (i.e., it is also the key to the table).

An example of identification and informational attributes would be the following table scheme:

(ISBN, Title, PubID)

In this example, ISBN is informational in that it provides useful information and also uniquely identifies each record (the key).

Managing redundancy is simply the process of ensuring that information is not repeated.  In the previous examples, the PubName field is not repeated in any other tables because the PubID provides the reference to the Publisher table that contains all relevant publisher information.

Normalization

The goal of normalization is to prevent, or at least minimize, redundancy.  It is based on a series of forms as follows:

bullet

 First normal form (1NF)

bullet

 Second normal form (2NF)

bullet

 Third normal form (3NF)

bullet

 Boyce Codd normal form (BCNF)

Technically, there are additional normalization forms (fourth normal form (4NF) and fifth normal form (5NF)) which are beyond the scope of this text.

In each succeeding instance (from first to second, etc.), a greater level of normalization occurs that helps to eliminate redundancy in a table and database.

First Normal Form (1NF) – all attributes in a table are indivisible.  For example, the following table schema: (ISBN, Title, Authors, Publisher) is not in first normal form because more than one author can be included (the authors field could be “Smith, Tom and Jones Ann”).  To be in 1NF form, authors would be limited to one name only.

Second Normal Form (2NF) – all informational attributes are unique to the table and not to some other table (entity).  For example, the following schema: (Address, City, State, CityPopulation) has an attribute, CityPopulation, that is subordinate to City (it depends on which city and otherwise nothing in common with the other fields).  A separate table would be used to capture City related information such as CityPopulation.  Note that CityPopulation would have to be repeated whenever City was used.

Third Normal Form (3NF) – every informational attribute depends on the superkey.  For example, (Title, PubID, PageCount, Price) is NOT in 3NF because Title and PubID are keys AND PageCount could be redundant because a publisher (PubID) could base Price on the PageCount.

Boyce Codd Normal Form (BCNF) – related to 3NF, every attribute can ONLY belong to a superkey.  For example, (city, streetname, zipcode) can have redundant information because a city and zipcode can have multiple streets.  Therefore, city and streetname are combined and street and zipcode are combined.

Decomposition and Realism

Decomposition is the process of reducing complex tables (entities) into normal forms.  The problem often lies in two areas:

  1. Possible loss of information.

  2. Possible loss of dependence.

In both cases, final database design must depend on the use of “common sense” and practicality to combine information even though it violates some level of normalization (especially 3NF and BCNF). 

 

The bottom line on database design is that it is part art and part science.  The art aspect is the intuitiveness to balance aesthetics, understanding, and information integrity with database design techniques including normalization.  The science aspect is to provide for information complexity and volume using the various commercial database management tools.

 

Home ] Up ] Computer Architecture ] Programming Bootcamp ] Visual BasicS ] Web Basics ] Web Multimedia ] Web Programming ] Advanced Web Topics ] Developing Web Sites ] XML Technology ] Web Glossary ]

Copyright © 1999 - 2005 
ThePlace - Written and Sponsored by Dave Hillman