This article discusses data moling techniques. For an overview of data modeling, including why data modeling is important, please read Data Modeling 101.
Data modeling is the process by which you design your database's tables, columns, and relations. A good way to determine what tables, columns, and relations you'll need to is begin by writing down what, exactly, you are needing your database to do for you.
For example, say that we were designing an application that would handle payroll for the company. It is essential that we have a clear understanding of what the system will need to do. An single paragraph will usually suffice. Write down this paragraph, and take it to your supervisors and project coordinaters, making sure everyone is on the same page. For our example, such a paragraph might read:
The payroll system will track an employee's hours (his or her starting and stopping times during the day). Social security numbers will be used to uniquely identify each employee. For reporting purposes, an employee will designate his working hours under various categories. The categories to which an employee can assign his time is dependent upon which department the employee is in.
While the above paragraph may seem a bit chopy (which, I confess, it is), it illustrates my point. With this paragraph, we can quickly ascertain what tables we'll need: an Employee table, a Category table, a Department table, and a TimeSpent table. Basically, look for nouns in your example paragraph, and these are often tables. Adjectives are object columns. (Social-Security-Number will be a column in our Employee table.) Verbs are often relations. (Each employee HAS a department: there will be a DepartmentID column in Employee which will be a Foreign Key to the Department table.)
Not only does composing such a brief synopsis help you determine your database's structure, but also helps you ease into an important practice: documentation!! It is vital that you document everything! Preliminary table designs, data design outlines, technical specifications, and all other documents need to be grouped and filed for easy access.
One of the main reasons to data model is to enhance your application's maintainability.
Good documentation is important. Other important steps you can take to improve the
maintainability of your program is to write self-documenting code. Self-documenting
code is code written in such a way that tons of comments aren't needed. In
database design, this means naming tables and columns with sensible names. Call
your employee table
P_Employee, the P_ prefix
designating that this table is used in the payroll system), not TABLE01, or some
other arcane name. Choose appropriate column names too. Use some sort of
Hungarian or prefix notation when writing your stored procedures and views.
Today's databases are relational, so take advantage of that fact! Use code tables! (Code tables are small tables which usually only have two or three columns: a primary key (unique ID) and a column or two English-like description.) Examining our example above, where there are departments to which an employee belongs to... in the olden days of flat-file databases, one would design the Employee table like:
|EmployeeID (int, PK)
Where department signified their department (obviously). HUMAN might signify that the employee was in Human Resources. SALES would signify sales; MANAG might signify management. In either case this is bad design! Imagine that a typo was made, and an employee's department field read, "HUMAM." This would probably screw up reporting scripts which checked this field. What happens if new departments are added, or old ones are removed? Lastly, what if the department name is REDTQ, what is that department? A new developer will have no idea and will have to waste time figuring out archaic names and symbols.
Better data design would be to use a code table, like so:
|EmployeeID (int, PK)
DepartmentID (int, FK to Department table)
|DepartmentID (int, PK)
Now, for each department there is a row in the Department table. Using FK constraints, it is impossible for an employee to be assigned to a department that doesn't exist. Also, there is no ambiguity of departments, since now you have 50 characters to describe the department in the department table. This is a better approach; databases are relational for a reason! This is one of them, so make sure to take advantage of this relations! I have seen far too many developers refuse to use code tables. By doing so, they made their database design hard to understand, less efficient, and less maintainable!
Data modeling isn't difficult and by applying the tenants of data modeling, you will ensure a more maintainable database design!