The SQL Guru Answers your Questions...


Today's question comes from Tim H.:

Sean,

I went to a Microsoft-sponsored SQL Server seminar recently where the speaker mentioned that you should avoid using identity fields as primary keys when the table has records inserted frequently (5000 concurrent users?) - a log-type table. His argument was that it creates artificial "hot spots" at the end of the table. Whatever that means.

    1. What do you do if you need to refer to a record in a high volume table? Have a beefy "where" clause? Multi-field primary key?
    2. Is it better to have a nonclustered primary key for a high-volume insert table?
    3. I've heard of having a separate table that holds the next available "ID"; you take the ID and then increment it by 1 in a transaction. What do you think of this suggestion?

I know that hardware is a factor, but just assume "all things being equal". I am asking because I have over 5000 users entering time into a log. After a person inserts a record, I need to update a web page with the last 5 records they entered (as a reminder for them). The kicker is that I have to perform a Sum() on the TotalTime field, because the user is constrained by the number of hours they can enter per month. They can only bill until their budget is depleted. I am tempted to have a field in the user table that holds their current budget amount, thereby avoiding the Sum() in the SQL statement. I am afraid of the amount becoming skewed for whatever reason. Hopefully, batch updating will alleviate the possibility of poor performance.

So, I have a table that has potential for high volume inserts and reads.

Looking forward to your thoughts.

Tim H.

Tim,

I went to a Microsoft-sponsored SQL Server seminar recently where the speaker mentioned that you should avoid using identity fields as primary keys when the table has records inserted frequently (5000 concurrent users?) - a log-type table. His argument was that it creates artificial "hot spots" at the end of the table. Whatever that means.

To learn more about indexes, see our Related Links section!

A "Hot Spot" is what you end up on when your boss wants to know why the application isn't performing well. Or is that the Hot Seat? :)

Anyway, the truth of that statement depends on the circumstances. Is this a SQL 6.5 or 7.0 server? If 6.5, have you enabled row-level locking for the table? Is the PK index clustered or nonclustered?

The problem he's referring to happened in SQL 6.5 if you weren't careful with your database design. Recall that a clustered index physically sorts the data pages in a table. If you put a clustered index on an IDENTITY column, then all of your inserts will happen on the last page of the table - and that page is locked for the duration of each INSERT. No big deal... unless you have 5000 people who all want the last page. Then you have a lot of contention for that page, or a "hot spot".
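
Just to make that concrete, here's a sketch of a log table defined that way (the table and column names are made up for illustration):

    -- Hypothetical log table. The PRIMARY KEY constraint creates a
    -- CLUSTERED index by default, so rows are stored in TimeLogId
    -- order and every INSERT lands on the last page of the table.
    CREATE TABLE TimeLog (
        TimeLogId   int IDENTITY(1,1) PRIMARY KEY,   -- clustered by default
        PersonId    int          NOT NULL,
        EntryDate   datetime     NOT NULL,
        TotalTime   decimal(5,2) NOT NULL
    )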

You can turn on insert row-level locking in SQL 6.5, and that will only lock the row being inserted, thus reducing lock contention. You can also move your clustered index to a different column, thereby scattering the inserts around the table.
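
On 6.5 that's a table option - something like this, using the hypothetical TimeLog table from above:

    -- SQL 6.5 only: turn on Insert Row-Level Locking (IRL) so
    -- concurrent inserts lock individual rows, not the whole page.
    EXEC sp_tableoption 'TimeLog', 'insert row lock', 'true'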

SQL 7.0 doesn't have this problem because it adjusts the granularity of the lock as needed. Lots of contention for the page? Let's bump the granularity down to a row lock! It works very well. Lock contention problems that were plaguing us in 6.5 are gone in 7.0.

1. What do you do if you need to refer to a record in a high volume table? Have a beefy WHERE clause? Multi-field primary key?

IDENTITY columns work great for Primary Keys. They're small (4 bytes) if you use an int data type. And small keys are awesome because they're faster to index and search. IDENTITY columns in 6.5 suffer a problem if the server is shut down unexpectedly... they forget what number they're on. I usually recommend running DBCC CHECKIDENT on each table when the server starts up.
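
That check is a one-liner per table - something like this, with TimeLog again standing in as our hypothetical example:

    -- Re-seeds the IDENTITY value from the highest value actually
    -- present in the table; run at startup on 6.5 servers.
    DBCC CHECKIDENT ('TimeLog')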

2. Is it better to have a nonclustered primary key for a high-volume insert table?

Well, placing a clustered index (PK constraints create a clustered index by default) on an IDENTITY column isn't all that useful. You're usually better off placing it on a column (or columns) that you search on pretty frequently, such as a name or a date - especially if you tend to return a range of rows based on the clustered index criteria, or sort by the key column(s). SQL Server loves doing that, because it can just run right down the pages on disk... zip, zip, zip.

Another factor to consider is that in SQL 7.0, nonclustered indexes use the clustering key at the leaf level of the nonclustered index tree. So if you have a clustered index with a huge key, your nonclustered indexes will be larger as a result. Try to keep the size of the clustering key as small as possible.
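
As a sketch of that layout - redefining the hypothetical TimeLog table from earlier - you could keep the IDENTITY primary key but declare it nonclustered, and spend the clustered index on columns you actually search and sort on:

    -- Keep the surrogate key, but make it NONCLUSTERED so the one
    -- clustered index is free for frequently searched columns.
    CREATE TABLE TimeLog (
        TimeLogId   int IDENTITY(1,1) NOT NULL,
        PersonId    int          NOT NULL,
        EntryDate   datetime     NOT NULL,
        TotalTime   decimal(5,2) NOT NULL,
        CONSTRAINT PK_TimeLog PRIMARY KEY NONCLUSTERED (TimeLogId)
    )

    -- A compact clustering key; in 7.0 this key also rides along in
    -- the leaf level of every nonclustered index on the table.
    CREATE CLUSTERED INDEX CIX_TimeLog_Person ON TimeLog (PersonId, EntryDate)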

3. I've heard of having a separate table that holds the next available "ID"; you take the ID and then increment it by 1 in a transaction. What do you think of this suggestion?

Yep, we had to do that before SQL Server had auto-incrementing columns. Now that we're living in the 21st century, there's not much reason to use this tactic.
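
For the curious, the old trick looked roughly like this (table and column names hypothetical); the single UPDATE both increments and reads the counter, so the row is only locked for an instant:

    -- One row per table holds the next available key value.
    CREATE TABLE NextId (
        TableName varchar(30) NOT NULL PRIMARY KEY,
        NextVal   int         NOT NULL
    )

    -- Grab-and-increment in one statement: the assignment and the
    -- increment happen atomically under a single lock.
    DECLARE @NewId int
    UPDATE NextId
       SET @NewId = NextVal = NextVal + 1
     WHERE TableName = 'TimeLog'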

I know that hardware is a factor, but just assume "all things being equal".

Buy fast disks. And lots o' memory (for data cache) will help with the queries.

I am asking this question because I have over 5000 users entering time into a log. After a person inserts a record, I need to update a web page with the last 5 records they entered (as a reminder for them).

So the person ID sounds like a possible candidate for a clustered index. If you're using 7.0, try the SELECT TOP 5 * FROM... syntax as well. SQL Server can now optimize queries based on a limited rowcount.
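
Something along these lines, again against the hypothetical TimeLog table:

    -- Last 5 entries for one person. With a clustered index on
    -- (PersonId, EntryDate), 7.0 can seek straight to that person's
    -- rows and stop after five of them.
    SELECT TOP 5 TimeLogId, EntryDate, TotalTime
      FROM TimeLog
     WHERE PersonId = 42
     ORDER BY EntryDate DESC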

The kicker is that I have to perform a Sum() on the TotalTime field, because the user is constrained by the number of hours they can enter per month. They can only bill until their budget is depleted. I am tempted to have a field in the user table that holds their current budget amount, thereby avoiding the Sum() in the SQL statement. I am afraid of the amount becoming skewed for whatever reason.

Not a bad plan. With the judicious use of transactions, you can do both your insert and your update to the person table without worry.
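
A minimal sketch, assuming a hypothetical Person table with a RemainingHours column:

    BEGIN TRANSACTION
        -- Log the time entry...
        INSERT TimeLog (PersonId, EntryDate, TotalTime)
        VALUES (42, GETDATE(), 1.5)

        -- ...and adjust the running budget in the same transaction,
        -- so the stored total can never drift from the detail rows.
        UPDATE Person
           SET RemainingHours = RemainingHours - 1.5
         WHERE PersonId = 42
    COMMIT TRANSACTION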

Hopefully, batch updating will alleviate the possibility of poor performance. So, I have a table that has potential for high volume inserts and reads.

It's usually best to separate reporting queries from transaction processing. You could, perhaps, move records out of the time table into a data warehouse as they age.
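
For instance, a nightly job might do something like this (TimeLogHistory is a hypothetical archive table with the same columns):

    BEGIN TRANSACTION
        -- Copy aged rows to the reporting table...
        INSERT TimeLogHistory (TimeLogId, PersonId, EntryDate, TotalTime)
        SELECT TimeLogId, PersonId, EntryDate, TotalTime
          FROM TimeLog
         WHERE EntryDate < DATEADD(month, -1, GETDATE())

        -- ...then trim them from the busy transaction table.
        DELETE TimeLog
         WHERE EntryDate < DATEADD(month, -1, GETDATE())
    COMMIT TRANSACTION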

Hope this helps,
Sean


Related Links:

  • Creating a Clustered Index
  • Technical Discussion of Nonclustered Indexes
  • An article on Clustered/Nonclustered Indexes

