Published: Wednesday, November 11, 2009

Using Transactions with SqlBulkCopy

By Scott Mitchell


Introduction


The SqlBulkCopy class provides a mechanism for efficiently importing large amounts of data into a Microsoft SQL Server database. Compared to importing data by executing one INSERT statement per record, SqlBulkCopy is noticeably faster when importing thousands of records or more. In a nutshell, importing data using SqlBulkCopy entails creating a SqlBulkCopy object, specifying the destination database and table, and providing the data to import in the form of a DataTable, an array of DataRows, or a DataReader. In Using SqlBulkCopy To Perform Efficient Bulk SQL Operations we looked at how to use SqlBulkCopy to programmatically import data from an uploaded Excel spreadsheet into a SQL Server database.

While Using SqlBulkCopy To Perform Efficient Bulk SQL Operations showed how to use the SqlBulkCopy class, it did not explore how SqlBulkCopy imports fare in the face of an error. What happens if, when importing a total of 10,000 records, an error occurs when importing record number 501? Should those first 500 records be committed or should the entire batch be rolled back? This article examines the default behavior of the SqlBulkCopy class in the face of an error. It also looks at how to perform bulk copy operations under the umbrella of a transaction. Read on to learn more!

Note: If you have not yet read Using SqlBulkCopy To Perform Efficient Bulk SQL Operations, please do so before reading this article...


Importing Data In Batches


One of the reasons why SqlBulkCopy is so much faster than executing an INSERT statement for each imported record is because SqlBulkCopy sends large batches of records to the database rather than sending one record at a time. The number of records sent by SqlBulkCopy in each batch depends on the value of the BatchSize property. By default, this value is set to zero, which causes SqlBulkCopy to send all records to be imported to the database in one large batch.

Sending all of the records to the database in one batch works fine when you're dealing with a few thousand records, but if you're importing hundreds of thousands or millions of records then you'll need to use smaller batches. (If you try to send, say, 1,000,000+ records to the database in one batch you're going to run into timeout issues.) The optimal batch size depends on factors like network bandwidth and latency and the number of bytes per record being imported. See What is the Recommended Batch Size for SqlBulkCopy? for a discussion on this topic.
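As a quick sketch, setting the batch size is just a matter of assigning the BatchSize property before calling WriteToServer. (The InventoryDBConnectionString connection string, the InventoryItems table, and the excelData DataTable here are assumptions borrowed from this article's demo code.)

' A minimal sketch: import a DataTable in batches of 5,000 records
' rather than in one all-encompassing batch.
Dim connectionString As String = ConfigurationManager.ConnectionStrings("InventoryDBConnectionString").ConnectionString

Using bulkCopy As New SqlBulkCopy(connectionString)
   bulkCopy.DestinationTableName = "InventoryItems"
   bulkCopy.BatchSize = 5000    ' send 5,000 records per batch

   bulkCopy.WriteToServer(excelData)
End Using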

Another Reason Why SqlBulkCopy Is So Fast...
Another reason why SqlBulkCopy is so much faster than running individual INSERT statements is because by default SqlBulkCopy ignores any CHECK constraints. For instance, imagine that you have a table with a column of type money that has a CHECK constraint to ensure that its values are all greater than or equal to zero. When importing data into this table using SqlBulkCopy the CHECK constraint is ignored (by default), meaning that you can import data that violates the constraint (such as a negative value in that money column). (Similarly, by default the rows imported by SqlBulkCopy into the destination table do not cause any INSERT TRIGGERs to fire.)

You can force SqlBulkCopy to apply the CHECK constraints (or to fire the INSERT TRIGGERs) by specifying the applicable option in the SqlBulkCopy object's constructor.
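For example, the following sketch creates a SqlBulkCopy object that honors CHECK constraints and fires INSERT triggers by combining the CheckConstraints and FireTriggers values from the SqlBulkCopyOptions enumeration. (As above, the connection string name, table name, and excelData DataTable are assumptions borrowed from the demo.)

' Honor CHECK constraints and fire INSERT triggers during the import;
' both are skipped by default for performance reasons.
Dim connectionString As String = ConfigurationManager.ConnectionStrings("InventoryDBConnectionString").ConnectionString

Using bulkCopy As New SqlBulkCopy(connectionString, _
                                  SqlBulkCopyOptions.CheckConstraints Or SqlBulkCopyOptions.FireTriggers)
   bulkCopy.DestinationTableName = "InventoryItems"
   bulkCopy.WriteToServer(excelData)
End Using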

Non-Transactioned Imports With SqlBulkCopy


When its WriteToServer method is invoked, the SqlBulkCopy object connects to the database and imports the data in batches (whose size is determined by the BatchSize property). If there are no errors with the data being imported in the current batch then those records are committed to the database and then the next batch is processed. However, if there is an error then the entire batch is rolled back.

In Using SqlBulkCopy To Perform Efficient Bulk SQL Operations, we looked at how to allow a user to upload an Excel spreadsheet and have its rows imported into a database table named InventoryItems in a SQL Server database. The InventoryItems table included columns named ItemID, ItemName, UnitCost, and UnitPrice (among others); and the Excel spreadsheet uploaded by the user supplied the values for these four columns for each imported row. Note that if the Excel document omits a value for the UnitCost or UnitPrice columns, or uses a non-money value (such as "Expensive!" rather than "4.25") then an error will occur when importing the data.

To illustrate the default commit/rollback behavior of the SqlBulkCopy class, imagine that a user was uploading an Excel spreadsheet with 100 records and that record number 25 did not specify a value for the UnitPrice column. What would happen when the WriteToServer method is invoked?

The answer depends on the value of the BatchSize property. If the BatchSize property has not been set then all of the records are sent in one batch. Because there is an error in that batch (in record number 25, specifically), none of the records in the batch are committed. In other words, the error causes the entire import to be rolled back.

But what if the BatchSize property had been set? Suppose BatchSize was set to 10. SqlBulkCopy would start by sending the first 10 records to the database, which would be imported and committed. It would then send the next 10, and those would be committed as well. The third batch would include the erroneous record #25, which would generate an error. This error would cause the entire contents of the third batch to be rolled back, and it would also cause SqlBulkCopy to stop processing. So while the error in record #25 causes the third batch to be rolled back, the first two batches were already committed, meaning that the database now contains the first 20 imported records.
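The following sketch reproduces that scenario: a non-transactioned import with BatchSize set to 10. If the source data's 25th record is invalid, the first two batches (records 1 through 20) remain committed after the error. (Again, the connection string name, table name, and excelData DataTable are assumptions carried over from the demo.)

' A non-transactioned import: each batch is committed independently.
' With BatchSize = 10 and a bad 25th record, batches one and two
' (records 1-20) commit, batch three is rolled back, and the import
' stops, leaving 20 new records in the destination table.
Using bulkCopy As New SqlBulkCopy(ConfigurationManager.ConnectionStrings("InventoryDBConnectionString").ConnectionString)
   bulkCopy.DestinationTableName = "InventoryItems"
   bulkCopy.BatchSize = 10

   Try
      bulkCopy.WriteToServer(excelData)
   Catch ex As SqlException
      ' Records from the batches that completed before the error are
      ' already committed and are NOT rolled back here.
   End Try
End Using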

Try This At Home!
To observe this behavior on your own you can download the demo available at the end of this article. First things first, you'll need to create an erroneous record in the import data. I suggest opening one of the Excel spreadsheets and deleting the value from the UnitCost column of one of the rows. Next, open the Default.aspx page's code-behind class and find the InsertViaSqlBulkCopyWithoutTransaction method, which starts on line 84. Set the BatchSize property to a value less than the row number whose UnitCost column value you erased. (For example, if you deleted the value from the 5th row's UnitCost column then set BatchSize to some value less than 5.)

Next, view the contents of the InventoryItems table (or run a query to return the total number of records in the table). Then visit Default.aspx through a browser and upload the Excel spreadsheet that contains the erroneous record. You should get an error noting that a NULL value cannot be inserted into the UnitCost column. If you refresh the contents of the InventoryItems table (or rerun the query that returns the total number of rows in the table) you should see that the batches that came before the erroneous record were inserted into the database table.

Bulk Operations And Atomicity


The default SqlBulkCopy operation is not atomic. An atomic operation is one where the entire process fails or the entire process succeeds. With SqlBulkCopy (with a non-zero BatchSize) it is possible that some of the import is committed (early batches without a problem) whereas some of the import fails (as soon as an error is encountered the current batch is rolled back and no further batches are processed). This is, perhaps, an undesirable outcome.

Also, it is certainly possible that the import process might entail more than a call to the SqlBulkCopy object's WriteToServer method. For example, the import process might involve three steps:

  1. Delete all records from the destination table,
  2. Import the records into the destination table, and
  3. Record the date and time of the most recent import in some other table.
In this scenario we would want all three tasks to be atomic. If the SqlBulkCopy object used in step (2) failed mid-way through the import process we'd want to not only roll back all of the records imported thus far by SqlBulkCopy, but also undo the deletion of the existing records performed in step (1).

Performing Atomic SqlBulkCopy Imports


To perform atomic SqlBulkCopy imports that span across all batches (and, optionally, across other database statements) we need to use transactions. The following steps outline the process of using a transaction with SqlBulkCopy:
  1. Create a SqlConnection to the destination database server.
  2. Open the connection.
  3. Create a SqlTransaction object.
  4. Create the SqlBulkCopy object, passing the SqlTransaction object into its constructor.
  5. Perform the import - the call to WriteToServer - within a Try...Catch block.
  6. If the operation completes, commit the transaction; if it fails, roll it back.
The following code snippet implements these steps. For more details on how to perform transactions from ADO.NET check out Maintaining Database Consistency with Transactions.

' (This snippet assumes Imports statements for System.Data, System.Data.SqlClient,
' and System.Configuration at the top of the code-behind file.)

' STEP 1 - Create a SqlConnection to the destination database server
Using destinationConnection As New SqlConnection(ConfigurationManager.ConnectionStrings("InventoryDBConnectionString").ConnectionString)
   ' STEP 2 - Open the connection.
   destinationConnection.Open()

   ' STEP 3 - Create a SqlTransaction object.
   Dim myTransaction As SqlTransaction = destinationConnection.BeginTransaction()

   ' STEP 4 - Create the SqlBulkCopy object, passing the SqlTransaction object into its constructor.
   Using bulkCopy As New SqlBulkCopy(destinationConnection, SqlBulkCopyOptions.Default, myTransaction)
      bulkCopy.DestinationTableName = "InventoryItems"

      'Define column mappings
      For Each col As DataColumn In excelData.Columns
         bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName)
      Next

      'With external transactions, the entire operation occurs under the umbrella of one transaction,
      'regardless of the BatchSize value
      bulkCopy.BatchSize = 100


      ' STEP 5 - Perform the import - the call to WriteToServer - within a Try...Catch block.
      Try
         bulkCopy.WriteToServer(excelData)

         ' STEP 6a - Success! Commit transaction
         myTransaction.Commit()

         ' Display some sort of success message...

      Catch ex As Exception
         ' STEP 6b - Failure - rollback the transaction
         myTransaction.Rollback()

         ' Display some sort of error message...
      End Try
   End Using

   destinationConnection.Close()
End Using

If you had other statements that needed to be performed under the umbrella of this transaction you would execute them within the Try...Catch block.
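For instance, the delete/import/record scenario described earlier might be sketched as follows, with each extra statement enlisting in the same transaction by passing myTransaction to its SqlCommand. (The ImportHistory table and its LastImport column are hypothetical names used purely for illustration.)

' Inside the Try block of the snippet above, before committing...

' Step 1 - delete the existing records (enlisted in the same transaction).
Dim deleteCommand As New SqlCommand("DELETE FROM InventoryItems", destinationConnection, myTransaction)
deleteCommand.ExecuteNonQuery()

' Step 2 - bulk import the new records.
bulkCopy.WriteToServer(excelData)

' Step 3 - record the date/time of the import. (ImportHistory and
' LastImport are hypothetical names for illustration.)
Dim auditCommand As New SqlCommand("UPDATE ImportHistory SET LastImport = GETDATE()", destinationConnection, myTransaction)
auditCommand.ExecuteNonQuery()

' If any of the three steps throws an exception, the Catch block's
' call to myTransaction.Rollback() undoes all of them.
myTransaction.Commit()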

Happy Programming!

Attachments:

  • Download the code for this article

Further Reading:

  • SqlBulkCopy Class (technical docs)
  • Bulk Copy Operations in SQL Server
  • Transaction and Bulk Copy Operations
  • Maintaining Database Consistency with Transactions
  • What is the Recommended Batch Size for SqlBulkCopy?

