Using Transactions with SqlBulkCopy
By Scott Mitchell
The SqlBulkCopy class provides a mechanism for efficiently importing large amounts of data into a Microsoft SQL Server database. Compared to importing data by executing one INSERT statement for each record, SqlBulkCopy is noticeably faster when importing thousands of records (or more). In a nutshell, importing data using SqlBulkCopy entails creating a SqlBulkCopy object, specifying the destination database and table, and providing the data to import in the form of a DataTable, an array of DataRows, or a DataReader. In Using SqlBulkCopy To Perform Efficient Bulk SQL Operations we looked at how to use SqlBulkCopy to programmatically import data from an uploaded Excel spreadsheet into a SQL Server database.
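To ground the discussion, here is a minimal sketch of such an import. The connectionString variable and the LoadSourceData helper are placeholders of my own, not code from the earlier article; InventoryItems is the demo's destination table:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical helper that returns the rows to import (for example,
    // rows read from an uploaded Excel spreadsheet) as a DataTable.
    DataTable sourceData = LoadSourceData();

    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
        {
            // Specify the destination table and send the rows.
            bulkCopy.DestinationTableName = "InventoryItems";
            bulkCopy.WriteToServer(sourceData);
        }
    }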
While Using SqlBulkCopy To Perform Efficient Bulk SQL Operations showed how to use the
SqlBulkCopy class, it did not explore how
imports fare in the face of an error. What happens if, when importing a total of 10,000 records, an error occurs when importing record number 501? Should those first 500 records
be committed or should the entire batch be rolled back? This article examines the default behavior of the
SqlBulkCopy class in the face of an error. It also
looks at how to perform bulk copy operations under the umbrella of a transaction. Read on to learn more!
Note: If you have not yet read Using SqlBulkCopy To Perform Efficient Bulk SQL Operations, please do so before reading this article...
Importing Data In Batches
One of the reasons why SqlBulkCopy is so much faster than executing an INSERT statement for each imported record is because SqlBulkCopy sends large batches of records to the database rather than sending one record at a time. The number of records sent by SqlBulkCopy in each batch depends on the value of the BatchSize property. By default, this value is set to zero, which causes SqlBulkCopy to send all records to be imported to the database in one large batch.
Sending all of the records to the database in one batch works fine when you're dealing with a few thousand records, but if you're importing hundreds of thousands or millions of records into the database then you'll need to use smaller batches. (If you try to send, say, 1,000,000+ records to the database in one batch you're going to have timeout issues.) The optimal batch size depends on factors like the network bandwidth and latency and the number of bytes per record being imported. See What is the Recommended Batch Size for SqlBulkCopy? for a discussion on this topic.
You can force SqlBulkCopy to send the records in smaller batches by setting the BatchSize property to a non-zero value.
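For example, assuming the bulkCopy object from the earlier sketch, sending the rows in batches of 500 is a one-line change (500 is purely illustrative):

    // Send the rows in batches of 500 rather than one all-or-nothing batch.
    bulkCopy.BatchSize = 500;
    bulkCopy.WriteToServer(sourceData);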
Non-Transactioned Imports With SqlBulkCopy
When the WriteToServer method is invoked, the SqlBulkCopy object connects to the database and imports the data in batches (whose size is determined by the BatchSize property). If there are no errors with the data being imported in the current batch then those records are committed to the database and the next batch is processed. However, if there is an error then the entire batch is rolled back.
In Using SqlBulkCopy To Perform Efficient Bulk SQL Operations, we looked at how to allow a user to upload
an Excel spreadsheet and have its rows imported into a database table named
InventoryItems in a SQL Server database. The
InventoryItems table included four columns (among them, UnitPrice), and the Excel spreadsheet uploaded by the user supplied the values for these four columns for each imported row. Note that if the Excel document omits a value for the UnitPrice column, or uses a non-money value (such as "Expensive!" rather than "4.25"), then an error will occur when importing the data.
To illustrate the default commit/rollback behavior of the
SqlBulkCopy class, imagine that a user was uploading an Excel spreadsheet with 100 records and that record
number 25 did not specify a value for the
UnitPrice column. What would happen when the
WriteToServer method is invoked?
The answer depends on the value of the
BatchSize property. If the
BatchSize property has not been set then all
of the records will be sent in one batch. Because there is an error in that batch (with record number 25, specifically), none of the records in the batch will be committed. Consequently, the entire operation is rolled back.
But what if the
BatchSize property had been set to some non-zero value? What would happen then? If, say, BatchSize was set to 10 then
SqlBulkCopy would start by sending the first 10 records to the database, which would be imported and committed. It would
then send the next 10 and those would be committed as well. The third batch would include the erroneous record #25, which would generate an error. This error would cause
the entire contents of the third batch to be rolled back. Moreover, the error would cause
SqlBulkCopy to stop processing. While the error in record #25
causes the third batch to be rolled back, the first two batches were committed, meaning that the database now contains the first 20 records that were imported.
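The following sketch shows this scenario in code; as before, the connectionString variable and the LoadSourceData helper are stand-ins. If record #25 is erroneous, the catch block runs, yet the first 20 records remain in the table:

    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
        {
            bulkCopy.DestinationTableName = "InventoryItems";
            bulkCopy.BatchSize = 10;   // ten records per batch

            try
            {
                bulkCopy.WriteToServer(LoadSourceData());
            }
            catch (SqlException ex)
            {
                // The third batch (records 21-30) is rolled back, but the
                // two batches committed before it (records 1-20) are not.
                Console.WriteLine("Import failed: " + ex.Message);
            }
        }
    }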
Try This At Home!
To observe this behavior on your own you can download the demo available at the end of this article. First things first, you'll need to create an erroneous record in the import data. I suggest opening one of the Excel spreadsheets and deleting the value from one of the rows' UnitPrice cells. Next, run the import and view the contents of the InventoryItems table to see which records were committed before the error occurred.
Bulk Operations And Atomicity
A SqlBulkCopy operation is not atomic. An atomic operation is one where either the entire process succeeds or the entire process fails. With SqlBulkCopy (and a non-zero BatchSize) it is possible that some of the import is committed (the early, problem-free batches) while the rest of it fails (as soon as an error is encountered, the current batch is rolled back and no further batches are processed). This is, perhaps, an undesirable outcome.
Also, it is certainly possible that the import process might entail more than a call to the
WriteToServer method. For example,
the import process might involve three steps:
1. Delete all records from the destination table,
2. Import the records into the destination table, and
3. Record the date and time of the most recent import in some other table.
If the SqlBulkCopy object used in step (2) failed mid-way through the import process, we'd want to not only roll back all of the records imported thus far by SqlBulkCopy, but also roll back the deletion performed in step (1).
To perform atomic
SqlBulkCopy imports that span all batches (and, optionally, other database statements) we need to use transactions. The following steps outline the process of using a transaction with SqlBulkCopy:
1. Create a SqlConnection to the destination database server.
2. Open the connection.
3. Create a SqlTransaction object by calling the connection's BeginTransaction method.
4. Create the SqlBulkCopy object, passing the SqlTransaction object into the constructor (along with the connection).
5. Perform the import - the call to WriteToServer - within a try...catch block.
6. If the operation completes, commit the transaction; if it fails, roll it back.
If you had other statements that needed to be performed under the umbrella of this transaction you would execute them within the same try block, associating each SqlCommand with the SqlTransaction object, as in the sketch below.
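Here is a sketch that puts these steps together, including the DELETE and logging statements from the three-step scenario described earlier (the ImportLog table and its ImportedOn column are hypothetical):

    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Step 3: create the transaction from the open connection.
        SqlTransaction transaction = conn.BeginTransaction();

        // Step 4: pass the transaction into the SqlBulkCopy constructor.
        using (SqlBulkCopy bulkCopy =
            new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, transaction))
        {
            bulkCopy.DestinationTableName = "InventoryItems";

            try
            {
                // Other statements in the unit of work run under the same
                // transaction - here, the DELETE from step (1).
                SqlCommand deleteCmd =
                    new SqlCommand("DELETE FROM InventoryItems", conn, transaction);
                deleteCmd.ExecuteNonQuery();

                // Step 5: perform the import inside the try block.
                bulkCopy.WriteToServer(LoadSourceData());

                // Record the date/time of the import, per step (3).
                SqlCommand logCmd = new SqlCommand(
                    "INSERT INTO ImportLog (ImportedOn) VALUES (GETDATE())",
                    conn, transaction);
                logCmd.ExecuteNonQuery();

                // Step 6: everything succeeded - commit all batches at once.
                transaction.Commit();
            }
            catch
            {
                // Any failure undoes the DELETE and every imported batch.
                transaction.Rollback();
                throw;
            }
        }
    }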
Further Reading
SqlBulkCopy Class (technical docs)