Using SqlBulkCopy To Perform Efficient Bulk SQL Operations
By Scott Mitchell
Over the years I've worked on a number of projects that have shared a common requirement - the ability for users to quickly import large amounts of data into a back end SQL Server database. One such project was a web application used by teachers and other staff members. The software was initially purchased for just two schools in the district, but was soon expanded to encompass other schools. Every few months one or two new schools were brought into the fold; every time a new school was added an administrative user would have to create accounts for the new teachers and staff members so that they could sign into the site. Initially, the application offered a web page for the administrator to create new user accounts one at a time, but this interface quickly became tedious and impractical once larger schools with upwards of 100 users were brought online.
To allow school administrators to quickly import new users we created a web page from which a user could upload an Excel spreadsheet that contained the one row for every new user; the columns in the spreadsheet mapped to table columns in the database. After uploading this spreadsheet, the application would walk through each row and insert a record into the table. Through this mechanism an administrator could create the 100+ user accounts by first building an Excel spreadsheet (something many of the schools already had on file) and then upload that spreadsheet. This technique may sound familiar - in Importing an Excel Spreadsheet Using Typed DataSets and TableAdapters, author Nannette Thacker walked through building such an interface.
While the described approach works well when importing hundreds of records, it starts to seriously slow down when importing thousands or tens of thousands of records.
The slowdown is due to the fact that each imported record sends its own
INSERT statement to the database. This results in a lot of "chatter" between the web
server and the database server. If you are importing data to Microsoft SQL Server the good news is that this process can be dramatically sped up using ADO.NET's
SqlBulkCopy class. In my testing, importing 10,000 records using one INSERT statement per import record took more than three seconds to complete; using
SqlBulkCopy took a fraction of a second.
This article looks at how to use the
SqlBulkCopy class to efficiently execute bulk operations against Microsoft SQL Server. Read on to learn more!
An Overview of SqlBulkCopy and the Demo Application
SqlBulkCopy is a class that was added to the .NET Framework version 2.0 (in the System.Data.SqlClient namespace). It provides an efficient means to import data into a Microsoft SQL Server database. Before using the SqlBulkCopy class, you first need to get the data to import into Microsoft SQL Server. The import data can come from virtually any data source, as SqlBulkCopy can import data from a DataReader, a DataTable, or an array of DataRow objects. Once the import data is ready, you create an instance of the SqlBulkCopy class and specify the destination database and the table to import the data into. You can optionally specify column mappings - indicating what columns in the source data are mapped to what columns in the destination table - along with other options. Finally, to perform the import, call the WriteToServer method, passing in the data to import.
The download available at the end of this article includes a demo that imports data from an uploaded Excel spreadsheet into a table in a Microsoft SQL Server database.
The demo aims to mimic an inventory tracking web application that offers users the ability to import data into the system by uploading an Excel spreadsheet.
The SQL Server database, which is located in the
App_Data folder, contains a single table named
InventoryItems with the following schema:
| Column | Notes |
|--------|-------|
| InventoryItemID | IDENTITY; primary key |
| ItemID | |
| ItemName | |
| UnitPrice | |
| UnitCost | |
| DateAdded | Has a default value; assigned automatically by the database |
The Excel spreadsheet includes four columns that specify the values that will be imported into the ItemID, ItemName, UnitCost, and UnitPrice columns; the InventoryItemID and DateAdded values will be automatically assigned by the database.
Preparing the Excel Spreadsheet Data
The demo available for download includes a web page (
Default.aspx) from which a user can upload an Excel spreadsheet. On upload, the Excel spreadsheet is saved to the web server's file system. Next, the spreadsheet's contents are read into a DataTable named excelData, and this DataTable is what will eventually be passed into the WriteToServer method to perform the import.
The following code shows how to connect to the Excel spreadsheet and read its contents into a
DataTable. The variable
uploadFileName contains the
physical path to the location where the just-uploaded Excel spreadsheet was saved on the web server's file system. For more information on querying Excel spreadsheets
from .NET code, see Importing an Excel Spreadsheet Using Typed DataSets and TableAdapters.
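A minimal sketch of that step is shown below. The worksheet name (InventoryItems) and the variable names other than uploadFileName are assumptions for illustration, not taken from the demo itself:

```csharp
using System.Data;
using System.Data.OleDb;

// uploadFileName holds the physical path to the just-uploaded .xls file.
string excelConnString = string.Format(
    @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};" +
    @"Extended Properties=""Excel 8.0;HDR=Yes""",
    uploadFileName);

DataTable excelData = new DataTable();
using (OleDbConnection excelConnection = new OleDbConnection(excelConnString))
{
    // [SheetName$] references a worksheet; "InventoryItems" is assumed here
    OleDbDataAdapter adapter = new OleDbDataAdapter(
        "SELECT * FROM [InventoryItems$]", excelConnection);

    // Fill opens the connection, runs the query, and populates excelData
    // with one DataRow per spreadsheet row
    adapter.Fill(excelData);
}
```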
Note: The connection string used in this example works for Excel spreadsheets saved in the Excel 97-2003 format (.xls). See Connection Strings for Excel 2007 to learn how to connect to an Excel 2007 document.
Creating, Configuring, and Importing Data With The SqlBulkCopy Class
To use the SqlBulkCopy class you need to supply it information about the database connection to use. This can be in the form of a SqlConnection object or as a connection string. You also need to specify the name of the destination database table. Once these required bits of information have been set, you can import the data by calling the WriteToServer method and passing in the source data.
The following snippet shows how to import data using the SqlBulkCopy object when using an explicit SqlConnection object:
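A sketch of that approach follows. The connection string is a placeholder, and the table name is taken from the demo's schema; excelData is the DataTable populated from the uploaded spreadsheet:

```csharp
using System.Data.SqlClient;

// Placeholder; in the demo this would point at the database in App_Data
string connectionString = "...";

using (SqlConnection destinationConnection = new SqlConnection(connectionString))
{
    destinationConnection.Open();

    // Pass the open connection to SqlBulkCopy via its constructor
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection))
    {
        // The table the data will be imported into
        bulkCopy.DestinationTableName = "InventoryItems";

        // excelData holds the rows read from the Excel spreadsheet
        bulkCopy.WriteToServer(excelData);
    }
}
```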
The first line of code in the above snippet creates a SqlConnection object named destinationConnection. Next, a SqlBulkCopy object is created and the destination SqlConnection object is passed in through the constructor. Following that, the destination database table name is specified via the DestinationTableName property. Finally, the data is written to the database via the WriteToServer method. (Recall that excelData is a DataTable that contains the data from the uploaded Excel spreadsheet.)
Alternatively, you can omit instantiating the SqlConnection object and instead create just a SqlBulkCopy object, passing in the connection string to the destination database like so:
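A sketch of the connection-string form follows; connectionString and excelData are assumed to be defined as in the earlier snippet:

```csharp
// SqlBulkCopy opens (and later closes) its own connection internally
// when constructed with a connection string
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "InventoryItems";
    bulkCopy.WriteToServer(excelData);
}
```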
Specifying Column Mappings
While the above code compiles without error, you'll get a runtime error when uploading the Excel spreadsheet. This is because when SqlBulkCopy performs the import it does so by mapping the first column of the source data to the first column of the destination table, the second column of the source data to the second column of the destination table, and so on. It does not perform the mapping by column name.
Our Excel spreadsheet has four columns, the first being for ItemID, the second for ItemName, and so forth. However, the destination database table (InventoryItems) has six columns, with its first column being an IDENTITY column (InventoryItemID). Consequently, the above code will attempt to put the Excel spreadsheet's ItemID values into the InventoryItemID column in the database, the Excel spreadsheet's ItemName values into the table's ItemID column, and so on.
In cases where the column positions in the source and destination do not match up precisely, you can specify the column mappings programmatically via the ColumnMappings collection. Such mismatches can occur if the destination table includes an IDENTITY column or columns with default values that are not specified in the source data. Even if the source and destination include the exact same set of columns, you will need to
specify column mappings if the order differs. For example, if the Excel spreadsheet's columns started with ItemID and was followed by ItemName, UnitPrice, and UnitCost, but the InventoryItems table's columns were ordered ItemName, ItemID, UnitCost, and UnitPrice, you would need to define column mappings to correctly align the import. Failing to do so would result in the source's ItemID values getting entered into the destination's ItemName column, the ItemName values into the ItemID column, the UnitPrice values into the UnitCost column, and the UnitCost values into the UnitPrice column. In short, it would be a mess.
Let's enhance the above snippet of code to map each column in the source data to the column with the same name in the destination table (regardless of the columns' ordinal positions):
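A sketch of that enhancement is shown below. Looping over the source DataTable's columns maps each one by name, so the exact mapping lines depend only on the spreadsheet's columns:

```csharp
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "InventoryItems";

    // Map each source column to the destination column with the same
    // name, regardless of the columns' ordinal positions
    foreach (DataColumn column in excelData.Columns)
        bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);

    bulkCopy.WriteToServer(excelData);
}
```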
With the column-mapping code added, each column in the Excel spreadsheet is mapped to the column in the destination table with the same name. Because the Excel spreadsheet does not contain an InventoryItemID column, the database system will auto-generate the value like it would with a normal INSERT statement.
Comparing SqlBulkCopy to One INSERT Statement Per Import Record
Programmatically importing data from a source data store to a destination data store typically involves looping through the source data one record at a time and, for each import record, executing an INSERT statement on the destination data store. As we've seen in this article, when importing data into a Microsoft SQL Server database another alternative is to use the SqlBulkCopy object, which has two advantages:

- SqlBulkCopy requires far less code. Our final snippet is seven lines of code long (excluding comments and white space). The code for inserting one record at a time will likely be at least twice as many lines long.
- SqlBulkCopy is much faster because it is less "chatty" with the database server.
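For contrast, the record-at-a-time approach might look like the sketch below; the parameterized INSERT and column names mirror the demo's table, while the variable names are illustrative:

```csharp
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();

    SqlCommand insertCommand = new SqlCommand(
        "INSERT INTO InventoryItems (ItemID, ItemName, UnitPrice, UnitCost) " +
        "VALUES (@ItemID, @ItemName, @UnitPrice, @UnitCost)", connection);

    // One INSERT statement per source record - lots of round trips
    foreach (DataRow row in excelData.Rows)
    {
        insertCommand.Parameters.Clear();
        insertCommand.Parameters.AddWithValue("@ItemID", row["ItemID"]);
        insertCommand.Parameters.AddWithValue("@ItemName", row["ItemName"]);
        insertCommand.Parameters.AddWithValue("@UnitPrice", row["UnitPrice"]);
        insertCommand.Parameters.AddWithValue("@UnitCost", row["UnitCost"]);
        insertCommand.ExecuteNonQuery();
    }
}
```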
To compare SqlBulkCopy and inserting one record at a time, I created a simple, highly unscientific test case. Namely, I created a web page that allows the visitor to choose how the data should be inserted - via SqlBulkCopy or through a loop where an INSERT statement is executed once for each source record. I then uploaded Excel spreadsheets with 10, 1,000, and 10,000 records and measured the running time between the SqlBulkCopy and the one INSERT per source record approaches. The times posted below are in milliseconds (1,000 ms = 1 second). (This test page and the test data are included in the download at the end of this article.)
| Approach | 10 Records | 1,000 Records | 10,000 Records |
|----------|------------|---------------|----------------|
| SqlBulkCopy | 4 ms | 16 ms | 180 ms |
| One INSERT Per Import Record | 5 ms | 333 ms | 2,978 ms |
When importing 10,000 records, SqlBulkCopy was an order of magnitude faster than running one INSERT statement per source record. I would expect this performance gap to continue to grow along with the number of rows being imported.
One thing we didn't touch on in this article is SqlBulkCopy and transactions. When importing data, what do you want to do if there is an error after, say, half of the records have been added? Do you want to remove those just-inserted records or roll back the entire operation? This topic, as well as more advanced SqlBulkCopy configuration options, is examined in another article: Using Transactions with SqlBulkCopy.
Further Reading: SqlBulkCopy Class (technical docs)