Querying a Multi-Tenant Data ArchitectureBy Scott Mitchell
During my career as a consultant and web developer I've worked on a number of "software as a service" (SaaS) web applications. Such web applications are designed to be used by multiple customers, where each customer is typically a company that has dozens or hundreds of users (its employees or customers) that access the system. One such SaaS-style application that I've worked on for the past several years is a hosted application used by hospitals and clinics to manage patient information, doctor's appointments, billing, prescriptions, and so forth.
When developing a SaaS-style application you need to decide how to store each customer's data. The article Multi-Tenant
Data Architecture examines different data architectures for SaaS applications. One option presented in the article is to store all customer data in a single database.
This approach requires adding a
CustomerID column to all of your tables to determine which data belongs to what customer. Moreover, whenever you query the
database to display data in a web page you need to make sure to filter the data by the
CustomerID of the currently logged on user. Another option is to use a
separate database for each customer. This approach lessens the risk of a user somehow seeing or modifying data for a different customer and this level of isolation may be
necessary for applications that store sensitive data, such as health care and financial applications. The hospital software SaaS application I work on uses such a separate
database data architecture.
One downside to using separate databases is that it is harder to run a query against all customer data. For example, if you find an anomaly in the data for one customer - say, a patient record that has a street address specified but no city specified - it can be difficult to quickly determine whether the same anomaly exists in other databases. This article discusses different ways to query multiple databases and shows a simple, web-based tool I've created and routinely use for those SaaS-style web applications I work on that use separate databases to store different customer data. Read on to learn more!
The Motivation Behind Using Separate Databases
Before we look at ways to query multiple databases, it behooves us to take a moment and discuss why one might choose to have a SaaS-style application use a data architecture where each customer's data is stored in a separate database. Such a data architecture certainly has its disadvantages:
- Rolling out changes to the data model is more difficult because changes have to be applied to multiple databases. If you've added new functionality that requires changes to the data model - a new table, a new column in an existing table, new stored procedures, etc. - then when rolling out these changes you need to be certain that these modifications are applied to all databases. This is not terribly complex if you have a disciplined, systematic, tested, and automated approach to rolling out changes, but can lead to bugs and incomplete data models for certain customers if you are doing this manually.
- Adding a new customer requires creating a new database. Anytime you add a new customer you need to create a new database that has the baseline database objects - the requisite tables, stored procedures, views, and so on. Likewise, any baseline data, such as the data in lookup tables, must be populated into this new database.
- Querying across multiple customer data can be challenging.
The main benefit of using a separate database for each customer is data isolation. Having each customer's data in a separate database virtually eliminates the possibility that one customer may be able to inadvertently see or modify the data of another customer due to poor programming practices or a bug in the application. This level of isolation is attractive when storing sensitive data, such as medical records or financial information. For a much more in-depth look at the different data architecture options for SaaS applications, as well as tips to determine the best architecture for your scenario, refer to Ruminations on Multi-Tenant Data Architectures.
As aforementioned, one of the challenges of using a separate database for each customer is querying data across customers. The remainder of this article examines two ways
to facilitate such queries. First, we'll look at using
sp_MsForEachDb, an undocumented stored procedure in Microsoft SQL Server. Following that, we'll see how
to build a web page that serves as a more attractive and functional user interface to the
sp_MsForEachDb stored procedure.
Querying All Databases With
Microsoft SQL Server includes a number of undocumented stored procedures. One of these is
sp_MsForEachDb, which runs a specified query against all databases on the database server. The syntax follows:
You can put a question mark (
?) within the command to execute on all databases string to inject the name of the database the command is currently
being executed on. For example, to see how many active patients exist in each database you could run the following query:
Note that the question mark is used twice in the above command: to return the database name in the select list and to indicate the
Patients table to query.
If you omit the question mark from the table name (i.e., if you use
FROM Patients rather than
then you will get back a resultset
The above statement might return results that look like the following:
Keep in mind that
sp_MsForEachDb returns a resultset for each database. In other words, the results above would not be returned as rows in a single resultset,
but rather each would return a single row in a separate resultset. To put it another way, if you were to run the above query in SQL Server Management Studio you would see four separate "grids" in
the output rather than a single grid with four rows.
sp_MsForEachDb offers a quick and dirty way to run the same query across all databases, it has two major drawbacks:
- To run
sp_MsForEachDbyou must connect to the database and type in your query, which usually means that you must have an account on the database server and be able to connect to the database using a program like SQL Server Management Studio. This may not be possible from a remote location, such as when visiting a client's site.
sp_MsForEachDbexecutes the specified command against all databases, including built-in databases like
model. There's no way to query just a subset of databases.
sp_MsForEachDb. In a nutshell, this page allows me to type in a query, select the databases to run it against, and see the results in a grid. The screen shot below shows this page in action.
The remainder of this article looks at how you can build such a page yourself.
Determining The Customer Databases
sp_MsForEachDb, the web page I've created allows the user to choose the specific set of databases to issue the query against. As the screen shot above shows, the customer databases are listed using a CheckBoxList and the query is dispatched to the checked databases. But how do I know what databases to list in this CheckBoxList? One approach would be to hard code the databases into the CheckBoxList, but that would require revisiting this page each time a new customer is added or whenever a customer cancels their service. Instead, we can write a query that will return the names of all databases that have a particular table:
This query uses
sp_MSForEachDb to query each databases
sysobjects table. The
sysobjects table contains a record for every object
in the database, including tables. The above query says, "If the current database has a table named CommonTableName then return the name of the database."
Here, CommonTableName would be the name of one of the tables in the data model that is shared among all customer databases but is not found in those databases
that you want to omit (such as
The following method shows how to work with the results of
sp_MSForEachDb in your .NET application. Typically, when you issue a command to the database
you can get back a DataReader and walk through each row in the resultset using the DataReader's
Read method. However, because the above query uses
sp_MSForEachDb it does not return a single resultset with multiple rows, but rather a series of resultsets, each with one row that has one column, the database
name. You can still get the data back via a DataReader object, but to step through each resultset you need to use the DataReader's
NextResult method, as the
following code illustrates:
Do...Loop in the code above starts with the current resultset and reads in the first row via
myReader.Read(). If a row was read in it
adds this row to the
dbList list. At the end of the loop the next resultset is loaded using
myReader.NextResult(). If there is no more resultsets
NextResult returns False and the loop terminates.
The CheckBoxList in the ASP.NET page is populated by assigning the list of customer databases to the CheckBoxList's
DataSource property and then calling
Issuing the Query to the Selected Customer Databases and Displaying the Results
Once the user has selected which databases to query, typed in their query, and clicked the "Execute Query" button, we need to dispatch the query to each of the selected databases and display the results. The following code shows the "Execute Query" Button's
Clickevent handler. Note that each item in the CheckBoxList is enumerated and for each selected item the query is issued to the database and the results are returned in the form of a DataTable.
To issue the query to a database you need to be able to form a connection string that references it. How you do this depends on your implementation. I've got a method called
ConstructConnectionString(databaseName) that will generate a connection string to the specified database, and the result of this method is passed into
SqlConnection object's constructor.
After the query is issued and the results are stored in the
results DataTable, the
DisplayResults method is called, passing in the DataTable and the name of the
DisplayResults method is responsible for displaying the current database's results on the page. The results are displayed in a Table Web control
tblResults, whose rows and columns are programmatically added based on the data in the
The following methods -
show how to construct the rows in the table for a particular set of results from a particular database. I've removed some formatting-related code in these methods for brevity.
And... that's it! The above code issues the query to all of the selected databases and builds up the contents of a Table Web control based on the returned results.
This web page provides a cleaner, easier to use interface for querying multiple databases than
sp_MsForEachDb. With this page you don't need to worry about
typing in the question mark character or having your query run against all databases, including non-customer databases. As long as you have access to this web page you can
simply enter a query, select the databases, and you're off and running.
One of the advantages of having a separate database for each customer is data isolation - there's a much lesser chance that data will "leak" across customer boundaries, or that a buggy page or SQL statement will affect all customers. While having a web page that allows administrators to run a query against multiple databases doesn't remove this isolation or its benefits, it does lessen the boundary imposed by the separate databases. If this web page is not properly locked down, non-administrative users may be able to gain access and run queries - including INSERT, UPDATE, and DELETE queries - against all customer databases. And even if this page is configured to only permit administrators, it is possible that an administrator might accidentally (or willfully) run a query that damages the data in multiple databases.
To help mitigate these problems, I suggest only allowing those administrators who have unfettered access to the database to be able to access this page. And for a bit of
extra assurance you can add a few lines of code in the "Execute Query" Button's
Click event handler to ensure that no
DELETE queries are executed.