Creating PDF Documents with ASP.NET and iTextSharp
By Scott Mitchell
The Portable Document Format (PDF) is a popular file format for documents. Because PDFs are ubiquitous and offer precise layout control, it's not uncommon for a website to use PDF technology. For example, an eCommerce store may offer a "printable receipt" option that, when selected, displays a PDF file within the browser. Last week's article, Filling in PDF Forms with ASP.NET and iTextSharp, looked at how to work with a special kind of PDF document, namely one that has one or more fields defined. A PDF document can contain various types of user interface elements, which are referred to as fields. For instance, there is a text field, a checkbox field, a combobox field, and more. Typically, the person viewing the PDF on her computer interacts with the document's fields; however, it is possible to enumerate and fill a PDF's fields programmatically, as we saw in last week's article.
This article continues our investigation into iTextSharp, a .NET open source library for PDF generation, showing how to use iTextSharp to create PDF documents from scratch. We start with an example of how to programmatically define and piece together paragraphs, tables, and images into a single PDF file. Following that, we explore how to use iTextSharp's built-in capabilities to convert HTML into PDF. Read on to learn more!
Getting Started with iTextSharp
There are a variety of .NET libraries available to programmatically create PDF documents. Perhaps the most popular is iTextSharp, which is the .NET version of the Java-based iText PDF library.
Part of iTextSharp's popularity stems from the fact that it's open source. However, it's important to keep in mind that starting with version 5.0, iTextSharp is released under the GNU Affero General Public License (AGPL) version 3. This license requires that any application that uses iTextSharp must also be released under the same license and that you must make your application's source code freely available (like iTextSharp's is). You can optionally buy a license to be released from the AGPL. While version 5.0 (and beyond) is released under the more restrictive AGPL, previous versions were released under the GNU Lesser General Public License (LGPL), which allows the use of iTextSharp within an application without requiring that the application also be released under the LGPL. In other words, by using version 4 or earlier you can use iTextSharp in your web application without having to buy a license and without having to release your web application's source code. (The download available at the end of the article uses iTextSharp version 4.1.6.)
You can download iTextSharp from its project page at: http://sourceforge.net/projects/itextsharp/.
(Alternatively, you can download the code at the end of this article, which includes the iTextSharp version 4.1.6 assembly in the
itextsharp.dll.) For assistance with iTextSharp, I suggest the iText-question
Creating a PDF Document from the Ground Up
Creating a PDF document from the ground up using iTextSharp involves the following steps:
- Create a Document object, which models the PDF document you are creating.
- Create a PdfWriter object, which is the bridge between the Document object and a backing store. In other words, the PdfWriter object is responsible for serializing the PDF document you create to some store, such as in memory or to disk.
- Add various elements to the Document object - paragraphs, tables, images, and so on.
Steps 1 and 2: Creating the
Before we get bogged down in the details of Step 3, let's first take a moment to examine the code necessary to accomplish Steps 1 and 2:
The first line of code creates a
Document object specifying the document's dimensions and left, right, top, and bottom margins, respectively.
Next, we create a
PdfWriter object. In doing so we need to specify two bits of information - the
Document object being created and a
Document object's output should be serialized when it is closed. In the code above we are using a
FileStream, which will cause the
PDF document's contents to be serialized to a file on disk named
Following that the document object is opened. At this point we're ready for Step 3 - adding the assorted elements to the document. Once all of the elements have been
added we close the document, which prompts the
PdfWriter object to "save" the
Document object to the specified
Stream - in this
case, to the file
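As a rough illustration of Steps 1 and 2, here is a minimal sketch. The class name, the outputPath parameter, the addContent callback, the A4 page size, and the margin values are all assumptions made for the example; they are not taken from the article's download. The callback exists because, as far as I know, iTextSharp raises an error when a document with no content is closed, so something has to be added between Open and Close.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfFileSketch
{
    // outputPath and addContent are hypothetical parameters for this sketch.
    public static void CreatePdfFile(string outputPath, Action<Document> addContent)
    {
        // Step 1: the Document, with page size and left, right, top,
        // and bottom margins (assumed values, measured in points).
        var document = new Document(PageSize.A4, 50, 50, 25, 25);

        // Step 2: the PdfWriter ties the Document to a Stream, here a file on disk.
        using (var output = new FileStream(outputPath, FileMode.Create))
        {
            PdfWriter.GetInstance(document, output);

            document.Open();
            addContent(document);   // Step 3: add paragraphs, tables, images, ...
            document.Close();       // the writer flushes the PDF to the stream
        }
    }
}
```

In an ASP.NET page the path would typically come from something like Server.MapPath, so that the file lands inside the web application's folder.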
Step 3: Adding Elements to the Document
When creating a PDF document you can add a number of different element types, including: annotations, chunks, tables, lists, images, and paragraphs. There are classes in the iTextSharp library that model these various element types. To add an element type to the document you (typically) create an instance of the appropriate element type, set some properties, and then add it to the Document object via the Add method. For example, the following code snippet adds a new Paragraph object to the document with the text, "Hello, World!"
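A minimal sketch of the complete "Hello, World!" example might look like the following. The class and method names, the outputPath parameter, and the page size and margins are assumptions for illustration only.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class HelloWorldPdf
{
    // outputPath is a hypothetical parameter, e.g. the mapped path of MyFirstPDF.pdf.
    public static void Create(string outputPath)
    {
        var document = new Document(PageSize.A4, 50, 50, 25, 25);

        using (var output = new FileStream(outputPath, FileMode.Create))
        {
            PdfWriter.GetInstance(document, output);
            document.Open();

            // Create the element, then hand it to the document via Add.
            document.Add(new Paragraph("Hello, World!"));

            document.Close();
        }
    }
}
```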
In fact, if we run the above code (namely, the code snippet presented in Steps 1 and 2: Creating the
PdfWriter Objects with
the code snippet above) we get a PDF named
MyFirstPDF.pdf that contains the text, "Hello, World!", as the screen shot below shows.
For a good primer on adding common elements to a PDF document I recommend Mike Brind's excellent series of articles on iTextSharp: Create PDFs in ASP.NET. There are individual articles on fonts, adding text, working with tables, and adding images, among others.
Putting It All Together: Dynamically Creating a Receipt PDF
The demo available for download at the end of this article includes a web page named CreatePDFFromScratch.aspx that builds up a PDF receipt. The page contains user interface elements where the user can enter the order number, the price, and the items that were ordered, and these inputs are used to dynamically create the PDF receipt. Of course, in a real-world application this information would be pulled from a database and not hand-entered by a user.
The screen shot below shows the
CreatePDFFromScratch.aspx user interface. Here, we are creating a receipt for Order 1234, which cost $55.95
and contained four widgets, one whatchacallit, and seven thingamabops. Clicking the "Create Receipt" button causes a postback and on postback a PDF is generated.
The code that runs when the "Create Receipt" button is clicked is a bit long to post in its entirety, so instead let me post just the germane portions, starting with Steps 1 and 2:
The above code snippet is quite similar to the code snippet examined back in Steps 1 and 2: Creating the
with one important difference - in the earlier example the created PDF was serialized to a file. Here, we are serializing the PDF to a
The reason is that, rather than saving the PDF to the web server's file system, we simply want to send the PDF back to the browser, where the user can open or save it.
We'll see how this is done in a moment.
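To make the difference concrete, here is a hedged sketch of the in-memory variant. The class name, method name, and text parameter are hypothetical, and the page size and margins are assumed values.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class InMemoryPdfSketch
{
    // Builds a one-paragraph PDF entirely in memory and returns its bytes.
    public static byte[] BuildPdfBytes(string text)
    {
        var document = new Document(PageSize.A4, 50, 50, 25, 25);

        // A MemoryStream replaces the FileStream; nothing is written to disk.
        var output = new MemoryStream();
        PdfWriter.GetInstance(document, output);

        document.Open();
        document.Add(new Paragraph(text));
        document.Close();

        // ToArray can still be called after the stream has been closed.
        return output.ToArray();
    }
}
```

The returned byte array is what eventually gets written to the HTTP response.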
Before adding any elements to the document a number of
Font objects are created, which specify the font family, font size, and style for the receipt title,
its subtitles, and so on.
Next, the receipt title is added. Note that when creating a new
Paragraph object we can optionally specify its font. In this case, we use the
which will display the receipt title in an 18pt Arial bold font.
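The font setup and title can be sketched roughly as follows. Only the 18pt Arial bold title font is described in the text; the other font sizes, the variable names, and the title string (borrowed from the HTML version of the receipt) are assumptions.

```csharp
using iTextSharp.text;

public static class ReceiptFontsSketch
{
    // Adds a title paragraph to an already opened document.
    public static void AddTitle(Document document)
    {
        // Font family, size in points, and style.
        Font titleFont = FontFactory.GetFont("Arial", 18, Font.BOLD);
        Font subTitleFont = FontFactory.GetFont("Arial", 14, Font.BOLD);   // assumed size
        Font bodyFont = FontFactory.GetFont("Arial", 12, Font.NORMAL);     // assumed size

        // The second constructor argument selects the paragraph's font.
        document.Add(new Paragraph("Northwind Traders Receipt", titleFont));
        document.Add(new Paragraph("Order Information", subTitleFont));   // hypothetical subtitle
        document.Add(new Paragraph("Details follow below.", bodyFont));   // hypothetical body text
    }
}
```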
The order details are defined using a table, which is accomplished by using the
PdfPTable class. Here we specify that the table has two columns.
Next we specify various table properties - its HorizontalAlignment, how much spacing should appear before and after the table, and any default cell settings (in this
case, we indicate that cells, by default, should have no border).
Next, the cells are added to the table, one at a time, from top left to bottom right. The code below creates a 2x2 table that displays the order ID and total price.
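A sketch of such a table is shown below. The method name, its parameters, the cell labels, and the spacing values are assumptions; the PdfPTable members used are the ones named in the text.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class OrderInfoTableSketch
{
    // orderId and totalPrice are hypothetical inputs, e.g. values read from text boxes.
    public static PdfPTable Build(string orderId, string totalPrice)
    {
        Font boldFont = FontFactory.GetFont("Arial", 12, Font.BOLD);

        var table = new PdfPTable(2);                    // two columns
        table.HorizontalAlignment = Element.ALIGN_LEFT;
        table.SpacingBefore = 10;                        // assumed spacing, in points
        table.SpacingAfter = 10;
        table.DefaultCell.Border = Rectangle.NO_BORDER;  // cells have no border by default

        // Cells fill the table left to right, top to bottom.
        table.AddCell(new Phrase("Order:", boldFont));
        table.AddCell(orderId);
        table.AddCell(new Phrase("Price:", boldFont));
        table.AddCell(totalPrice);

        return table;
    }
}
```

With two columns, each pair of AddCell calls completes one row, so the four calls above produce the 2x2 layout.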
After the table is constructed it is added to the document object via the
There's another table in the receipt that shows the items ordered, but I'll skip that code since it is nearly identical to the order details table code. There's
also an ending message at the bottom of the receipt - "Thank you for your business..." - which is added via a
The receipt also contains an image. This image -
4guysfromrolla.gif - is located on the web server's file system in the
It gets added to the PDF receipt by creating a new
Image object. If you add the
Image object to the document like the other text elements it
will appear in the document based on the order it was added. However, you can specify an absolute position for the image, which I do here, to locate it in the upper
right corner of the receipt.
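Absolute positioning might be sketched like this. The method name, the logoPath parameter, and the coordinates are assumptions; the coordinates would need adjusting to the page size and image dimensions in use.

```csharp
using iTextSharp.text;

public static class LogoSketch
{
    // logoPath is a hypothetical parameter: the physical path of the image file.
    public static void AddLogo(Document document, string logoPath)
    {
        Image logo = Image.GetInstance(logoPath);

        // Coordinates are in points, measured from the page's lower left corner.
        // On an A4 page (595 x 842 points) these assumed values land near the
        // upper right corner.
        logo.SetAbsolutePosition(440, 800);

        document.Add(logo);
    }
}
```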
After all of the document content has been added, the
Document object is closed and the PDF is streamed back to the visitor's browser. This is accomplished
by the following lines of code:
Response.ContentType property tells the browser the type of content it is being sent from the server.
application/pdf is the standard MIME
type for PDF documents; this notifies the browser that it is receiving a PDF document. The next line of code adds the
Content-Disposition HTTP header to the
response. This tells the browser to treat the content like an attachment, meaning the user will be prompted whether to open or save the PDF (rather than having it open
directly in the browser window). Note that we can tell the browser the name of the file being sent, which the browser will use as the suggested name should the user
opt to save the PDF to their hard drive. Here we use the filename
Response.BinaryWrite statement sends the contents of a specified byte array to the browser. Recall that
output is the
object we created when instantiating the
output.ToArray() returns the contents of the
MemoryStream - namely, the
binary contents of the generated PDF document - as a byte array, which is then sent down to the client.
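The three statements described above could be wrapped up as in the sketch below. The helper's name and its parameters are hypothetical, and the file name is supplied by the caller rather than being the one used in the demo. The sketch assumes a project that references System.Web.

```csharp
using System.Web;

public static class PdfResponseSketch
{
    // response would be the page's Response object; pdfBytes would come from
    // the MemoryStream's ToArray method; fileName is the suggested download name.
    public static void SendPdf(HttpResponse response, byte[] pdfBytes, string fileName)
    {
        // Tell the browser it is receiving a PDF document.
        response.ContentType = "application/pdf";

        // "attachment" prompts the user to open or save the file.
        response.AddHeader("Content-Disposition", "attachment;filename=" + fileName);

        // Send the raw PDF bytes to the client.
        response.BinaryWrite(pdfBytes);
    }
}
```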
The screen shot below shows the receipt PDF generated when using the inputs shown in the previous screen shot (namely, an Order ID of 1234, a total price of $55.95, and so on).
Creating a PDF Document from a String of HTML
iTextSharp includes a simple HTML parser class that can be used to translate HTML into a PDF document. Using this class you can, with just a few lines of code, turn an HTML document into a PDF file. For example, rather than building the receipt programmatically, adding each element one at a time as we did in the previous demo, we could instead opt to generate the receipt using an HTML template.
The demo available for download includes an HTML template file named
Receipt.htm, which is located in the
~/HTMLTemplate folder. This HTML file
contains the following markup (note - some markup has been removed for brevity):
This markup defines a receipt layout not unlike the receipt created programmatically in the previous section. There's the "Northwind Traders Receipt" title at the top,
here implemented as an
<h1> element. There's a table for the order details, a "Thank you for your business..." message at the bottom, and so on.
Note that the above markup contains four placeholders - text surrounded by brackets. The idea here is that before we ask iTextSharp to turn the above markup into a PDF we will first replace those placeholders with the Order ID, total price, and other metrics for the order we are generating a receipt for.
Turning HTML into a PDF involves the following steps:
- Create a
- Create a
- Read in the HTML as a string.
- Call iTextSharp's HTMLWorker.ParseToList method, passing in the HTML to convert into PDF. This returns a collection of elements.
- Add each element returned in Step 4 to the
Receipt.htm template. Following that, we need to replace the placeholders with the appropriate values. In the demo available for download you'll find a page named
ConvertHTMLtoPDF.aspx, which has the same user interface as
CreatePDFFromScratch.aspx. In short, the page prompts the user to enter an Order ID, total price, and select what items were part of the order. These user-supplied values are what are used to populate the placeholders in
These two sub-steps - reading the contents of
Receipt.htm into a string and then replacing the placeholders - are accomplished by the code snippet below:
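A minimal sketch of these two sub-steps follows. The placeholder names [ORDERID] and [TOTALPRICE] are assumptions (only [ITEMS] is named in this article), as are the method name and parameters.

```csharp
using System.IO;

public static class ReceiptTemplateSketch
{
    // templatePath is a hypothetical parameter: the physical path of Receipt.htm.
    public static string LoadAndFill(string templatePath, string orderId, string totalPrice)
    {
        // Read the whole HTML template into a string.
        string contents = File.ReadAllText(templatePath);

        // Swap each bracketed placeholder for its value.
        contents = contents.Replace("[ORDERID]", orderId);        // assumed placeholder name
        contents = contents.Replace("[TOTALPRICE]", totalPrice);  // assumed placeholder name

        return contents;
    }
}
```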
The above code snippet does not include the code that sets the [ITEMS] placeholder, which is where the order details are displayed.
The code is a little lengthy, but it's not terribly complex. The code simply builds the markup for a
<table> by looping through the CheckBoxList and
adding a table row (
<tr>) for each selected purchased item.
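The idea can be sketched as follows. To keep the example independent of the page's controls, it takes the selected item names as a plain sequence instead of reading them from the CheckBoxList; the method name and the exact markup are assumptions.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ItemsTableSketch
{
    // selectedItems stands in for the checked entries of the CheckBoxList.
    public static string BuildItemsTable(IEnumerable<string> selectedItems)
    {
        var markup = new StringBuilder();
        markup.Append("<table>");

        // One table row per purchased item.
        foreach (string item in selectedItems)
        {
            markup.Append("<tr><td>").Append(item).Append("</td></tr>");
        }

        markup.Append("</table>");
        return markup.ToString();
    }
}
```

The resulting string would then replace the [ITEMS] placeholder, for example with contents.Replace("[ITEMS]", ItemsTableSketch.BuildItemsTable(items)).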
Once the HTML string has been composed we are ready for Steps 4 and 5. Step 4 - calling iTextSharp's
HTMLWorker.ParseToList method - parses the HTML string
and returns a collection of elements. Step 5 enumerates this collection of elements, adding them to the
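Steps 4 and 5 might be sketched as below, assuming the HTML string has already been composed. The class and method names, the page size, and the margins are hypothetical. Passing null as the second argument means no StyleSheet is applied.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

public static class HtmlToPdfSketch
{
    // html is the template text after the placeholders have been replaced.
    public static byte[] Convert(string html)
    {
        var document = new Document(PageSize.A4, 50, 50, 25, 25);
        var output = new MemoryStream();
        PdfWriter.GetInstance(document, output);
        document.Open();

        // Step 4: parse the HTML into a collection of iTextSharp elements.
        var parsedElements = HTMLWorker.ParseToList(new StringReader(html), null);

        // Step 5: add each parsed element to the document.
        foreach (IElement element in parsedElements)
        {
            document.Add(element);
        }

        document.Close();
        return output.ToArray();
    }
}
```

Declaring the loop variable as IElement keeps the loop working whether ParseToList returns a non-generic or a generic list.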
That's all there is to it! Keep in mind that the HTML parser is simply converting HTML into elements that can be added to the PDF document. In addition to adding these
parsed elements you can also add elements you create, just like we did in our earlier demo (
CreatePDFFromScratch.aspx). For instance, we can add the
logo to the upper right corner of the receipt using the same code as before:
The generated PDF, shown below, is quite similar to the receipt created from the ground up.
If you decide to use iTextSharp's HTML to PDF capabilities, keep in mind that they are pretty rudimentary. iTextSharp will not correctly parse a complex HTML document with many layers and overlays, and its stylesheet support is limited (although I hear this has been improved upon in iTextSharp 5.0). For maximum control you will want to either create PDFs from the ground up using the techniques discussed at the start of this article, or create your PDFs using Adobe Acrobat with form fields to fill in the dynamic bits, as was discussed in Filling in PDF Forms with ASP.NET and iTextSharp.