Filling in PDF Forms with ASP.NET and iTextSharp
By Scott Mitchell
The Portable Document Format (PDF) is a popular file format for documents, for two primary reasons. First, because the PDF standard is an open standard, many vendors provide PDF readers across virtually all operating systems, and many proprietary programs, such as Microsoft Word, include a "Save as PDF" option. Consequently, PDFs serve as a sort of common currency of exchange. A person writing a document using Microsoft Word for Windows can save the document as a PDF, which can then be read by others whether or not they are using Windows and whether or not they have Microsoft Word installed. Second, PDF files are self-contained. Each PDF file includes its complete text, fonts, images, input fields, and other content. This means that even complicated documents with many images, an intricate layout, and user interface elements like textboxes and checkboxes can be encapsulated in a single PDF file.
Due to their ubiquity and layout capabilities, it's not uncommon for websites to use PDF technology. For example, when purchasing goods at an online store you may be offered the ability to download an invoice as a PDF file. PDFs also support form fields, which are user interface elements like textboxes, checkboxes, comboboxes, and the like. These form fields can be filled in by a user viewing the PDF or, with a bit of code, they can be filled in programmatically.
This article is the first in a multi-part series that examines how to programmatically work with PDF files from an ASP.NET application using iTextSharp, a .NET open source library for PDF generation. This installment shows how to use iTextSharp to open an existing PDF document with form fields, fill those form fields with user-supplied values, and then save the combined output to a new PDF file. Read on to learn more!
An Overview of the Demo Application
This article shows how to use iTextSharp to programmatically populate the form fields in a PDF document. To facilitate this discussion I created a demo, available for download at the end of this article, that shows how to programmatically populate the form fields in the IRS's Form W-9. (The IRS is the Internal Revenue Service for the United States, which is charged with collecting taxes. Form W-9 is used to provide taxpayer information to a requesting person or business.)
In particular, the demo includes a web page named
CreateW9.aspx that has a number of textboxes, radio buttons, and other input elements that prompt the user
to provide taxpayer identification information. The screen shot below shows the
CreateW9.aspx page when viewed through a browser.
After entering the information and clicking the "Generate Completed W-9" button (not shown in the above screen shot), there is a postback. On postback the
code-behind class uses the iTextSharp library to generate a Form W-9 PDF document whose form fields contain the text entered by the user. The original Form W-9 PDF
document that contains the form fields is stored on the web server in the
PDFTemplates folder. (This PDF file, fw9.pdf, was
downloaded from the IRS's website.) When the user opts to generate a W-9, iTextSharp creates a new PDF
document that takes the original Form W-9 PDF and populates its form fields with the user-supplied values. This new PDF document is not saved on the web server; rather,
it is streamed back directly to the client's browser, prompting them to save or open it.
The screen shot of the PDF below shows the PDF generated by
CreateW9.aspx based on the user's inputs.
While this article demonstrates filling form fields on an IRS-supplied PDF document, there's no reason why this technique could not be used to fill the forms on a PDF generated by you or your company. If your company has invoices, NDAs, or other forms that need to commonly be filled out based on user input or data residing in a database, you could create those PDFs to contain form fields and then write code to populate them (based either on user input and/or the results of a database query). To create your own PDFs graphically you will need to buy a copy of Adobe Acrobat. Future installments will explore how to create PDFs programmatically using iTextSharp.
Getting Started with iTextSharp
There are a variety of .NET libraries available to programmatically create PDF documents. Perhaps the most popular is iTextSharp, which is the .NET version of the Java-based iText PDF library.
Part of iTextSharp's popularity stems from the fact that it's open source. However, it's important to keep in mind that starting with version 5.0, iTextSharp is released under the GNU Affero General Public License (AGPL) version 3. This license requires that any application that uses iTextSharp must also be released under the same license and that you must make your application's source code freely available (like iTextSharp's is). You can optionally buy a license to be released from the AGPL. While version 5.0 (and beyond) is released under the more restrictive AGPL, previous versions were released under the GNU Lesser General Public License (LGPL), which allows the use of iTextSharp within an application without requiring that the application also be released under the LGPL. In other words, by using version 4 or earlier you can use iTextSharp in your web application without having to buy a license and without having to release your web application's source code. (The download available at the end of the article uses iTextSharp version 4.1.6.)
You can download iTextSharp from its project page at: http://sourceforge.net/projects/itextsharp/.
(Alternatively, you can download the code at the end of this article, which includes the iTextSharp version 4.1.6 assembly,
itextsharp.dll.) For assistance with iTextSharp, I suggest the iText-questions mailing list.
Determining Form Fields Details
In a moment we'll talk about how to use iTextSharp to take a user-supplied value and stick it in a form field. In a nutshell, it involves one line of code:
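A minimal sketch of that one line, wrapped in a helper so it compiles on its own. The module and method names are illustrative assumptions; `formFields` is assumed to be the `AcroFields` object obtained from a `PdfStamper`, as described later in this article.

```vb
Imports iTextSharp.text.pdf

Public Module SetFieldSketch
    ' Hypothetical helper: the single SetField call is the point of interest.
    Public Sub FillField(ByVal formFields As AcroFields, _
                         ByVal fieldName As String, _
                         ByVal value As String)
        ' Assigns value to the form field named fieldName.
        formFields.SetField(fieldName, value)
    End Sub
End Module
```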
The above code assigns value to the form field named fieldName.
In order to populate the form fields in a PDF document we need to know the names of the fields. If you are the one who created the PDF document then you already know the field names, but what if you were given the PDF document or downloaded it (like how I downloaded the Form W-9)? If you have Adobe Acrobat on your computer then you are in luck - you can open the PDF in Adobe Acrobat and view the properties for each form field, which includes its name.
If all you have is the free Adobe Reader then things are a bit more challenging, as Adobe Reader does not provide details about the form field elements.
However, using iTextSharp we can write a bit of code to get all of the fields and display their information on a web page. The demo includes a page named
ListFormFields.aspx that has two options:
- Show Form Fields - displays the name and type of each field in the PDF document as a numbered list. For checkbox fields, the Export Value is also displayed, which is the value you need to set the field to in order to check the checkbox. (More on this later!)
- Generate Sample PDF - generates a PDF with the form fields filled in with the values 1, 2, .... When used with the Show Form Fields option you can match up the number of the form field listed on the web page with the form field number in the generated PDF to see what field name corresponds to what field position in the PDF.
The screen shot below shows
ListFormFields.aspx when using the Show Form Fields option for the Form W-9 PDF (
fw9.pdf). For Form W-9, the form field names are a bit lengthy - for instance, the Name text field is named
topmostSubform.Page1.f1_01_0_ - but this output clearly shows the name of each form field on the document along with its type (TextField or CheckBox). Also, for the checkboxes the Export Value is reported.
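A listing like this can be produced with a few iTextSharp calls. The following is a rough sketch rather than the demo's actual code: the module and function names are made up for illustration, and it assumes the `AcroFields.Fields` collection is keyed by field name (as it is in iTextSharp 4.1.6).

```vb
Imports System.Collections.Generic
Imports iTextSharp.text.pdf

Public Module FieldListingSketch
    ' Hypothetical helper: returns one description per form field in the PDF.
    Public Function DescribeFields(ByVal pdfPath As String) As List(Of String)
        Dim descriptions As New List(Of String)
        Dim reader As New PdfReader(pdfPath)
        Try
            Dim fields As AcroFields = reader.AcroFields
            ' Assumption: Fields is keyed by the fully qualified field name.
            For Each fieldName As String In fields.Fields.Keys
                Select Case fields.GetFieldType(fieldName)
                    Case AcroFields.FIELD_TYPE_TEXT
                        descriptions.Add(fieldName & " (TextField)")
                    Case AcroFields.FIELD_TYPE_CHECKBOX
                        ' The appearance states include the value that checks
                        ' the box, typically alongside an "Off" state.
                        Dim states As String() = fields.GetAppearanceStates(fieldName)
                        descriptions.Add(fieldName & " (CheckBox), states: " & _
                                         String.Join(", ", states))
                    Case Else
                        descriptions.Add(fieldName & " (other field type)")
                End Select
            Next
        Finally
            reader.Close()
        End Try
        Return descriptions
    End Function
End Module
```

The returned strings could then be bound to a list control on the page.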
The Generate Sample PDF option creates a PDF with each form field value populated with its corresponding index value. The PDF screen shot below shows that the
Name text field has a value of 1, whereas the Business name field has a value of 21. This indicates that the Name text field's name is
topmostSubform.Page1.f1_01_0_ (the first item displayed in the Show Form Fields list) whereas the Business name text field's name is
topmostSubform.Page1.f1_02_0_ (the 21st item displayed in the Show Form Fields list).
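The numbering idea can be sketched as follows. This is an illustration under stated assumptions, not the demo's code: the helper name is hypothetical, and the field names are sorted here so that the numbering is stable, which means any on-page listing would need to use the same order for the numbers to match.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text.pdf

Public Module SamplePdfSketch
    ' Hypothetical helper: writes a copy of the PDF to output with each field
    ' set to its 1-based position, and returns the names in that same order.
    Public Function GenerateSamplePdf(ByVal pdfPath As String, _
                                      ByVal output As Stream) As List(Of String)
        Dim reader As New PdfReader(pdfPath)
        Dim stamper As New PdfStamper(reader, output)
        Dim fields As AcroFields = stamper.AcroFields

        ' Copy the names first so the collection is not enumerated while
        ' values are being assigned.
        Dim fieldNames As New List(Of String)
        For Each fieldName As String In fields.Fields.Keys
            fieldNames.Add(fieldName)
        Next
        fieldNames.Sort(StringComparer.Ordinal)

        For i As Integer = 0 To fieldNames.Count - 1
            ' Text fields display the number; a checkbox stays unchecked
            ' unless the number happens to equal its Export Value.
            fields.SetField(fieldNames(i), (i + 1).ToString())
        Next

        stamper.Close()
        reader.Close()
        Return fieldNames
    End Function
End Module
```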
Using the
ListFormFields.aspx page to determine a PDF's form fields is doable, but is a bit of a hassle. If you routinely work with PDF files you'll find the
form field tools in Adobe Acrobat to be well worth the purchase price.
Generating a New PDF with Form Field Values Filled Programmatically
iTextSharp makes it easy to fill the form fields in an existing PDF, creating a new PDF in the process. Start by creating a new
PdfReader object. (The
PdfReader class is one of many classes provided by the iTextSharp library. This class, along with the others used in this demo, is found in the
iTextSharp.text.pdf namespace, so add an appropriate
Imports statement to the top of your class file.)
pdfPath is the full physical path to the PDF file that contains the form fields. Recall that in the demo the Form W-9 file is stored in the
~/PDFTemplates folder and is named
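fw9.pdf. Therefore, pdfPath would be the physical path on the web server that ~/PDFTemplates/fw9.pdf maps to, which is what Server.MapPath("~/PDFTemplates/fw9.pdf") returns.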
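As a minimal sketch, opening the template might look like the following. The module and function names are illustrative assumptions; `HttpContext.Current.Server.MapPath` is used so the helper works outside of a page class, whereas inside a code-behind class `Server.MapPath` is available directly.

```vb
Imports System.Web
Imports iTextSharp.text.pdf

Public Module PdfReaderSketch
    ' Hypothetical helper: opens the Form W-9 template stored in ~/PDFTemplates.
    Public Function OpenW9Template() As PdfReader
        ' Translate the virtual path into a full physical path on the server.
        Dim pdfPath As String = _
            HttpContext.Current.Server.MapPath("~/PDFTemplates/fw9.pdf")
        Return New PdfReader(pdfPath)
    End Function
End Module
```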
Next, we need to create a
PdfStamper object. The
PdfStamper is used to populate the form fields in a PDF document and generates a new PDF.
This new PDF is saved to a stream that you must specify when creating the
PdfStamper object. In the demo we are interested in streaming the generated PDF
back to the user - there's no need to save it so we have the generated PDF outputted to a
MemoryStream. To save the generated PDF to disk, you could have
PdfStamper output to a
FileStream instead. With the
PdfStamper object created we're now ready to assign values to the PDF's form fields. The
PdfStamper object has an
AcroFields property that returns an
AcroFields object. This object has a
SetField method, which is used to assign a value
to a particular field in the PDF.
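Sketched in code, and assuming a `PdfReader` has already been opened on the template, creating the `PdfStamper` looks roughly like this (the module and function names are illustrative):

```vb
Imports System.IO
Imports iTextSharp.text.pdf

Public Module PdfStamperSketch
    ' Hypothetical helper: creates a PdfStamper that writes the generated PDF
    ' to the supplied stream (a MemoryStream to keep it in memory, or a
    ' FileStream to save it to disk).
    Public Function CreateStamper(ByVal reader As PdfReader, _
                                  ByVal output As Stream) As PdfStamper
        Return New PdfStamper(reader, output)
    End Function
End Module
```

A caller would pass in something like `New MemoryStream()` as `output` and then read the returned object's `AcroFields` property to get at the form fields.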
For example, to assign the value "Scott Mitchell" to the Name form field (which has a name of
topmostSubform.Page1.f1_01_0_) and the
value "N/A" to the List account number(s) here field (which has a name of
topmostSubform.Page1.f1_07_0_) we would use the following code:
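A minimal sketch of those two assignments follows. The field names are reproduced as they are printed in this article; confirm the exact names for your copy of the PDF with the Show Form Fields listing. The module and method names are illustrative.

```vb
Imports iTextSharp.text.pdf

Public Module TextFieldSketch
    ' Hypothetical helper: fills the Name and account number text fields.
    Public Sub FillTextFields(ByVal formFields As AcroFields)
        ' Name field
        formFields.SetField("topmostSubform.Page1.f1_01_0_", "Scott Mitchell")
        ' "List account number(s) here" field
        formFields.SetField("topmostSubform.Page1.f1_07_0_", "N/A")
    End Sub
End Module
```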
Things are a little more complicated with checkboxes. To check a checkbox you need to call the
SetField method and pass in the checkbox's Export Value
as the value. For example, to check the Individual/sole proprietor checkbox, which has a name of
topmostSubform.Page1.c1_01 and an
Export Value of 1, and the Exempt payee checkbox, which has a name of
topmostSubform.Page1.c1_01 and an
Export Value of 8, we'd use the following code:
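The idea can be sketched as follows. To avoid hard-coding names that may not match your copy of the PDF, this version takes the two checkbox field names as parameters; the parameter, module, and method names are assumptions made for illustration.

```vb
Imports iTextSharp.text.pdf

Public Module CheckBoxSketch
    ' Hypothetical helper: checks two checkboxes by passing each field's
    ' Export Value to SetField.
    Public Sub CheckW9Boxes(ByVal formFields As AcroFields, _
                            ByVal individualFieldName As String, _
                            ByVal exemptPayeeFieldName As String)
        ' Individual/sole proprietor checkbox: Export Value of 1
        formFields.SetField(individualFieldName, "1")
        ' Exempt payee checkbox: Export Value of 8
        formFields.SetField(exemptPayeeFieldName, "8")
    End Sub
End Module
```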
Note how we pass in "1" to check the Individual/sole proprietor checkbox and "8" to check the Exempt payee checkbox. The precise value passed in to check a checkbox depends on the checkbox field's Export Value - there is no hard and fast standard, unfortunately. Therefore, to check a checkbox you must know its Export Value.
Once the form fields have been populated we need to close the
PdfStamper and
PdfReader objects. You can optionally indicate whether the
generated PDF document's form fields should still be editable by setting the
FormFlattening property. Setting this property to
True indicates that the form fields should no longer be editable in the generated document.
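A short sketch of this finishing step, assuming `stamper` and `reader` are the objects created earlier (the module, method, and `flatten` parameter names are illustrative):

```vb
Imports iTextSharp.text.pdf

Public Module FinishPdfSketch
    ' Hypothetical helper: optionally flattens the form, then closes both objects.
    Public Sub FinishPdf(ByVal stamper As PdfStamper, _
                         ByVal reader As PdfReader, _
                         ByVal flatten As Boolean)
        ' True means the fields are no longer editable in the generated PDF.
        ' This must be set before the stamper is closed.
        stamper.FormFlattening = flatten

        ' Closing the stamper finishes writing the PDF to its output stream.
        stamper.Close()
        reader.Close()
    End Sub
End Module
```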
Once the
PdfStamper has been closed the stream specified when instantiating the
PdfStamper object contains the generated PDF. If you used a
FileStream object then that means the generated PDF now exists on disk. In the demo I use a
MemoryStream, which means the generated PDF now
resides in memory. At this point we're ready to send it back to the browser for display. This is accomplished using the following code:
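A minimal sketch of that step follows. The `downloadName` parameter and the module and method names are assumptions for illustration; `output` is the `MemoryStream` that was handed to the `PdfStamper`. The call to `response.End()` is an addition that stops any further page output from being appended to the PDF.

```vb
Imports System.IO
Imports System.Web

Public Module SendPdfSketch
    ' Hypothetical helper: streams the in-memory PDF back to the browser.
    Public Sub SendPdf(ByVal response As HttpResponse, _
                       ByVal output As MemoryStream, _
                       ByVal downloadName As String)
        ' Treat the content as an attachment so the user is prompted
        ' to open or save it.
        response.AddHeader("Content-Disposition", _
                           "attachment; filename=" & downloadName)
        ' Tell the browser that it is receiving a PDF document.
        response.ContentType = "application/pdf"
        ' ToArray still works after the stamper has closed the stream.
        response.BinaryWrite(output.ToArray())
        response.End()
    End Sub
End Module
```

From a page's code-behind this might be called as `SendPdf(Response, output, "W9.pdf")`, where the file name is whatever you want the browser to suggest.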
The first line of code adds the
Content-Disposition HTTP header. This tells the browser to treat the content like an attachment, meaning the user will be
prompted whether to open or save the PDF (rather than having it open directly in the browser window). The second line of code tells the browser the type of content it
is being sent.
application/pdf is the standard MIME type for PDF documents; this notifies the browser that it is receiving a PDF document.
The
Response.BinaryWrite statement writes back the contents of a specified
byte array. Recall that
output is the
MemoryStream object we created earlier.
output.ToArray() returns the contents of the
MemoryStream - namely, the binary contents of the generated PDF document - as a byte
array, which is then sent down to the client.
A Note About the Demo Code...
The code presented in the text of this article is a simplified version of the code in the demo available for download. The demo encapsulates much of the above
functionality in a series of classes.
The code presented in this article's text exists (verbatim) in the demo, but not all in one spot like the text above implies. I think the structure of the code in the demo is more reusable and easy to understand once you examine it, but I wanted to point this out in case you go to the demo and get confused because you cannot find the exact code snippet presented in the text above.
Conclusion... And Looking Forward
This article (and demo) showed how to use ASP.NET and iTextSharp to programmatically fill the form fields in a PDF. In particular, we saw how to populate the form fields of the IRS's Form W-9 PDF file. This article is just the first in a series of articles that explore using iTextSharp to work with PDF documents in an ASP.NET application. Future installments will detail how to programmatically create PDFs, among other topics.