Nprof and ACT, Your Performance Assistants, Part 3By Klaus Salchner
In Part 2 of this article we looked at how to use ACT and NProf to collect metrics vital to determining performance information about an application. In this third and final part we'll examine how to dissect the profiling data collected by NProf.
Analyzing the Profiling Data
After running the NProf test, we are ready to analyze the recorded data. In the upper left corner you see a "Thread" drop-down list. It shows you all the threads which were created and executed in your application. Each thread executes different objects/methods. So select first the thread you want to look at. The tree on the left side shows for the selected thread the namespace hierarchy with all its types which got instantiated and executed. So every single type ever instantiated on that thread is shown in the tree view. Each namespace and type has a checkbox, which you can use to filter the list of object/methods calls. In our example (see screenshot above) we see three namespaces -
RetrieveSecurity- and we unchecked the checkbox in front of
Microsoftso we see only calls belonging to our objects/methods (which are in the
RetrieveSecuritynamespace, which contains the classes for my ASP.NET application. Make sure you uncheck the top-checkbox called "All namespaces" otherwise it will still continue showing all object/method calls.
The top right pane shows you a list of all object/method calls (filtered based on the checked checkboxes in the tree view). The other five columns show you the execution profile of each method:
- # of Calls – Shows you how often this object/method has been called. In our sample we see that the method
RetrieveSecurity::btnLogon_Click()has been called two times.
- % of Total – Shows in percentage how much of the total execution time has been spent in this method. Please note
that the total execution time is the sum of the execution time of all threads together. You don't see absolute execution
times but rather relative percentage values because of the overhead of the profiling API. In our specific example you
see that the first method (which is
RetrieveSecurity.Main()) takes 98.21% of the total execution time. If a method calls other methods, then it also includes the execution time of the child method calls. When you select the method in the top right pane you see in the bottom right pane the list of methods called by it (more to it later on).
- % in Method – This shows the percentage of the total execution time spent in the method itself. This metric
excludes all the child method calls. Sorting by this column and then concentrating on the ones with high percentage numbers
gives you a good starting point for improving your application's performance.
In our sample
RetrieveSecurity.Main()spends only 0.01% of the total execution time. So it is safe to assume that this method is very fast.
- % in Children – This is the total execution time spent in child methods called by the method. Methods with high
percentage values in this column are another good target to look at, as these methods call lots of other methods which
might be calling lots of other methods, etc. In our example
RetrieveSecurity.Main()spends 98.21% in the child methods. For
Main()this is expected because this is the starting point of our application which ultimately causes most other methods to be called.
- %Suspended – This value indicates for how long the thread was in suspended mode. You expect that to be zero for almost every method. Look at any method with has a value other then zero.
When selecting a method in the top right pane the bottom right pane shows all of the children methods called. That gives you a lot of insight what is happening in that method. The bottom right list shows the following information beside the name of the method:
- # of Calls – How often has the method been called. This means how often the method has been called in total. Not just within the selected parent method, but overall throughout your application.
- % of Total – How much execution time has been spent in this method. This includes not just calls within the parent method but overall all calls of this method throughout your application.
- % of Parent – The execution time the child method spends of the parents method total execution time. So if the parent method calls two other methods, then the "% of Parent" for the two child methods plus the "% in Method" of the parent method should always come to a 100% (keep in mind because of rounding errors it might not be exactly a 100%). So this column with the "% in Method" of the parent method tells you exactly where all the execution method of the parent method is spent.
You can sort both lists by clicking on the column headers. Spend some time with the data provided. It really shows you very well were all your execution time is spent, whether that is overall for your application or just for a specific method. NProf is an excellent tool for finding those objects/methods in your application that are acting as a bottleneck, and gumming up the overall application's performance. To start, look at the methods where most of the execution time is spent. Dive into the method's code to understand why this method is taking the most time. It doesn't necessarily mean there is something wrong, but you should at least justify the inordinate amount of time a method takes. I've also found that most of the time I go back to re-examine source code I end up finding some way to optimize the code and improve the application's performance.
One word or caution: just because you have discovered some technique to optimize your code, don't rush off and implement that optimization. First, justify the optimization's value and cost. You can optimize your code forever, but there is a point of diminishing return of value, meaning you spent too much time optimizing the code for too little performance gain. In this case it is time to stop optimizing and move on with other work.
A Real-World Example
Let's assume you have an application which needs to make lots of string concatenation. We know that strings are immutable, so every time you concatenate two strings the framework has to create a new string and then copy the other two strings into it. With small numbers of string concatenations this is not an issue. But if you have many, then this puts pressure on your garbage collector (GC) which means it needs to clean up memory more frequent which has then a negative impact on performance.
Let's look at the following sample – which just simulates a big number of string concatenations (the full sample shows a form and the user can click on the form which ten evokes the event handler below):
This small sample performs about 100,000 string concatenations. Overall, NProf shows that
takes about 40% of the total execution time. This is a lot. We can optimize the code, however, by
using the recommended technique for concatenating string - using the
StringBuilder class. First, we make sure
the initial size of the
StringBuilder is large enough for the complete string to build, which avoids any
pressure on the GC. Here is the updated sample code:
This time NProf shows a total execution time of 23%. So a simple change to a method saved 17% of the overall execution time, which is a big win for a minor code change.
Application Center Test and nprof are two free tools which can help you a lot to identify performance issues in your application, track down the actual bottleneck and resolve it. It is well know the later you find issues in the life cycle of your application, the costlier it is, with the worst case being you find it at the customer's site and it impacts the customer's business. Performance bottlenecks are the worst type of issues which sometimes result in significant changes to your application code. The more time you spend upfront in performance the better. That starts from defining the requirement, doing the design and architecture, regular performance tests of your code (while you are writing it) and doing a final performance test when application is complete. It is good practice to provide a performance report with each release of your application. Your customer-facing coworkers will appreciate it as it can be an easy way answering all the questions the customer IT folks have.
One final note for database driven applications. Make sure you understand from your LOB or customer how much data is expected in the database. If your customer expects to load a million records but you design, code and test against a thousand records, then you have a big disconnect. Load your database with the same amount of data you expect your customers will. And do that already with the database you use in your development. This way you see already when you write your code if it will perform slow or not. If you have any comments or questions in regard to this article, please contact me at firstname.lastname@example.org.
About the Author
Klaus Salchner has worked for 14 years in the industry, nine years in Europe and another five years in North America. As a Senior Enterprise Architect with solid experience in enterprise software development, Klaus spends considerable time on performance, scalability, availability, maintainability, globalization/localization and security. The projects he has been involved in are used by more than a million users in 50 countries on three continents.
Klaus calls Vancouver, British Columbia his home at the moment. His next big goal is doing the New York marathon in 2005. Klaus is interested in guest speaking opportunities or as an author for .NET magazines or Web sites. He can be contacted at email@example.com.