CMSC 838B: Information Visualization

Tracing an Application's Execution Profile:
Interactive Visualization of the Profiled Data for TreeMaps with SpotFire

February 28, 2001

Narendar Shankar
narendar@cs.umd.edu

Introduction

Most applications written in languages like Java, or written using visual development tools like Visual C++ involve a large number of calls to the language libraries. Infact for a huge application there might be more calls to the library , than the actual number of calls to user written functions. Under normal circumstances the program might work fine, but when subject to various types of input, some abnormal behaviour might occur and it becomes very difficult to track down the exact point in the code where the problem is being manifested .

A profiler is a tool which traces the amount of time and resources an application takes up, but generates a lot of data for an application which involves many function calls . It becomes very difficult to trace the profile of a large application and determine the exact points where the application is misbehaving. Visualization of the profile for an application would give us a better view of where the problem is and would help us zero in on the bad part of the code. Treemaps is one such application which performs well when the input data is small but as size increases the performance decreases. Treemaps can be profiles using a java profiler but the profiler generates anywhere between 10000 and 20000 lines of profiled data . This includes both the calls to the java library and the user function calls. Visualization of the applications profile would greatly help in identifying the exact points in the application where misbehaves.

Description of Data

Generating data was the toughest part of the entire experiment. The actual data which was fed into treemaps were the email archives of a person. The size of the input was increased in steps of 1000 starting from 1000 and ending with 7000. For each run the application was profiled using the java profiler. The profiled data was then fed into excel to generate a more readable format for spotfire. Finally seven excel files were generated , each one corresponding to the data sets between values 1000 and 7000 which were provided as input to Spotfire. Each excel file had about 12,000 lines corresponding to the number of function calls made in the entire application execution sequence.

Method

Spotfire was the tool used for visualising the data. The seven excel files which were generated in the above prodecure were fed into spotfire. The columns generated were the caller, callee, the time taken for a function and the number of times a function was being called. Also another column which reflected the product of the time taken and the number of times a function was being called was estimated . Mathematically representing it represents the total execution time spent on a particular function. The intuition behind this function becomes clear when we have a look at the visualizations later.
Product=Total execution time= Count * Time spent in function.
Those function calls which have the highest values for execution time have the highest values of total execution time.All normal data is shown by red colored spots and the abnormal data is colored differently. Apart from that many interesting features of spotfire like 'dithering' and 'size coding' have been incorporated. The plots are all shown below and the significance of each result is explained.

Observations

Figure 1. shows a raw output of a java profiler. It is approximately 12,000 lines of raw text which is vey difficult to decode and explain the relationhips between the callers and the callees.What is required here is a simple visualisation of which caller callee relationship has the maximum total time of execution.

Figure 2.Plot in two dimensions showing the function with maximum number of calls. As this figure shows there are odd function calls which are executed a large number of times. We are now on step 1 of our trail. We know that our culprit function calls are called many times. Let us mark the odd ones with different colours. Another beauty of this picture is the fact that "dithering" has been used to differentiate between the various values of count which have almost the same value

Figure 3 .Plot in two dimensions showing the function with maximum execution time for a single call . As this figure shows there are many java library calls which take a large amount of time for the execution of a single function call. We are now on step 2 of our trail. We know that our culprit function calls do not take too much time for the execution of a single call. But here the usefulness of spotfire is highlited because we have our culprits already marked in the last stage. This is precisely the advantage of using spotfire . It helps us find the "needle in the haystack".

Figure 4. This figure has also got some amount of dithering and another feature called "size coding" has been incorporated. This is used to highlight the marked members with a bigger size.

Figure 5 .Plot in two dimensions showing the relationships between callers and callees. The information in this plot is very cluttered . In the next figure (Figure 5 ), the slider is manipulated and we get the parts of the plot which are of interest to us. This brings us to step 3 of our trail.

Figure 6 .Plot in two dimensions showing the relationships between callers and callees. The information in this plot is very clear and shows the exact spot where the relationship between the caller and the callee, which is causing the problem . For this once again we use size coding and dithering to make it appear in a better fashion

Figure 7. We now return back to the intuition which we stated in the data definition. Looking at the above visualizations we can safely introduce a new metric for this problem in particular. This metric , defined before as the "product" actually measures the total execution time or every function. This is justified as a comparison metric because as shown by the above visualizations, the product of the maximum value for the time graph and the maximum value for the counts more or less corresponds to the actual overhead. This figure a plot of the callers, the callees and the time taken for total execution for each of these callee caller pairs on a three dimensional plane. This very clearly points out the fact that the functions involving the linked lists is the culprit. And by mereley clicking on the two odd ones we can get the details of the the caller, the callee and the total time of execution of the function call.

Figure 8. This is a really beautiful visualisation which reiterates the intuition. As shown in the figure(using parallel coords), the thick blue lines correspond to high values of count , but not the highest values of time for each instruction, but ultimately correspond to the highest values for the total execution time. This clearly shows the relationship between the actual and the derived parameters

Figure 9. This figure shows how the "total time of execution " parameter varies with the number of inputs. the various colors correspond to the inputs of sizes betwen 1000 and 7000. As shown by the figure there is an increase in the 'total time taken " overhead for all of the critical functions which have been spotted in the previous steps.

Figure 10. This figure shows how the "total time of execution " parameter varies with the number of inputs. the various colors correspond to the inputs of sizes betwen 1000 and 7000. This is shown beautifully as a trellis graph .

Critique of SpotFire

"An important lesson learnt". This figure shows the trellis which is generated by using the parallel coordinates. The column chosen is the callee of the function and the values for each of the "products" is shown. As this figure shows clearly that as we move along the graph the value of "product" does not show any particular trend(as shown in the ine graphs).What has to be observed here is the fact that we should make all the axes of the same scale and this is not intuitive at all. Also to "un-select" a particular region, one has to click on an empty region!!.

SpotFire allows the user to ability to dynamically filter records using the dynamic query controls. It also allows the user to tightly couple alternate views by marking and selecting a set of records. This hides all other records in all views, allowing the user to easily view the reduced set of records from different perspectives. Unfortunately, there appears to be no way to unhide records without resetting all dynamic query parameters. This can have the undesirable side effect of undoing a zoomed view. Also, there have been quite a few problems with the parallel coordinates visualisation. The most important one of them is the ordering and labelling of the axes.

The window tiling capability would be more useful on larger screens, but on a 14" or 15" screen the tiled windows are too small.

Other Lessons Learned

I learned a few lessons about visualizations during this short project: