Visualizing Chevron Oil Data with Spotfire and the Table Lens


Beth Weinstein
CMSC 838B: Information Visualization Application Project

February 28, 2001




Introduction

The oil data examined in this analysis is production and financial information showing how specific Chevron oil fields performed during the year 2000.  There are a total of 81 oil fields.  The individual oil fields are grouped by assets (teams), and the assets are in turn grouped by profit centers, which creates a hierarchical structure.  There are 37 variables for each oil field, including production levels, production costs, revenues, and reserves.  Analyzing financial data of this sort makes it possible to assess how individual oil fields or, more broadly, whole profit centers are performing, and so to make informed decisions about the future of each field and of the company.
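The hierarchy described above (oil field, then asset, then profit center) can be sketched as a simple roll-up.  This is only an illustration: the profit center, asset, and field names and the revenue figures are invented placeholders, not values from the Chevron data, and `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical records: (profit_center, asset, field, total_revenue).
# All names and numbers are invented for illustration only.
SAMPLE_FIELDS = [
    ("PC-A", "Asset-1", "Field-01", 120.0),
    ("PC-A", "Asset-1", "Field-02", 80.0),
    ("PC-A", "Asset-2", "Field-03", 50.0),
    ("PC-B", "Asset-3", "Field-04", 200.0),
]


def roll_up(records):
    """Sum a field-level value up to the asset and profit-center levels."""
    by_asset = defaultdict(float)
    by_center = defaultdict(float)
    for center, asset, _field, value in records:
        by_asset[(center, asset)] += value
        by_center[center] += value
    return dict(by_asset), dict(by_center)


by_asset, by_center = roll_up(SAMPLE_FIELDS)
for center, total in sorted(by_center.items()):
    print(center, total)
```

Keying assets by the pair (profit center, asset) keeps two assets with the same name in different profit centers from being merged.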



Visualization Tool Description

Two visualization tools were used to analyze the oil field data: the Table Lens and Spotfire.  The Table Lens, developed by Xerox PARC, allows the user to visualize tabular and hierarchical data using animated "focus + context" or "fisheye" techniques.  The user views the data in a "cases by variable" table. Spotfire Pro 4 for Windows, which had its beginnings as Christopher Ahlberg's graduate research project, is a general visualization tool that can be used to display a variety of data including multidimensional data using dynamic queries.  Today it is used commercially by pharmaceutical, biotechnology, and manufacturing industries.
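A dynamic query amounts to keeping only the records that fall inside every slider range at once.  The sketch below shows that idea in plain Python.  It is not Spotfire's implementation, and the record layout, variable names, and values are assumptions made up for the example.

```python
# Hypothetical records; field names and values are invented for illustration.
QUERY_FIELDS = [
    {"field": "Field-01", "crude_bbls": 500.0, "production_expense": 40.0},
    {"field": "Field-02", "crude_bbls": 1200.0, "production_expense": 95.0},
    {"field": "Field-03", "crude_bbls": 300.0, "production_expense": 70.0},
    {"field": "Field-04", "crude_bbls": 900.0, "production_expense": 30.0},
]


def dynamic_query(records, ranges):
    """Keep records whose values lie inside every (low, high) range.

    `ranges` maps a variable name to an inclusive (low, high) pair,
    playing the role of one range slider per variable.
    """
    return [
        record
        for record in records
        if all(low <= record[name] <= high for name, (low, high) in ranges.items())
    ]


# Two "sliders" narrowed at the same time.
selected = dynamic_query(
    QUERY_FIELDS,
    {"crude_bbls": (400.0, 1000.0), "production_expense": (0.0, 50.0)},
)
print([record["field"] for record in selected])
```

Because each slider is just another entry in `ranges`, adding a constraint never requires restating the ones already set.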



Interesting Data Features

Relationships were found amongst the variables over the range of oil fields.  While some trends appeared in both the Table Lens and Spotfire, some correlations were observed better with the Table Lens and others more clearly with Spotfire.

Before the visualization tools were used, a linear correlation between income taxes, A/T earnings (reported and operational), and B/T earnings (reported and operational) was already expected.  This relationship was seen clearly with both the Table Lens and Spotfire.
 

The Table Lens:
 

Figure 1: The image below shows the entire data set over three variables: (1) number of barrels of crude oil produced over the year 2000, (2) number of barrels of crude oil produced per day, and (3) crude oil revenue for the year, respectively.  There is the obvious, known trend between the number of barrels of oil produced for the year 2000 and the number of barrels produced per day.  However, there is also a linear correlation between the number of barrels produced per year/day and the crude oil revenue for the year.  The more crude oil an oil field produced, the more money it made for the year from crude oil.  This holds for the whole data set except for two distinct outliers.
Crude Oil (Bbls) vs. Crude Oil Revenue
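The strength of a linear relationship like the one in Figure 1, and the effect a couple of outliers can have on it, can be quantified with the Pearson correlation coefficient.  The following is a minimal sketch using made-up numbers: the barrel and revenue values, including the two outlying points, are hypothetical and are not taken from the Chevron data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: revenue proportional to barrels produced ...
barrels = [10.0, 20.0, 30.0, 40.0, 50.0]
revenue = [25.0, 50.0, 75.0, 100.0, 125.0]
r_clean = pearson_r(barrels, revenue)

# ... plus two invented outliers with high production but low revenue.
r_all = pearson_r(barrels + [60.0, 70.0], revenue + [20.0, 30.0])

print(f"without outliers: r = {r_clean:.3f}")
print(f"with outliers:    r = {r_all:.3f}")
```

Comparing the coefficient with and without the suspect points gives a rough sense of how much those points alone weaken an otherwise linear trend.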
 
 
 

Figure 2: The image below shows the entire data set over two variables: (1) total production expense and (2) average net capital employed.  There seems to be a linear relationship between the variables.  This implies that the more an oil field costs to produce from, the more company money is used to acquire the physical assets the project needs.
Production Expense vs. Net Capital Employed
 
 

Figure 3: The image below shows the entire data set over three variables: (1) number of barrels of crude oil produced over the year 2000, (2) number of barrels of crude oil produced per day, and (3) end of year P1 reserves in barrels of oil equivalent gas.  The image shows a somewhat linear correlation between the variables: the oil fields that produced the most oil over the year/day also had the largest estimated amount of oil left in the field at the end of the year.  A possible explanation is that the fields that produce more crude oil were larger oil fields from the start.
Crude Oil (Bbls) vs. End of Year Reserves
 
 

Spotfire:
 

Figure 4: This image shows the same relationship as in Figure 3, but in Spotfire.  The two very long bars seen in the Crude Oil (Bbls) column in Figure 3 correspond to the two outliers: the uppermost yellow and red squares in Figure 4.
The image below shows the entire data set over two variables: (1) number of barrels of crude oil produced over the year 2000 and (2) end of year P1 reserves in barrels of oil equivalent gas.  The image shows a linear correlation between the variables, implying that the oil fields that produced the most oil over the year had the largest estimated amount of oil left in the field at the end of the year.
Crude Oil (Bbls) vs. End of Year Reserves
 
 
 

Figure 5: The image below shows the entire data set over two variables: (1) crude oil revenue for the year 2000 and (2) total revenue for the same year.  The image shows a linear relationship between the two variables, implying that crude oil revenue makes up a large portion of the total revenue.  This relationship may be surprising, since each oil field does not produce only crude oil, but produces natural gas as well.
Crude Oil Revenue vs. Total Revenue
 
 
 

Figure 6: The image below shows the entire data set over two variables: (1) depreciation, depletion, & amortization and (2) total production expense.  The linear relationship between the variables implies that the more the value of a field's assets declines over time, the larger its total production expense.
DD&A & Abdn. vs. Production Expense
 
 

Figure 7: The image below shows the entire data set over two variables: (1) actual abandonment in dollars and (2) profit center.  Using the hierarchy in the data, the profit center HAT&T, shown in blue, seems to have more oil fields with a high actual abandonment than either of the other two profit centers. The HAT&T Profit Center should be examined since, as seen in Figure 5, HAT&T does not have higher total revenue values than the other two profit centers, so it should not pay more than the other profit centers.
Actual Abandonment vs Profit Center
 
 



Visualization Tool Evaluation

The Table Lens makes a very good first impression on the user and possesses many strong features.  Any action is easily reversible.  The Table Lens allows for the last 10 actions to be undone and includes unfocus and unspan buttons.  It has great functionality such as its ability to filter, spotlight, sort and move columns, and span (similar to Spotfire and Fisheye).  In addition, the color of the data for a column can be redefined to reflect the actual information it represents.  Overall, clear trends and relationships are easy to observe.

However, the Table Lens does have its limitations.  While actions are reversible, they cannot be partially undone.  For example, if several span objects exist, the user cannot remove just one of them.  The user must delete them all and then recreate the span objects that are still wanted.  The same is true for items in focus.  Also, column name labels are not clear at first glance.  The user has to focus on a column or use the tooltips in order to read its name.  Finally, filtering the data is not a quick task.

Spotfire is an extremely robust tool.  Firstly, it is easy to observe outliers.  A line of best fit can be applied to the data to see to what extent a relationship is linear.  Also, Spotfire is better at revealing ambiguous correlations.  Finally, the dynamic query is simpler and faster than the filtering techniques of the Table Lens.
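What a line of best fit reports can be sketched with an ordinary least-squares fit and its coefficient of determination (R squared), which measures how much of the variation the line explains.  This is a generic textbook calculation, not necessarily how Spotfire computes its fit, and the sample points are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical, roughly linear sample points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

An R squared close to 1 indicates that the points lie close to the fitted line, while a value near 0 indicates that a straight line explains little of the variation.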

Nonetheless, Spotfire has limitations as well.  When many data points have similar (x,y) positions, even Spotfire's otherwise very useful stretching feature does not help in telling how many points are involved.  It is therefore hard to select the correct data point.  Also, the user cannot reveal an entire column name label on the right side of the screen without increasing the width of the rightmost frame, since the column name label is on the same line as the slider range label.
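One way to see how many points hide behind a single crowded marker is to count the points that fall into the same small grid cell.  The sketch below illustrates that idea only.  It is not a Spotfire feature, and the cell size, the sample points, and the `overplot_counts` helper are assumptions chosen for the example.

```python
import math
from collections import Counter


def overplot_counts(points, cell=1.0):
    """Return {grid_cell: count} for every cell holding more than one point.

    Each (x, y) point is assigned to the grid cell
    (floor(x / cell), floor(y / cell)); `cell` is the cell width.
    """
    if cell <= 0:
        raise ValueError("cell size must be positive")
    counts = Counter(
        (math.floor(x / cell), math.floor(y / cell)) for x, y in points
    )
    return {grid_cell: n for grid_cell, n in counts.items() if n > 1}


# Hypothetical points: three are crowded together near the origin.
points = [(0.1, 0.2), (0.3, 0.4), (0.9, 0.9), (5.0, 5.0)]
crowded = overplot_counts(points, cell=1.0)
print(crowded)
```

A smaller `cell` separates points that a coarser grid would lump together, much as zooming in would on screen.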

Both visualization programs used in this analysis are very effective tools.  They are user oriented because they accept a wide range of data formats, and for the most part handle the data in the way the user expects and wants.  However, both tools have trouble handling missing data.  Specifically with respect to my data, the Table Lens was better for viewing close trends, whereas Spotfire was superior for detecting ambiguous correlations.  This difference can be seen in Figures 3 and 4.  With its line of best fit, Spotfire shows the linear relationship in Figure 4 more clearly than the Table Lens shows the trend in Figure 3.



References

Rao, R., and Card, S.K.  "The Table Lens: Merging Graphical and Symbolic Representations in an Interactive Focus + Context Visualization for Tabular Information."  Proceedings of CHI '94, ACM Conference on Human Factors in Computing Systems, New York, 1994: 318-322 and 481-482.



 bweinste@cs.umd.edu