At present, genes fall into three groups: those whose sequences and functions are both known; those whose sequences are known but whose functions are not; and those for which neither the sequence nor the function is known. Genes in the first group have specific descriptive titles. Genes in the second group are temporarily titled as Expressed Sequence Tags (ESTs).
The Spotfire scatter plot in Fig 1 shows that about 1/4 of the 3615 genes are currently still ESTs.
Fig 2 uses Table Lens' Spotlight feature to visualize the same data, and it leads to the same conclusion. Fig 3 was intended to show how the ESTs are distributed among the known genes. Surprisingly, Table Lens gives the false impression that the majority of the genes are still ESTs (which, I think, is one of the limitations of Table Lens).
The parallel coordinates view in Spotfire 5.0 visualizes a gene expression pattern as shown in Fig 4. Each experimental condition is treated as one coordinate axis. The pattern appears as a solid line connecting the gene's values on all of the axes.
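To make the idea concrete, here is a minimal sketch of how one gene's ratios could be turned into the vertices of such a line. It is not Spotfire's implementation: the function names, the per-axis min-max scaling, and the sample ratios are all my own assumptions for illustration.

```python
def axis_ranges(genes):
    """Return one (min, max) pair per experimental condition (hypothetical helper)."""
    columns = zip(*genes.values())
    return [(min(col), max(col)) for col in columns]


def polyline(ratios, ranges):
    """Return the (x, y) vertices of one gene's line.

    x is the index of the axis (one axis per condition) and y is the
    gene's value scaled to [0, 1] on that axis. Min-max scaling is an
    assumption; a real tool may scale its axes differently.
    """
    points = []
    for axis, (value, (low, high)) in enumerate(zip(ratios, ranges)):
        span = high - low
        # A constant column has no spread, so place the point mid-axis.
        y = 0.5 if span == 0 else (value - low) / span
        points.append((axis, y))
    return points


# Made-up ratios for two genes across three conditions.
genes = {"gene_a": [1.0, 4.0, 0.5], "gene_b": [3.0, 2.0, 0.5]}
ranges = axis_ranges(genes)
line_a = polyline(genes["gene_a"], ranges)
line_b = polyline(genes["gene_b"], ranges)
```

Two genes with similar expression patterns would produce lines whose vertices sit close together on every axis, which is what makes visual matching possible.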
TableLens visualizes the expression patterns as a sequence of bars of different lengths, as shown in Fig 5.
TableLens is also powerful for visualizing patterns in columns. From Fig 6 it is easy to see that samples uacc93-047 and uacc930 have small, even ratios for most genes, while the other columns on the right have noticeably larger and more uneven ratios.
After talking to one of the researchers at NHGRI, I learnt that genes with the same function tend to have similar expression patterns. This implies that if we can find a gene in group 1 whose expression pattern matches that of the EST in question, there is a good chance that the EST has the same function as that gene. This gives researchers a way to speed up the full identification of ESTs. So the big question becomes: given the expression pattern of an EST, how do we find a gene in group 1 that matches the EST pattern reasonably well?
With Spotfire, I tried both the scatter plot and the parallel coordinates techniques. In theory, on the scatter plot display, if I first pick one EST and then adjust the scroll bars to narrow each attribute to a range around the EST's value, the remaining genes should be the ones that match the EST. In practice, however, scatter plots could not handle a problem of this complexity. Every time I tried, the number of remaining genes dropped to zero long before I had adjusted half of the attributes.
There are three problems. First, our data set has as many as 38 attributes. Second, I have no idea how much tolerance I should allow for the attributes I begin with. Third, the order in which I adjust the attributes implicitly sets a priority among them in the search for a matching pattern.
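The shrinking I saw can be reproduced with a small sketch of the same procedure: one range filter per attribute, applied in sequence, each centered on the EST's value. The function name, the single shared tolerance, and the ratios below are made up for illustration; they are not taken from the melanoma data.

```python
def survivors_per_step(target, genes, tolerance, order):
    """Apply one range filter per attribute, in the given order.

    Returns the number of genes still remaining after each filter,
    which mimics narrowing one scroll bar at a time around the EST.
    """
    remaining = dict(genes)
    counts = []
    for attr in order:
        remaining = {
            name: ratios
            for name, ratios in remaining.items()
            if abs(ratios[attr] - target[attr]) <= tolerance
        }
        counts.append(len(remaining))
    return counts


# Hypothetical EST pattern and four candidate genes, four attributes each.
est = [1.0, 2.0, 0.5, 1.5]
genes = {
    "gene_a": [1.1, 2.1, 0.6, 1.4],
    "gene_b": [1.0, 3.0, 0.5, 1.5],
    "gene_c": [0.2, 2.0, 0.5, 1.5],
    "gene_d": [1.2, 1.9, 1.4, 1.5],
}

loose = survivors_per_step(est, genes, tolerance=0.25, order=[0, 1, 2, 3])
tight = survivors_per_step(est, genes, tolerance=0.05, order=[0, 1, 2, 3])
```

With the loose tolerance one gene survives all four filters; with the tight one nothing is left after the second attribute. A gene that misses on a single attribute is discarded even if it matches well everywhere else, and with 38 attributes there are many more chances for that to happen.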
I then decided to switch to parallel coordinates, which is available in Spotfire 5.0. Parallel coordinates can display all 38 attributes on one page, so searching for matching gene patterns becomes a matter of finding matching line patterns. Fig 7 gives an example in which a group 1 gene matches the EST pattern shown in Fig 4 above reasonably well. Fig 8 is simply a zoomed view of Fig 7. Fig 9 is the scatter plot that contains the same matching genes.
Fig 10 shows an example in which TableLens also displays two genes with very similar expression patterns, but I found it very difficult to find such matching patterns using TableLens.
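The search could in principle also be expressed as a ranking rather than a visual hunt: score every group 1 gene against the EST and look at the best-scoring ones first. The sketch below uses the Pearson correlation as the similarity score. That choice is my own assumption, not something either tool provides or the researchers prescribed, and the gene names and ratios are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length patterns (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_matches(est_pattern, known_genes, top=3):
    """Return the `top` known genes most correlated with the EST, best first."""
    scored = [
        (pearson(est_pattern, ratios), name)
        for name, ratios in known_genes.items()
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top]]


# Hypothetical EST pattern and group 1 genes, four conditions each.
est_pattern = [1.0, 2.0, 0.5, 1.5]
known_genes = {
    "gene_a": [1.1, 2.1, 0.6, 1.4],  # rises and falls with the EST
    "gene_b": [2.0, 1.0, 2.5, 1.5],  # mirror image of the EST
    "gene_c": [1.0, 1.0, 1.0, 1.0],  # flat, carries no pattern
}

best = rank_matches(est_pattern, known_genes, top=2)
```

Correlation compares the shape of two patterns and ignores their overall magnitude, and it needs no per-attribute tolerance or ordering. If the magnitude of the ratios matters, a distance measure such as Euclidean distance would be a reasonable alternative.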
Unexpectedly, TableLens took much longer than Spotfire to import the Excel-formatted melanoma data. To make matters worse, TableLens did not display any loading status, so I kept thinking it had crashed and rebooted the machine several times, until I eventually realized that this is simply how TableLens works and how long it needs to import the data.