Wongsuphasawat, K., Shneiderman, B. (April 2009)
An increasing number of temporal categorical databases are being collected by various institutions: Electronic Health Records containing millions of patient histories in healthcare organizations, vast traffic incident logs in transportation systems, and large collections of student records in academic institutions. Finding similar records within these large-scale databases is a challenging problem. A major challenge is how to define a similarity measure that captures the searcher's intent. Many methods for computing a similarity measure between time series have been proposed, but temporal categorical records are different and require fresh thinking. We propose a temporal categorical similarity measure, called the M&M measure, which is based on the concept of aligning records by sentinel events and then matching events between two records. The M&M measure is calculated as a combination of the time differences between pairs of matched events and the number of mismatches. To support customization of the M&M measure's parameters and interpretation of results, we implement Similan, an interactive search and visualization tool for temporal categorical records. A usability study with 8 participants demonstrates that Similan was easy to learn, but users had more difficulty understanding the M&M measure. Users strongly believed that Similan could help them find similar records in temporal categorical databases. In response to feedback from the study, we also develop a new prototype. A pilot study suggests that while the binned timeline in the original interface is simpler and more readable, the continuous timeline in the new interface is better for showing fine-grained information.
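To make the align-then-match idea concrete, here is a minimal sketch of an M&M-style distance. It is an illustration under stated assumptions, not the published M&M formula. It assumes a record is a list of `(category, time)` events. Both records are shifted so that the first occurrence of a chosen sentinel category sits at time 0. Events are matched only within the same category, using a 1-D minimum-cost matching that pairs as many events as possible. Each leftover event counts as one mismatch. The distance is then a weighted sum of total time difference and mismatch count. The names `align`, `min_cost_1d`, `mm_distance` and the weights `w_time` and `w_mismatch` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a record is a list of (category, time) events.
Record = List[Tuple[str, float]]


def align(record: Record, sentinel: str) -> Record:
    """Shift times so the first occurrence of the sentinel category is at t = 0."""
    t0 = min(t for c, t in record if c == sentinel)
    return [(c, t - t0) for c, t in record]


def min_cost_1d(a: List[float], b: List[float]) -> float:
    """Minimum total |time difference| when every event in the shorter list is
    matched to a distinct event in the longer list. In 1-D an optimal matching
    never crosses, so a DP over the two sorted lists suffices."""
    if len(a) > len(b):
        a, b = b, a
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j]: best cost of matching all of a[:i] into b[:j]
    dp = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = dp[i][j - 1]                                # leave b[j-1] unmatched
            take = dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1])  # pair a[i-1] with b[j-1]
            dp[i][j] = min(skip, take)
    return dp[n][m]


def mm_distance(r1: Record, r2: Record, sentinel: str,
                w_time: float = 1.0, w_mismatch: float = 10.0) -> float:
    """Hypothetical M&M-style distance: weighted time cost plus weighted mismatch count."""
    groups1: Dict[str, List[float]] = defaultdict(list)
    groups2: Dict[str, List[float]] = defaultdict(list)
    for c, t in align(r1, sentinel):
        groups1[c].append(t)
    for c, t in align(r2, sentinel):
        groups2[c].append(t)
    time_cost, mismatches = 0.0, 0
    for c in set(groups1) | set(groups2):
        a, b = groups1[c], groups2[c]
        time_cost += min_cost_1d(a, b)        # time differences of matched pairs
        mismatches += abs(len(a) - len(b))    # events left without a partner
    return w_time * time_cost + w_mismatch * mismatches


r1 = [("admit", 0), ("icu", 2), ("discharge", 7)]
r2 = [("admit", 10), ("icu", 13), ("icu", 14), ("discharge", 16)]
print(mm_distance(r1, r2, sentinel="admit"))  # 12.0
```

Aligned on `admit`, the example pairs `icu` at 2 with `icu` at 3 and `discharge` at 7 with `discharge` at 6, giving a time cost of 2. The second `icu` event is left over as one mismatch, so the distance is 2 + 10 × 1 = 12. Ranking a database against a query record then amounts to sorting by this distance. The weights are where an interface like Similan would let a searcher trade timing precision against missing or extra events.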