Malik, S., Du, F., Monroe, M., Onukwugha, E., Plaisant, C., Shneiderman, B. (May 2014)
A common type of data analysis is finding the differences and similarities between two datasets. With temporal event sequence data, this task is complex because single events and sequences of events can differ between the two groups (or cohorts) of records in many ways: in the structure of the event sequences (e.g., event order, co-occurring events, or event frequencies), in the attributes of the events and records (e.g., a patient's gender), or in metrics computed from the timestamps themselves (e.g., the duration of an event). Running statistical tests to cover all of these cases, and then determining which results are significant, quickly becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach. In this paper, we describe a taxonomy of metrics for comparing cohorts of temporal event sequences, including sequence, time, and attribute metrics. We also present a visual analytics tool, CoCo (for "Cohort Comparison"), which balances automated statistics with user-driven analysis to guide users to significant, distinguishing features between the cohorts. Lastly, we demonstrate the utility and impact of the tool with a user study.
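The abstract does not say which statistical procedures CoCo uses. The following is a minimal illustrative sketch of one kind of cohort comparison and of why judging significance across many tests gets cumbersome. It assumes a hypothetical record format (a list of `(event_name, timestamp)` pairs) and compares a single sequence-style metric, the share of records that contain each event. Each comparison uses a pooled two-proportion z-test, and a Bonferroni correction accounts for testing many metrics at once. The function names, data, test, and correction are all assumptions chosen for illustration. They are not the paper's method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical record format: a list of (event_name, timestamp_in_days) pairs.
Record = List[Tuple[str, float]]


def contains_event(record: Record, event: str) -> bool:
    """True if the record has at least one occurrence of the event."""
    return any(name == event for name, _ in record)


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value for H0: p1 == p2 (pooled normal approximation)."""
    p_pool = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:  # both cohorts all-or-nothing: no evidence of a difference
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def compare_event_prevalence(
    cohort_a: Sequence[Record],
    cohort_b: Sequence[Record],
    events: Sequence[str],
    alpha: float = 0.05,
) -> Dict[str, Tuple[float, float, float, bool]]:
    """Per event: (share in A, share in B, p-value, significant after Bonferroni)."""
    threshold = alpha / len(events)  # Bonferroni: divide alpha by number of tests
    results = {}
    for event in events:
        k_a = sum(contains_event(r, event) for r in cohort_a)
        k_b = sum(contains_event(r, event) for r in cohort_b)
        p = two_proportion_z_test(k_a, len(cohort_a), k_b, len(cohort_b))
        results[event] = (k_a / len(cohort_a), k_b / len(cohort_b), p, p < threshold)
    return results


# Usage with made-up cohorts of 10 records each.
cohort_a = [[("surgery", 3.0), ("checkup", 10.0)]] * 8 + [[("checkup", 5.0)]] * 2
cohort_b = [[("surgery", 4.0), ("checkup", 12.0)]] * 2 + [[("checkup", 6.0)]] * 8
results = compare_event_prevalence(cohort_a, cohort_b, ["surgery", "checkup"])
for event, (share_a, share_b, p, significant) in results.items():
    print(f"{event}: A={share_a:.2f} B={share_b:.2f} p={p:.4f} significant={significant}")
```

A real comparison would also need time metrics (e.g., durations) and attribute metrics, each with its own appropriate test. The number of tests therefore grows quickly, which is the burden the abstract describes.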