CMSC 838B Information Visualization

Visualizing Mailbox

Yoo Ah Kim       Min-ho Shin

ykim@cs.umd.edu      mhshin@cs.umd.edu

Department of Computer Science

University of Maryland

 

 


Abstract

Electronic mail is one of the most popular computer applications. As the number of emails we exchange grows, managing a huge volume of electronic messages becomes increasingly important. In addition, patterns in email data can reveal useful information, including a user's personal history. We propose two visualizations of an email dataset: a time-based view and a thread-based view. The time-based view displays messages in a two-dimensional table whose rows are people and whose columns are sent/received times. To scale to large volumes of data, it uses dynamic queries and zooming. The thread-based view shows emails that belong to the same "conversational" thread: all senders who participated in the thread, and its messages in time order together with the relations among them.

 

Keywords: Electronic mail, time-based view, thread-based view, scalability

 

Introduction

Nowadays electronic mail is one of the most popular computer applications. As the number of emails we exchange grows, managing a huge volume of electronic messages becomes increasingly important. Although email was designed for asynchronous communication, it is also used for other purposes, such as task management and personal archiving. In addition, patterns in email data can reveal useful information, including a user's personal history. However, few existing visualization techniques support these purposes.

In this paper, we propose two visualizations of an email dataset to help users perform these tasks: a time-based view and a thread-based view. The time-based view displays messages in a two-dimensional table whose rows are people and whose columns are sent/received times. To scale to large volumes of data, it uses dynamic queries and zooming, and it provides sort, filter, and aggregation functions to help users find the information they need. The thread-based view shows emails that belong to the same "conversational" thread. Threads are created when users send mail with the "reply" command. The thread-based view shows all senders who participated in a thread and the messages in time order, together with the relationships among them.

Design Goals

§ View sent/received email patterns

With an email dataset, users may want to see mail patterns over time. Interesting questions include who sent the most emails in a certain period, or when a person sent emails most frequently. Seeing patterns in a large volume of messages requires solving the scalability problem. We use dynamic queries, zooming, aggregation, filtering, and gradation to cope with it.

§ Find people and emails related to each other

Emails can be threaded through the "reply" command, and several users may participate in one mail thread. It would be useful to see all participating users, who sent or received each email in the thread, and how the emails relate to one another.

§ Search information in the mailbox

Email is also used as a personal archive for finding information later. Whittaker and Sidner [8] showed that semantic hierarchies built from folders, currently the predominant scheme, are not well suited to this task, because it is difficult for users to organize mail folders properly and to remember which folder holds the mail they need. Because people can easily recall the sender and the approximate sent/received time of a message, the time-based view can help users find the mail they need. The thread-based view also makes it easy to extract related information by showing all messages in the same thread together.

Related Work

Timestore [1][9] organizes messages by time and sender in a two-dimensional grid, as shown in Figure 1. Messages are displayed as dots whose size encodes the number of messages. Full-text search can be used to narrow the search space. Its authors also integrated it with a task and calendar management system. Timestore focuses on time-based archiving and retrieval of email.


Figure 1. Timestore

Outlook 2000 also has a time-based view (Figure 2). It displays every message with its subject at its received time, without aggregating by date or considering senders. Because it uses a fixed width for each day and shows every message with its subject, the view can become cluttered and hard to read when there are many messages. When many emails arrive within a short period, it expands the y-axis to list them.

Threading is necessary to help users manage conversation history and track the status of conversations in email [8]. Many systems have been developed to visualize conversations in chat programs and instant messaging services [2][3][4][5][7]. Netscan thread trees (Figure 3) display conversation threads in newsgroups. Visualizing email threads is more difficult, however, because, unlike in newsgroups, both senders and receivers matter and there are two kinds of messages: incoming and outgoing.

Figure 2. Outlook 2000

 


 


Figure 3. Netscan Thread Tree

 

Time-based Visualization

§ Features

In this view, messages are displayed in a two-dimensional grid in which each row is a person's email address and each column is a date, as shown in Figure 4. Each cell holds the messages that the corresponding person sent or received on the given date. The number of messages is encoded as bar height in a bar chart (suited to fewer messages) or as the gradation of a spot (suited to more messages).

The first section shows the email addresses of the people who sent or received mail. The second section shows, as a bar chart, the total number of mails each person sent or received. Users can choose to show incoming mails, outgoing mails, or both.
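
A minimal Python sketch of the grid and its row totals follows. It is an illustration under stated assumptions, not the authors' code: the `Message` record and the functions `build_grid` and `totals_per_person` are hypothetical, and each message is assumed to have already been reduced to a single correspondent address (the sender of incoming mail, the receiver of outgoing mail).

```python
# Hypothetical sketch of the time-based grid; not the authors' implementation.
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    address: str    # correspondent: sender of incoming mail, receiver of outgoing mail
    when: datetime  # sent or received time
    outgoing: bool  # True if the mailbox owner sent the message


def build_grid(messages, direction="both"):
    """Count messages per (address, date) cell for 'incoming', 'outgoing', or 'both'."""
    grid = Counter()
    for m in messages:
        if direction == "incoming" and m.outgoing:
            continue
        if direction == "outgoing" and not m.outgoing:
            continue
        grid[(m.address, m.when.date())] += 1
    return grid


def totals_per_person(grid):
    """Row totals for the second section (the per-person bar chart)."""
    totals = Counter()
    for (address, _day), count in grid.items():
        totals[address] += count
    return totals


msgs = [
    Message("a@x.edu", datetime(2001, 5, 1, 9, 0), outgoing=False),
    Message("a@x.edu", datetime(2001, 5, 1, 17, 30), outgoing=True),
    Message("b@y.com", datetime(2001, 5, 2, 8, 15), outgoing=False),
]
grid = build_grid(msgs)
print(grid[("a@x.edu", datetime(2001, 5, 1).date())])  # 2
```

Keying a sparse `Counter` by (address, date) keeps memory proportional to the number of non-empty cells rather than to the full people-by-dates grid.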

Users can choose the aggregation level (date, month, or year), and messages are aggregated at that level. When messages are aggregated by date, vertical lines mark each week to help users see weekly patterns.
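
Aggregation by level can be sketched as re-bucketing the daily cell counts. The helpers `bucket`, `aggregate`, and `week_lines` are hypothetical names, and starting weeks on Monday is our assumption; the paper does not say which weekday the separators fall on.

```python
# Hypothetical sketch of date/month/year aggregation and weekly separators.
from collections import Counter
from datetime import date


def bucket(day, level):
    """Aggregation key of a day for level 'date', 'month', or 'year'."""
    if level == "date":
        return day
    if level == "month":
        return (day.year, day.month)
    if level == "year":
        return day.year
    raise ValueError(f"unknown level: {level!r}")


def aggregate(grid, level):
    """Re-bucket a {(address, day): count} grid at the chosen level."""
    out = Counter()
    for (address, day), count in grid.items():
        out[(address, bucket(day, level))] += count
    return out


def week_lines(days):
    """Days before which a weekly separator is drawn (assumes weeks start on Monday)."""
    return [d for d in days if d.weekday() == 0]


daily = {("a@x.edu", date(2001, 1, 3)): 2, ("a@x.edu", date(2001, 1, 20)): 1}
print(aggregate(daily, "month"))  # Counter({('a@x.edu', (2001, 1)): 3})
print(week_lines([date(2001, 1, d) for d in range(1, 15)]))
```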

Rows can be sorted by email address, domain name, or message count. Users can also filter the people whose email addresses contain a given substring; filtering by domain name, for example, is an interesting query. It is also possible to search messages by email address or subject.
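
The three sort orders and the substring filter could look like this. The function names are hypothetical, and we assume case-insensitive matching and that a count sort puts the busiest correspondents first; the paper specifies neither.

```python
# Hypothetical sketch of row sorting and substring filtering.
def domain(address):
    """Domain part of an address, e.g. 'cs.umd.edu' for 'ykim@cs.umd.edu'."""
    return address.rpartition("@")[2].lower()


def sort_rows(addresses, totals, key="address"):
    """Order rows by 'address', 'domain', or 'count' (largest count first)."""
    if key == "address":
        return sorted(addresses, key=str.lower)
    if key == "domain":
        return sorted(addresses, key=lambda a: (domain(a), a.lower()))
    if key == "count":
        return sorted(addresses, key=lambda a: -totals.get(a, 0))
    raise ValueError(f"unknown sort key: {key!r}")


def filter_rows(addresses, substring):
    """Keep rows whose address contains the substring (case-insensitive)."""
    needle = substring.lower()
    return [a for a in addresses if needle in a.lower()]


rows = ["b@y.com", "A@x.edu", "c@x.edu"]
counts = {"b@y.com": 5, "A@x.edu": 1, "c@x.edu": 3}
print(sort_rows(rows, counts, "domain"))  # ['A@x.edu', 'c@x.edu', 'b@y.com']
print(filter_rows(rows, "x.edu"))         # ['A@x.edu', 'c@x.edu']
```

Filtering by a domain such as `x.edu` is then just the substring filter applied to the domain part.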

Figure 4. Time-based visualization. The darkness of each cell represents the number of messages in it

§ Scalability

- Bar chart vs. Gradation

To see the number of messages in each grid cell more accurately and compare it with others, a bar chart is more helpful. But when many people are on the screen or the period is very long, it is difficult to show the patterns with bar charts. For this case of many people and a long period, we provide another view using gradation: each cell has a spot whose shade represents the number of messages. This view gives a good overview of messages in terms of people and dates. While a bar chart can show incoming and outgoing messages simultaneously in different colors, a spot shows only the total number of messages for the chosen option. Figure 5 compares the two views.
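
One way to derive both encodings from a cell's count is sketched below. The linear mappings, the 8-bit gray range, and the function names `gray_level` and `bar_height` are assumptions for illustration; `max_count` would typically be the largest count among the cells currently on screen.

```python
# Hypothetical sketch: count -> spot shade and count -> bar height (linear scales assumed).
def gray_level(count, max_count):
    """8-bit gray for a spot: 255 (white) when empty, 0 (black) for the busiest cell."""
    if count <= 0 or max_count <= 0:
        return 255
    fraction = min(count, max_count) / max_count
    return round(255 * (1 - fraction))


def bar_height(count, max_count, cell_height):
    """Bar height in pixels, scaled so the busiest cell fills the cell height."""
    if max_count <= 0:
        return 0
    return round(cell_height * min(count, max_count) / max_count)


counts = [0, 1, 4, 10]
peak = max(counts)
print([gray_level(c, peak) for c in counts])
print([bar_height(c, peak, 20) for c in counts])  # [0, 2, 8, 20]
```

A logarithmic scale would be a reasonable alternative when a few correspondents dominate the counts.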

Figure 5. Bar chart vs. gradation. The bar chart is better for seeing and comparing detailed patterns of both sent and received mail; gradation is better for large volumes of data.

- Dynamic Query

To manage large datasets, we also apply the dynamic query method [6] to people and dates. It dynamically filters and zooms the range of data so that users can easily find the data they want to see (Figure 6). When users change a range, the data in the range are fit to the screen and the data outside it are hidden. Moving the slider bar reveals the hidden data. Labels such as addresses and dates adapt dynamically to the chosen range, showing more detail as users zoom in.
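
The slider behavior can be modeled independently of any GUI toolkit. `RangeSlider`, `visible`, and `cell_size` are hypothetical names in this sketch, which assumes rows or columns are addressed by index: narrowing the range enlarges each visible cell (zooming), and shifting the range reveals hidden data.

```python
# Hypothetical dynamic-query slider model; not a real GUI toolkit API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeSlider:
    """Two-thumb slider over item indices [low, high)."""
    size: int
    low: int = 0
    high: Optional[int] = None

    def __post_init__(self):
        if self.high is None:
            self.high = self.size

    def set_range(self, low, high):
        """Narrow or widen the range (zooming), clamped to [0, size]."""
        self.low = max(0, min(low, self.size))
        self.high = max(self.low, min(high, self.size))

    def shift(self, delta):
        """Slide the window by delta items, keeping its width, to reveal hidden data."""
        width = self.high - self.low
        self.low = max(0, min(self.low + delta, self.size - width))
        self.high = self.low + width


def visible(items, slider):
    """Items inside the range; everything else is hidden."""
    return items[slider.low:slider.high]


def cell_size(screen_pixels, slider):
    """Visible items are stretched to fill the screen, so a narrower range zooms in."""
    return screen_pixels / max(1, slider.high - slider.low)


days = list(range(365))    # column indices, e.g. one per day of a year
s = RangeSlider(len(days))
print(cell_size(730, s))   # 2.0
s.set_range(30, 60)        # zoom into a 30-day window
s.shift(400)               # clamped at the end of the data
print(visible(days, s)[0]) # 335
```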

 

Figure 6. Dynamic Query. Users can zoom by narrowing the range of the slider bar

§ Message Selection

When users move the mouse over a cell, the cell's information (person and date) is displayed. Clicking the right mouse button on a cell shows detailed information: a pop-up window lists the messages in the cell. Each entry shows the subject and the number of messages in the thread to which the message belongs. To see the thread view for a message, users choose that message in the list. Figure 7 shows the pop-up window for message selection.

Figure 7. Message selection


Thread-based Visualization

The thread view shows the relations among messages, as shown in Figure 8. For a chosen message, we find all messages related to it and display them together with all the people who participated in the thread. Rows are people and columns are time; messages are listed in order of sent/received time. Note that, unlike with newsgroup data, showing both senders and receivers is essential.
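
Finding every message related to a chosen one amounts to a graph search over reply links. The sketch below assumes that the reply information is the standard `In-Reply-To` header (RFC 5322), which carries the `Message-ID` of the message being answered; the `Mail` record and `thread_of` are hypothetical, and breadth-first search is our choice of traversal.

```python
# Hypothetical thread reconstruction from Message-ID / In-Reply-To links.
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Mail:
    message_id: str
    in_reply_to: Optional[str]  # Message-ID of the mail this one answers, if any
    sender: str
    receivers: List[str]
    when: datetime


def thread_of(chosen_id, mails):
    """All mails linked to chosen_id through reply links, in time order,
    plus every person who sent or received any of them."""
    by_id = {m.message_id: m for m in mails}
    neighbors = defaultdict(set)
    for m in mails:
        if m.in_reply_to in by_id:  # link reply <-> original in both directions
            neighbors[m.message_id].add(m.in_reply_to)
            neighbors[m.in_reply_to].add(m.message_id)
    seen, queue = {chosen_id}, deque([chosen_id])
    while queue:  # breadth-first search over the reply graph
        for other in neighbors[queue.popleft()]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    thread = sorted((by_id[i] for i in seen if i in by_id), key=lambda m: m.when)
    people = sorted({m.sender for m in thread} | {r for m in thread for r in m.receivers})
    return thread, people


t0 = datetime(2001, 5, 1, 9, 0)
mails = [
    Mail("<1@x>", None, "a", ["b"], t0),
    Mail("<2@x>", "<1@x>", "b", ["a", "c"], t0.replace(hour=10)),
    Mail("<3@x>", "<2@x>", "c", ["a"], t0.replace(hour=11)),
    Mail("<4@x>", None, "d", ["a"], t0.replace(hour=12)),
]
thread, people = thread_of("<3@x>", mails)
print([m.message_id for m in thread], people)
# ['<1@x>', '<2@x>', '<3@x>'] ['a', 'b', 'c']
```

Because links are followed in both directions, choosing any message in the thread (the first mail, an intermediate reply, or the last one) yields the same result.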

We represent senders as large red rectangles and receivers as small blue circles. Arrows connect the sender and the receivers of each mail. If a mail is a reply to another mail, a link of a different color connects the two mails; in Figure 8 this is a thick red line. The time axis is divided by date so that users can easily tell when messages were sent.
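
The encoding can be reduced to a list of drawing primitives on a (row, column) grid, leaving colors and shapes to the renderer. In this hypothetical sketch, each mail gets its own column in time order, and we assume a reply link joins the sender marks of the two mails; `Mail` and `layout` are illustrative names.

```python
# Hypothetical thread-view layout: marks, arrows, reply links, and date dividers.
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Mail:
    message_id: str
    in_reply_to: Optional[str]
    sender: str
    receivers: List[str]
    when: datetime


def layout(thread, people):
    """Grid positions (row, column): one row per person, one column per mail.
    `thread` must already be sorted by time."""
    row = {p: i for i, p in enumerate(people)}
    col = {m.message_id: i for i, m in enumerate(thread)}
    marks, arrows, links = [], [], []
    for i, m in enumerate(thread):
        marks.append(("sender-rect", row[m.sender], i))       # large red rectangle
        for r in m.receivers:
            marks.append(("receiver-circle", row[r], i))      # small blue circle
            arrows.append(((row[m.sender], i), (row[r], i)))  # sender -> receiver
        if m.in_reply_to in col:                               # reply link (thick red line)
            j = col[m.in_reply_to]
            links.append(((row[thread[j].sender], j), (row[m.sender], i)))
    # Columns where a new date starts, used to draw the date dividers.
    day_breaks = [i for i in range(1, len(thread))
                  if thread[i].when.date() != thread[i - 1].when.date()]
    return marks, arrows, links, day_breaks


thread = [
    Mail("<1@x>", None, "a", ["b"], datetime(2001, 5, 1, 9, 0)),
    Mail("<2@x>", "<1@x>", "b", ["a"], datetime(2001, 5, 2, 10, 0)),
]
marks, arrows, links, day_breaks = layout(thread, ["a", "b"])
print(links, day_breaks)  # [((0, 0), (1, 1))] [1]
```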

Figure 8. Thread-based visualization. Red rectangles are senders and blue circles are receivers. Reply information is represented as red links. An arrow shows that a mail is sent from the sender to the receiver.

Problems in Visualization

For outgoing mail, displaying the receivers is what matters, because the sender is always the writer of the message (the mailbox owner). An outgoing message may have more than one receiver, so the same message can appear several times in the time-based view, once per receiver. As a result, the view may show more messages than actually exist. In some sense, this can be regarded as several messages with the same content being sent to different receivers, but it is very hard to tell that all those entries are actually the same message.
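
If each outgoing message kept a unique `Message-ID` (an assumption; the paper notes that identifying such duplicates is hard), the overcount could at least be measured by comparing per-receiver entries with distinct messages. `Sent`, `receiver_entries`, and `distinct_messages` are hypothetical names.

```python
# Hypothetical sketch: per-receiver entries vs. distinct outgoing messages.
from dataclasses import dataclass
from typing import List


@dataclass
class Sent:
    message_id: str        # assumed to be unique per outgoing message
    receivers: List[str]


def receiver_entries(outgoing):
    """What the time-based view draws: one entry per (receiver, message)."""
    return [(r, m.message_id) for m in outgoing for r in m.receivers]


def distinct_messages(outgoing):
    """How many messages were actually sent, counting each Message-ID once."""
    return len({m.message_id for m in outgoing})


outbox = [Sent("<7@x>", ["b", "c", "d"]), Sent("<8@x>", ["b"])]
print(len(receiver_entries(outbox)), distinct_messages(outbox))  # 4 2
```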

Our thread view can be built only if users write messages with "reply", which usually adds reply information to the email headers. However, users sometimes send emails without using "reply" even though they are in fact replies to other mails. In this case we would have to consider subjects, contents, and sender/receiver groups, but obtaining correct information this way is much more difficult. "Forward" information would also be useful for constructing threads, but it is not available in our implementation because it is not part of the standard email header.
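
A common fallback for mail sent without "reply" is to compare subjects after stripping reply and forward prefixes. The sketch below is a heuristic assumed for illustration, not the paper's method: `normalize_subject` and `candidate_threads` are hypothetical, and grouping by subject alone will wrongly merge unrelated mails that share a generic subject, so a fuller version would also compare participants and time.

```python
# Hypothetical subject-based threading heuristic.
import re
from collections import defaultdict

# Leading reply/forward prefixes such as "Re:", "RE[2]:", "Fw:", "Fwd:", possibly repeated.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Subject with reply/forward prefixes removed, whitespace collapsed, lowercased."""
    return " ".join(_PREFIX.sub("", subject).split()).lower()


def candidate_threads(mails):
    """Group (subject, message_id) pairs by normalized subject."""
    groups = defaultdict(list)
    for subject, message_id in mails:
        groups[normalize_subject(subject)].append(message_id)
    return dict(groups)


print(normalize_subject("Re: RE[2]: Fwd: Project  meeting"))  # project meeting
```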

If the same person uses several email addresses, we cannot detect that the messages came from the same person. In particular, if users are on a mailing list, we cannot figure this out from the mailboxes alone. In this case, users should be able to specify the list of addresses used by each person so that mail received from or sent to the same person can be merged.
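
The user-specified address list could be applied as a simple alias table before counting. In this sketch, `build_aliases`, `canonical`, and `merge_rows` are hypothetical names, addresses are compared case-insensitively (a simplification, since strictly only the domain part is case-insensitive), and the sample addresses are made up.

```python
# Hypothetical sketch: merging several addresses of one person into one row.
from collections import Counter


def build_aliases(people):
    """Invert {person: [addresses]} into {address: person}, lowercasing addresses."""
    return {addr.lower(): person for person, addrs in people.items() for addr in addrs}


def canonical(address, aliases):
    """Person label for an address; unknown addresses stand for themselves."""
    key = address.strip().lower()
    return aliases.get(key, key)


def merge_rows(counts, aliases):
    """Merge per-address message counts into per-person counts."""
    merged = Counter()
    for address, n in counts.items():
        merged[canonical(address, aliases)] += n
    return merged


aliases = build_aliases({"Alice": ["alice@work.example", "alice@home.example"]})
counts = {"Alice@Work.example": 3, "alice@home.example": 2, "bob@work.example": 4}
print(merge_rows(counts, aliases))  # Counter({'Alice': 5, 'bob@work.example': 4})
```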

Future Work

In our visualization, users can view data in many ways using filtering, sorting, searching, and so on. But they may also want to edit or annotate messages for future use. This function would be especially useful for email data; for example, users may want to mark messages that need a reply or that serve as reminders for future tasks.

Currently, search covers only subjects and senders/receivers. It would be useful to search message contents as well; specifically, we might want to find messages that contain URLs, email addresses, or attached files.
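
Content search for these items could start from regular expressions plus a MIME walk. The patterns below are deliberately simple approximations (they do not implement the full URL or address grammars, and a URL followed by a period keeps the period), `find_urls`, `find_addresses`, and `has_attachment` are hypothetical names, and only Python's standard `re` and `email` modules are used.

```python
# Hypothetical content-search sketch; the patterns are simplified approximations.
import email
import re

URL_RE = re.compile(r"\bhttps?://[^\s<>\"]+", re.IGNORECASE)
ADDRESS_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_urls(text):
    """http(s) URLs in a message body."""
    return URL_RE.findall(text)


def find_addresses(text):
    """Email addresses mentioned in a message body."""
    return ADDRESS_RE.findall(text)


def has_attachment(raw_message):
    """True if any MIME part is marked 'Content-Disposition: attachment'."""
    msg = email.message_from_string(raw_message)
    return any(part.get_content_disposition() == "attachment" for part in msg.walk())


body = "Slides are at http://www.example.org/slides or ask ykim@cs.umd.edu."
print(find_urls(body))       # ['http://www.example.org/slides']
print(find_addresses(body))  # ['ykim@cs.umd.edu']

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attached.\n"
    "--XYZ\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="notes.bin"\n'
    "\n"
    "data\n"
    "--XYZ--\n"
)
print(has_attachment(raw))   # True
```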

In the time-based visualization, people can be aggregated or filtered by the domain name of their email addresses. Other kinds of aggregation and filtering become possible if groups of people are defined in other ways, for example based on threads or on real relationships such as family, friends, or colleagues. More generally, it would be good to connect this visualization to databases that hold information about people, so that people can be filtered or aggregated based on those databases.
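
User-defined groups could replace or complement domain-based rows. In this hypothetical sketch, `groups` maps a group name to a set of lowercase addresses, and any address outside every group falls back to its domain name, the aggregation the current view already offers; `group_of` and `aggregate_by_group` are illustrative names and the addresses are made up.

```python
# Hypothetical sketch: aggregating rows by user-defined groups, falling back to domain.
from collections import Counter


def group_of(address, groups):
    """First user-defined group containing the address, else its domain name."""
    key = address.lower()
    for name, members in groups.items():
        if key in members:
            return name
    return key.rpartition("@")[2]


def aggregate_by_group(totals, groups):
    """Sum per-address message counts into per-group rows."""
    rows = Counter()
    for address, n in totals.items():
        rows[group_of(address, groups)] += n
    return rows


groups = {"family": {"mom@home.example", "dad@home.example"},
          "colleagues": {"carol@lab.example"}}
totals = {"Mom@home.example": 2, "dad@home.example": 1,
          "carol@lab.example": 6, "dave@lab.example": 3}
print(aggregate_by_group(totals, groups))
```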

Another useful view of email is a group-based visualization. Email exchange patterns give useful information about how frequently people communicate with each other. We could group people based on how often they appear in the same thread and visualize those groups as graphs.
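
One possible grouping, assumed here rather than taken from the paper, weights each pair of people by the number of threads they share and keeps the connected components of the edges at or above a threshold. `cooccurrence` and `groups_above` are hypothetical names; the weighted pairs are exactly the edge list a graph view would draw.

```python
# Hypothetical sketch: thread co-occurrence graph and threshold-based groups.
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(threads):
    """Edge weights: how many threads each unordered pair of people shares."""
    weights = Counter()
    for participants in threads:
        for a, b in combinations(sorted(set(participants)), 2):
            weights[(a, b)] += 1
    return weights


def groups_above(weights, threshold):
    """Connected components over edges with weight >= threshold (union-find).
    People with no such edge are left out."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        if w >= threshold:
            parent[find(a)] = find(b)
    components = defaultdict(set)
    for person in list(parent):
        components[find(person)].add(person)
    return sorted(sorted(c) for c in components.values())


threads = [["a", "b", "c"], ["a", "b"], ["d", "e"], ["e", "d"]]
w = cooccurrence(threads)
print(w[("a", "b")], groups_above(w, 2))  # 2 [['a', 'b'], ['d', 'e']]
```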

Conclusion

We proposed two visualizations of an email dataset: a time-based view and a thread-based view. The time-based view displays messages in a two-dimensional table whose rows are people and whose columns are sent/received times; each cell holds the list of messages for that person (row) and time (column). To manage large volumes of data, this view uses dynamic queries, zooming, and gradation, and it shows users the temporal patterns of their email exchange with correspondents. The thread-based view shows emails exchanged using "reply". It displays all senders and messages in each thread in time order, showing how the messages relate to one another.

Acknowledgements

We would like to thank Jihwang Yeo and Hyunmo Kang for their valuable comments.

References

[1] Baecker, R., Booth, K., Jovicic, S., McGrenere, J., and Moore, G. "Reducing the Gap Between What Users Know and What They Need to Know"

[2] Donath, J., Karahalios, K., and Viegas, F. "Visualizing Conversations", In Proceedings of HICSS 32, January 5-8, 1999.

[3] Rodenstein, R. and Donath, J. S. "Talking in Circles: Designing a Spatially-Grounded Audioconferencing Environment", In Proceedings of CHI 2000, pp. 81-88, 2000.

[4] Smith, M. A., Cadiz, J. J., and Burkhalter, B. "Conversation Trees and Threaded Chats", In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, 2000.

[5] Smith, M. A. and Fiore, A. "Visualization Components for Persistent Conversations", In Proceedings of ACM SIGCHI 2001.

[6] Shneiderman, B. "Dynamic Queries for Visual Information Seeking", IEEE Software, 11(6), pp. 70-77, 1994.

[7] Viegas, F. B. and Donath, J. S. "Chat Circles", In Proceedings of CHI '99, 1999.

[8] Whittaker, S. and Sidner, C. "Email Overload: Exploring Personal Information Management of Email", In Proceedings of the Conference on Human Factors in Computing Systems (CHI '96), 1996.

[9] Yiu, K., Baecker, R. M., Silver, N., and Long, B. "A Time-based Interface for Electronic Mail and Task Management", In Design of Computing Systems: Proceedings of HCI International '97, Volume 2, Elsevier, 1997, pp. 19-22.
