Network Traffic Tracing at SIGCOMM 2008

Update: The traces are now available.

This page is a modified version of the OSDI 2006 Tracing Handout

We are a team of researchers who are tracing network activity at SIGCOMM 2008 with a view to making the data available to the research community. We are only recording information that is pertinent to networking research, in a suitably anonymized form. We are not recording sensitive information such as the user or client identities or the content of user communication. This page details what we are tracing and why, how the traces are being processed to protect sensitive information, and whom to contact if you have further questions. Thank you.

Contents: Tracing Overview, Frequently Asked Questions, Technical Details

Tracing Overview
Joining the trace

Tracing is optional. If you would like to participate in the trace, associate with the SIGCOMM-ONLY-Traced SSID. Otherwise, join SIGCOMM-ONLY-Untraced. We will not be tracing any data on SIGCOMM-ONLY-Untraced.

During the trace

We will collect the 802.11, IP, TCP, UDP, and ICMP headers and the physical layer information of all packets on the Traced network. We are not recording payloads (packet bodies), except for DHCP and DNS payloads.

After the trace

The trace will be anonymized after collection, and the non-anonymized data will be encrypted; only the researchers listed in Q2 below will have access to it. Once the traces have been anonymized, the non-anonymized trace will be destroyed.

Getting the trace data

The anonymized trace data will be available within six months after the conference. Check back at the UMD Wifidelity project site.

Frequently Asked Questions

Q1. What are the goals of this tracing project?

Our goal is to gather a detailed trace of network activity at SIGCOMM 2008 to improve 802.11 tracing techniques as part of the Wifidelity project and enable analysis of the behavior of a wireless LAN that is (presumably) heavily used. Besides using this data for our research, we also plan to make the traces available to the research community.

Q2. Who is gathering the traces?

The traces are being gathered by a team of researchers from the University of Maryland, College Park (Aaron Schulman, Dave Levin, and Neil Spring), in coordination with the local arrangements chair, Ratul Mahajan from Microsoft.

Q3. Who has approved this tracing project?

The tracing plan has been approved by the SIGCOMM 2008 Executive Committee.

Q4. What is being traced?

We are recording network protocol information from all wired and wireless packets sent on the SIGCOMM-ONLY-Traced wireless network. The information being recorded for each packet includes physical layer information such as the wireless signal strength as well as the 802.11, IP, TCP, UDP, and ICMP headers, depending on the packet type. We are not recording packet payloads above the transport layer except for DHCP and DNS payloads. However, we are anonymizing or deleting potentially sensitive information such as MAC and IP addresses, and DNS names.

Q5. How is the trace being anonymized?

MAC addresses and IP addresses will be anonymized using AnonTool.

Q6. Will the packet payload be captured or stored?

Packet payload will be recorded for DHCP and DNS requests and responses. However, information such as DNS names and IP addresses contained in the payload will be anonymized after being stored.

Q7. Will my activities be identifiable?

Given that the traces are being anonymized after collection and the non-anonymized traces will be encrypted during transport and destroyed post anonymization, we believe that it would be difficult for anyone to identify users or learn which Internet services or hosts they have communicated with. That said, we are not in a position to prove that no such information can be gleaned from the anonymized traces.

Q8. What will be done with the anonymized data? Who will have access?

The anonymized traces will be made available to the research community, for example, through a repository such as CRAWDAD. We plan to make the data available within six months after SIGCOMM 2008.

Q9. Will any non-anonymized data be stored?

Yes, we will be anonymizing the trace offline after collection. However, after the traces have been anonymized, the non-anonymized data will be destroyed.

Q10. Who will have access to the non-anonymized data, and for how long?

As noted in Q9, the anonymization will be done offline, so the University of Maryland researchers listed in Q2 will have access to the non-anonymized data during the time it takes to perform the offline anonymization (no more than a few days after the trace collection is concluded). In the meantime, the trace data will be stored in an encrypted form. After the trace is anonymized, the non-anonymized data will be destroyed.

Q11. What identifiable information could still be extracted from the final anonymized trace?

It may be possible to identify users using a side-channel attack, for instance, by exploiting information such as packet sizes and packet timing; we do not plan to protect the data against such attacks. Also, we would like to permit the identification of the manufacturer of a wireless NIC (which could be useful when analyzing the traces), so the first 3 bytes of the MAC address will be left non-anonymized. However, this could violate the principle of k-anonymity, i.e., that it should not be possible to identify any user as being a member of a group with fewer than k members. If a group size is smaller than 10, our offline anonymization will replace this MAC-address prefix with another value so as to create a group of at least 10 nodes (i.e., we set k to 10). So it would be possible to identify the 3-byte prefix of a node's MAC address only if at least 10 nodes share the same prefix.
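
The prefix rule described in this answer can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the anonymization tool actually used: the function name `k_anonymize_prefixes` and the placeholder value `COMMON_PREFIX` are hypothetical, and MAC addresses are assumed to be colon-separated strings.

```python
from collections import Counter

K = 10  # group-size threshold stated in the text (k = 10)

# Hypothetical placeholder prefix for pooled nodes (an assumption,
# not a value taken from the tracing project).
COMMON_PREFIX = "00:00:00"


def k_anonymize_prefixes(macs, k=K, common_prefix=COMMON_PREFIX):
    """Map each distinct MAC address to the 3-byte prefix that may be published.

    A prefix shared by at least k distinct nodes is kept as-is.
    Every other prefix is replaced by one common value, so that
    rare manufacturers are pooled into a single larger group.
    """
    distinct = {m.lower() for m in macs}
    # "aa:bb:cc" is the first 8 characters of "aa:bb:cc:dd:ee:ff".
    counts = Counter(m[:8] for m in distinct)
    return {
        m: (m[:8] if counts[m[:8]] >= k else common_prefix)
        for m in distinct
    }
```

Note that in this sketch the pooled group could itself end up with fewer than k members if only a handful of nodes have rare prefixes; a real tool would need to handle that case, for example by merging the pool into an existing large group.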

Q12. How should I protect my data and identifiable activities if I use the wireless network?

As noted above, we are taking every care to obscure sensitive information while still leaving the traces useful for research. However, we have no control over who else might be sniffing on the network traffic, even though such sniffing is against the terms of use for the SIGCOMM wireless network. Since this is an ever-present danger, especially in wireless networks, we strongly recommend that you use secure protocols and procedures for communication (e.g., SSL, SSH, VPN). That said, we are not in a position to provide definitive advice on how best to protect yourself when using a wireless network. You would have to consult your IT staff regarding this.

Q13. Whom should I contact if I have further questions about this tracing project?

Please contact Aaron Schulman (schulman@cs.umd.edu) or Dave Levin (dml@cs.umd.edu).

Technical Details
We are gathering traces of wireless traffic belonging to the traced network at several monitoring nodes distributed across the conference floor. In addition, we are gathering traces on the wired switch to which the wireless access points connect.

Here is a description of the traces we are gathering and the anonymization that is being performed. Our description here focuses on tracing on the wireless LAN. A subset of this (viz., everything above the PHY layer) also applies to the tracing on the wired LAN.

What traffic is being monitored? Each monitor will capture all of the 802.11 frames it sees, including:

  1. Data frames
  2. Management frames (e.g., association, authentication)
  3. Control frames (e.g., RTS, CTS, ACK)
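
For readers working with the traces, the three frame classes above can be told apart from the first byte of the 802.11 Frame Control field, which carries the protocol version (bits 0-1), type (bits 2-3), and subtype (bits 4-7). The following is a minimal sketch, assuming the input bytes start at the 802.11 MAC header (any capture-specific header has already been stripped); the function name `classify_frame` is hypothetical.

```python
# Type values in the 802.11 Frame Control field.
FRAME_TYPES = {0: "management", 1: "control", 2: "data"}


def classify_frame(frame):
    """Return (type_name, subtype) for a raw 802.11 frame.

    `frame` is a bytes object beginning with the 2-byte Frame Control field.
    """
    if len(frame) < 2:
        raise ValueError("frame too short to contain a Frame Control field")
    fc0 = frame[0]
    frame_type = (fc0 >> 2) & 0b11    # bits 2-3
    subtype = (fc0 >> 4) & 0b1111     # bits 4-7
    return FRAME_TYPES.get(frame_type, "other"), subtype
```

For example, a frame whose first byte is `0xD4` is a control frame with subtype 13 (ACK), and one whose first byte is `0x80` is a management frame with subtype 8 (beacon).
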
What information is being logged? For each wireless frame captured at a monitor, we record up to 250 bytes of the following information:
  1. Per-frame PHY information, including:
    1. Channel frequency
    2. RSSI
    3. Modulation rate
  2. Entire MAC header, with only the source and destination MAC addresses being anonymized as follows:
    1. Online, we store all MAC addresses.
    2. Offline, we anonymize the MAC addresses: every 3-byte MAC prefix that is shared by fewer than 10 nodes is replaced with a common prefix. This ensures k-anonymity for k=10.
  3. The entire IPv4 and TCP/UDP header, with the source and destination IPv4 addresses anonymized as follows:
    1. The IP address is replaced with a one-way hash.
    2. In addition, we record whether the IP address belongs to the following categories:
      1. Auto conf (169.254/16).
      2. Private address space (10/8, 172.16/12, 192.168/16).
  4. The entire DHCP payload, with the following anonymization:
    1. Client IP address (ciaddr) is anonymized as in 3.1.
    2. Client hardware address (chaddr) is anonymized as in 2.
    3. Your IP address (yiaddr) is anonymized as in 3.1.
    4. The "client identifier" option, if present, is replaced with a one-way hash.
  5. The DNS request/response payload, with the following anonymization/deletion:
    1. The domain name in each RR is replaced with a one-way hash.
    2. The resource data contained in each RR is deleted.
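
As an illustration of item 3 above, the IPv4 address handling could look roughly like the following. This is a sketch under assumptions, not the project's actual tooling: it uses HMAC-SHA256 as one possible keyed one-way hash, and the names `anonymize_ipv4` and `SECRET_KEY` are hypothetical.

```python
import hashlib
import hmac
import ipaddress
import os

# Hypothetical secret key for the keyed hash; generated once per trace
# and discarded after anonymization.
SECRET_KEY = os.urandom(32)

AUTOCONF = ipaddress.ip_network("169.254.0.0/16")
PRIVATE = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def anonymize_ipv4(addr, key=SECRET_KEY):
    """Replace an IPv4 address with a keyed hash plus category flags."""
    ip = ipaddress.IPv4Address(addr)
    # Assumption: HMAC-SHA256 stands in for the keyed one-way hash.
    digest = hmac.new(key, ip.packed, hashlib.sha256).hexdigest()
    return {
        "hash": digest,
        "autoconf": ip in AUTOCONF,
        "private": any(ip in net for net in PRIVATE),
    }
```

Because the same key is used throughout a trace, a given address always maps to the same hash, so flows can still be correlated, while the category flags preserve the only address structure the list above says is recorded.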

Security and privacy issues:

  1. We have taken reasonable measures to secure the machines used for tracing: kept them up-to-date on patches, turned off unnecessary services, protected access with a strong password, etc.
  2. We will throw away the secret key used for the keyed one-way hash once the trace anonymization is concluded, to make it difficult to perform a dictionary attack on the one-way hash.
  3. Despite the anonymization, it may be possible for some information to leak. For example, it may be possible to infer which website was visited based on the size of the response received. We are unable to obfuscate such information without damaging the data significantly.
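
To see why item 2 matters, consider how small the IPv4 address space is: an unkeyed hash can be reversed simply by hashing every candidate address and comparing. The sketch below illustrates this on a single /24; it assumes SHA-256 and HMAC-SHA256 as stand-ins for the unkeyed and keyed hashes, and all function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress
import os


def unkeyed(addr):
    """Plain SHA-256 of the packed IPv4 address (no secret involved)."""
    return hashlib.sha256(ipaddress.IPv4Address(addr).packed).hexdigest()


def keyed(addr, key):
    """HMAC-SHA256 of the packed IPv4 address under a secret key."""
    packed = ipaddress.IPv4Address(addr).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()


def dictionary_attack(target_digest, network, hash_fn):
    """Try every address in `network`; return the one whose hash matches."""
    for ip in ipaddress.ip_network(network):
        if hash_fn(str(ip)) == target_digest:
            return str(ip)
    return None


key = os.urandom(32)  # discarded after anonymization in the scheme above

# The unkeyed hash is recovered by enumeration.
recovered = dictionary_attack(unkeyed("192.0.2.77"), "192.0.2.0/24", unkeyed)

# Without the key, the same enumeration finds no match for the keyed hash.
not_recovered = dictionary_attack(keyed("192.0.2.77", key), "192.0.2.0/24", unkeyed)
```

Here `recovered` is the original address and `not_recovered` is `None`. Once the key is destroyed, even the people who produced the trace can no longer rebuild the mapping from hashes to addresses.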