Anonymized Packet Traces and AP Syslog: sigcomm08-traces.tar.gz (1.6GB)
Anonymization Source Code: sigcomm08-tcpmkpub.tar.gz (278KB)
This is a modified version of tcpmkpub. The differences are outlined in Anonymization Scheme.
Summary
We collected a trace of wireless network activity at SIGCOMM 2008. The subjects of the traced network chose to participate by joining the traced SSID. This is a description of that trace, how it was collected, and how it was anonymized.
- 3 types of traces: 802.11a, Ethernet and Syslog from the Access Point
- 4 BSSIDs on 4 channels, 1 NAT
- 8 802.11a monitors, 2 monitors per channel
- DHCP assigned IPs are in the 18.104.22.168/16 and 22.214.171.124/16 subnets
- Prism headers on 802.11 packets
- Anonymized with a modified version of tcpmkpub
- Packet traces include anonymized DHCP and DNS headers.
- T-Fi plots show the completeness of the traces.
Network Topology
A Xirrus Wi-Fi Array provided the traced 802.11a network (SSID: SIGCOMM-ONLY-Traced). The Wi-Fi Array advertised four BSSIDs, which were broadcast on four 802.11a channels.
After anonymization, the DHCP assigned IP addresses for clients are in the following subnets:
Trace Collection Topology
802.11a
During most of the conference, approximately two 802.11a monitors were placed at each of the four corners of the main conference hall. We did not record the exact location of each monitor. However, we tried to capture each channel with two monitors placed at opposite corners of the room.
Ethernet
Packets sent from the NAT to the AP and from the AP to the NAT were captured using an Ethernet trace collector attached to the packet dump port on the Wi-Fi Array.
Syslog
A tracing box connected to the Array's management port collected syslog traces. Unfortunately, after the conference we noticed that these traces were corrupted. However, we were able to salvage one of the syslog traces because we had also collected it with the Ethernet tracing box.
Filtering Traced Users
Who are we protecting by anonymizing the data?
The identity and activity of users who opted to be traced during SIGCOMM 2008.
802.11a
Each packet in the wireless traces meets one or both of the following criteria:
- BSSID address matches the "Traced" BSSID.
- Packet is a probe request and probe is for the "SIGCOMM-ONLY-Traced" SSID.
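The two criteria above can be sketched as a simple predicate. This is only an illustration, not the filter used to produce the traces: the `Frame` record, its field names, and the `traced_bssids` parameter are hypothetical stand-ins for whatever a capture parser would provide.

```python
from dataclasses import dataclass
from typing import Optional, Set

TRACED_SSID = "SIGCOMM-ONLY-Traced"


@dataclass
class Frame:
    """Hypothetical, minimal view of a parsed 802.11 frame."""
    bssid: str                        # BSSID field of the 802.11 header
    is_probe_request: bool            # True for probe request frames
    probe_ssid: Optional[str] = None  # SSID element of a probe request


def keep_frame(frame: Frame, traced_bssids: Set[str]) -> bool:
    """Keep a frame if it meets one or both of the criteria."""
    # Criterion 1: the BSSID matches a "Traced" BSSID.
    if frame.bssid.lower() in traced_bssids:
        return True
    # Criterion 2: a probe request for the traced SSID.
    return frame.is_probe_request and frame.probe_ssid == TRACED_SSID
```

A frame that matches neither criterion, such as a probe request for another SSID, is dropped.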
Ethernet
The AP was set up with a monitor VLAN for the "SIGCOMM-ONLY-Traced" network.
Syslog
The syslog trace only contains information about users associated with the "Traced" network. To filter out syslog messages about "UnTraced" users, we include all syslog messages for a client only while that client is associated with the "Traced" network. The syslog messages themselves indicate when a client associates with, and disassociates from, the "Traced" network.
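A minimal sketch of this association-window filter follows. The `(client, event, text)` tuple layout and the event names `assoc`, `disassoc`, and `other` are assumptions made for illustration; the real Xirrus syslog format is not described here, and the sketch assumes association events have already been matched to the "Traced" network.

```python
def filter_syslog(messages):
    """Keep messages that fall inside a client's association window.

    messages: iterable of (client, event, text) tuples in trace order,
    where event is 'assoc', 'disassoc', or 'other' (hypothetical format).
    """
    associated = set()  # clients currently associated with the Traced network
    kept = []
    for client, event, text in messages:
        if event == "assoc":
            associated.add(client)
            kept.append(text)
        elif event == "disassoc":
            if client in associated:
                kept.append(text)
            associated.discard(client)
        elif client in associated:
            kept.append(text)
    return kept
```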
Anonymization Scheme
The packets are anonymized using a modified version of the tcpmkpub tool. Meta-data about the trace anonymization is provided in the file tcpmkpub.log.export.
[new] indicates new functionality added to tcpmkpub
Syslog
macmkpub, a MAC address anonymizer based on the tcpmkpub anonymization code, anonymized the MAC addresses in the syslog traces.
Checksums (IP/UDP/TCP)
The anonymization code recomputes checksums. The anonymization meta-data (tcpmkpub.log.export) records which packets in the traces had bad checksums. In the anonymized traces, a bad checksum is indicated by a 1 in the checksum field, or by a 2 if the checksum was 1. A UDP checksum of 0 is not changed.
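One way to read this policy is sketched below. It is an interpretation, not the tcpmkpub code: the function name and parameters are hypothetical, and it assumes that "the checksum was 1" refers to the correctly recomputed checksum, so that a marker value of 1 is never confused with a valid checksum of 1.

```python
def output_checksum(original_ok: bool, recomputed: int,
                    original: int, is_udp: bool = False) -> int:
    """Value to write into the checksum field of an anonymized packet.

    original_ok: whether the checksum in the captured packet was valid
    recomputed:  correct checksum of the anonymized packet
    original:    checksum value found in the captured packet
    """
    if is_udp and original == 0:
        return 0                 # UDP checksum of 0 is not changed
    if original_ok:
        return recomputed        # valid checksums are simply recomputed
    # Bad checksum: mark with 1, or 2 when 1 would look valid (assumption).
    return 2 if recomputed == 1 else 1
```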
Ethernet MAC Addresses:
- The high-order 3 bytes and the low-order 3 bytes are hashed separately.
- The high-order 3 bytes are hashed to retain vendor information.
- Addresses containing all 1's or all 0's are not changed.
- The Multicast bit is retained.
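The rules above can be sketched as follows. This is a hedged illustration rather than the tcpmkpub implementation: the use of HMAC-SHA256, the `b"oui"` and `b"nic"` domain-separation tags, and the function name are all assumptions.

```python
import hashlib
import hmac


def anonymize_mac(mac: bytes, key: bytes) -> bytes:
    """Anonymize a 6-byte MAC address, hashing each half separately."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Addresses of all 0's or all 1's are not changed.
    if mac in (b"\x00" * 6, b"\xff" * 6):
        return mac
    # Hash the high-order 3 bytes on their own, with the multicast bit
    # masked out, so addresses from one vendor share a hashed prefix.
    oui = bytes([mac[0] & 0xFE]) + mac[1:3]
    high = hmac.new(key, b"oui" + oui, hashlib.sha256).digest()[:3]
    low = hmac.new(key, b"nic" + mac[3:], hashlib.sha256).digest()[:3]
    # Retain the multicast bit (least-significant bit of the first byte).
    first = (high[0] & 0xFE) | (mac[0] & 0x01)
    return bytes([first]) + high[1:] + low
```

Because the two halves are hashed independently, two devices from the same vendor map to anonymized addresses with the same first three bytes, which is what keeps vendor groupings recognizable without exposing the real prefix.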
VLAN [new]
The VLAN header did not need to be anonymized.
802.11
- MAC addresses are anonymized using the same method as the Ethernet MAC addresses.
IP
- If the packet is fragmented (fragment bit == 1 or fragment # > 0), skip the rest of the packet.
- External addresses are hashed using a prefix-preserving scheme.
- Internal addresses are hashed to a prefix that is unused by the external addresses, and the subnet and host portions of the address are transformed.
- Multicast addresses are not anonymized.
- The tcpmkpub paper recommends removing packets from network scanners. We did not consider this a threat to our network because the identity tied to a local address was dynamic.
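To illustrate what prefix preserving means for the external addresses, here is a minimal sketch in the general style of Xu et al.: each output bit is the input bit XORed with a pseudorandom bit that depends only on the preceding input bits. It uses HMAC-SHA256 as the pseudorandom function, which is an assumption made for brevity and not the cipher used by tcpmkpub. It does not cover the separate mapping of internal addresses.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization sketch for one IPv4 address."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast:
        return addr  # multicast addresses are not anonymized
    bits = format(int(ip), "032b")
    out = 0
    for i in range(32):
        # Pseudorandom bit that depends only on the first i input bits.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out = (out << 1) | (int(bits[i]) ^ flip)
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

Two addresses that share a k-bit prefix before anonymization share exactly a k-bit prefix afterwards, so subnet structure survives while the actual addresses do not.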
ARP
- If the ARP packet contains a partial IP packet, use the IP anonymization above.
- IP addresses are anonymized using the IP anonymization procedure above.
TCP
- The TCP timestamp options are transformed into separate monotonically increasing counters, one for each IP address in the anonymized trace, with no relationship to time.
- If the timestamp is 0, do not modify it.
- Otherwise, replace the timestamp with a unique number that is incremented in the order of the trace.
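The timestamp rules can be sketched with one counter per IP address. The class name is hypothetical, and the sketch assumes that a timestamp value repeated by the same address maps to the same counter value; the description above does not say how repeats are handled.

```python
from collections import defaultdict


class TimestampAnonymizer:
    """Replace TCP timestamps with per-IP counters (illustrative sketch)."""

    def __init__(self):
        # Per-IP mapping: original timestamp -> counter value.
        self._maps = defaultdict(dict)

    def transform(self, ip: str, ts: int) -> int:
        if ts == 0:
            return 0  # a timestamp of 0 is not modified
        seen = self._maps[ip]
        if ts not in seen:
            # Next counter value, assigned in the order of the trace.
            seen[ts] = len(seen) + 1
        return seen[ts]
```

The counters keep the relative order in which new timestamp values appear for an address but carry no information about the host's clock.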
UDP
- Recompute the checksum according to the checksum policy above.
DNS
- Anonymize DNS labels individually by taking the keyed HMAC of each label.
- Keep the low-order 8 bytes of the hash digest as the label.
- Convert the digest to ASCII by converting it to hex.
- Store the new length of the DNS packet in the following fields: [IP/UDP/DNS, PCAP Captured, PCAP On Wire].
- Anonymize any type 'A' resource record data using the IP anonymization scheme above.
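The label steps above can be sketched as follows. The hash function (SHA-256), the reading of "low-order 8 bytes" as the last 8 bytes of the digest, and the function names are assumptions; the description does not specify them.

```python
import hashlib
import hmac


def anonymize_label(label: bytes, key: bytes) -> bytes:
    """Anonymize one DNS label: keyed HMAC, keep 8 bytes, hex-encode."""
    digest = hmac.new(key, label, hashlib.sha256).digest()
    # Assumption: "low-order 8 bytes" means the last 8 bytes of the digest.
    return digest[-8:].hex().encode("ascii")


def anonymize_name(name: str, key: bytes) -> str:
    """Anonymize each label of a dotted DNS name independently."""
    labels = name.split(".")
    return ".".join(anonymize_label(l.encode(), key).decode() for l in labels)
```

Every anonymized label is 16 hex characters long regardless of the original label, so the packet length changes, which is why the length fields listed above have to be rewritten. Because labels are hashed individually, names that share a suffix such as `example.com` still share the anonymized suffix.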
DHCP
- Client IP address is anonymized.
- Client hardware address is anonymized.
- Your IP address (yiaddr) is anonymized.
T-Fi Plots - Visualizing Packet Trace Completeness
For more information about T-Fi plots, and for source code to generate your own T-Fi plots, see Wifidelity.
What is a T-Fi Plot?
A T-Fi plot gives a quick visual summary of the completeness of an 802.11 packet trace.
A T-Fi plot is a heat map in which:
- The position on the y-axis shows completeness: the fraction of transmitted packets caught by the monitor.
- The width of the shaded region on the x-axis shows the range of load.
- The intensity of the shaded region shows the frequency of load.
New T-Fi Plot Feature: The x-axis of the T-Fi plots below is split into two ranges, 0-100 on a linear scale and 100-1000 on a log scale.
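A split axis of this kind can be sketched as a piecewise mapping from load to horizontal position. This is not the Wifidelity plotting code: the function name is hypothetical, and giving each range half of the axis width is an assumption.

```python
import math


def load_to_axis(load: float) -> float:
    """Map a load in [0, 1000] to an x position in [0, 1].

    Assumption: the linear range 0-100 and the log range 100-1000
    each take half of the axis width.
    """
    if not 0 <= load <= 1000:
        raise ValueError("load must be between 0 and 1000")
    if load <= 100:
        return 0.5 * load / 100                  # linear half
    return 0.5 + 0.5 * math.log10(load / 100)    # log half (one decade)
```

The two pieces meet at a load of 100, so the axis is continuous while the common low-load range keeps fine resolution.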
Example T-Fi Analysis
The left trace is more complete
The left T-Fi plot above indicates that the left trace contains more complete intervals at higher load than the right trace. The blue region on the left T-Fi plot shows that the majority of the trace scored between 0.6 and 1 when under a load of 0-40. The right trace scored between 0 and 0.2 in the same load interval. This result is not surprising, as the left trace was captured during the SIGCOMM workshops, where there were fewer participants and the monitors were close to the clients. The right trace was captured during the conference sessions, where there were more clients and a larger distance between the clients and the monitors.
T-Fi Plots for SIGCOMM 2008 Traces
** Not included: T-Fi plots for trace files containing only a few packets.
Capture redundant data
In our experiment, if we had not captured the syslog data on the wired interface, we would have lost all of the syslog data.
Acknowledgments
The following people made the trace collection possible:
- Dave Levin - University of Maryland, College Park
- Neil Spring - University of Maryland, College Park
- Ratul Mahajan - Microsoft Research
- Rhett Prichard
- Matt Mark - Xirrus
Questions? Contact: Aaron Schulman, email@example.com - University of Maryland, College Park
References
- R. Pang, M. Allman, V. Paxson, and J. Lee. The Devil and Packet Trace Anonymization. SIGCOMM Computer Communication Review, 2006.
- J. Xu, J. Fan, M. H. Ammar, and S. B. Moon. Prefix Preserving IP Address Anonymization: Measurement-Based Security Evaluation and a New Cryptography-Based Scheme. Network Protocols, 2002.
- SIGCOMM 2008 Tracing Handout
- OSDI 2006 Tracing Handout
- A. Schulman, D. Levin, and N. Spring. On The Fidelity of 802.11 Packet Traces. PAM, 2008.