UDP/IP Stack Processing Overheads
The processing time needed by the TCP/IP or UDP/IP stack varies from architecture to architecture. Some of the techniques we are currently studying for estimating the bottleneck bandwidth between endpoints need these estimates for accuracy.

For this reason, we instrumented the kernel to obtain timing information on UDP/IP stack processing. We use the NetDyn tool to send UDP packets between the source/sink host and the echo host, and instrument the kernel at the echo site. For each incoming frame, we check whether it carries a packet from the NetDyn experiment. If so, we modify the data in the UDP packet to record the time at which the Ethernet frame resulted in a function call to the Ethernet driver. The NetDyn echo process, running as usual on the echo site, adds its own timestamp once the UDP packet reaches user level. The packets are then sent back to the sink host, where they are logged.

To ensure that the modified UDP packets are not discarded by the UDP layer of the NetBSD kernel, we disable its checksum computation. This introduces some deviation from a normal kernel, where the UDP checksum computation would add some overhead.

Below we present some of the results of our experimentation --
  CPU                     UDP Packet Size   Processing time (in microsecs)
  Intel 80486             32 bytes          158
  Intel Pentium II 266    32 bytes          34-37
  Intel Pentium II 266    64 bytes          38-43
  Intel Pentium II 266    96 bytes          39-45

It is important to observe that a normal 32 byte UDP packet fits into a single mbuf structure in the kernel, while 64 and 96 byte UDP packets need two mbufs. This is why the 64 and 96 byte packets need similar processing times.

An interesting observation was made when we sent packets in trains of 2 or 3. These trains were sent out from the source host in immediate succession. In all cases, the packet processing overhead for the second and third packets was lower than that of the first packet by about 5-7 microsecs. A probable reason is that, when the first packet of a train arrives, the user process is on the sleep queue. On receiving a packet for this process, the kernel scheduler moves the process to the runnable queue, and only then does the process get a chance to run. For the second and third packets, however, the process is already running (processing the first packet). The overhead of waking up the user process is thus absent in these cases, and the processing time is correspondingly lower.
