- Use the gc_ifc.h stuff to alter how collections take place.  Want to keep
  the heap size smaller, and not allow it to grow.
- Run experiments with varying heap sizes and measure the effect on memory
  usage and overall throughput.  For the latter case, we should measure the
  total time it takes to push through N packets.  Might want to try writing
  a routine that uses GenHandler without using sockets, so that the OS is
  factored out of the equation.

Measurement tools
-----------------
- check out oprofile (reports hardware cache miss counters)
- cachegrind (valgrind's cache simulator)

Experiments:
------------
- need to create a heap-only version of the system.
  - use ifdefs to use heap-allocated streambuffs
  - use ifdefs to more substantially change the streambuff definition
    to avoid the use of extra tags, functions, etc.
- try to do some kind of TTCP measurement to see if we can increase the
  overall throughput by reducing the amount of GC that we perform.  Also
  look at latency here; particularly focusing on the distribution.
- also do a comparison with an MPEG movie.  Might need to run a couple of movies
  to overload things a bit.

Memory management improvements:
-------------------------------
- MEMORY LEAK: every time we close a pending connection and retry, we do
  connectSend to reconnect the connection to the component.  This is
  important in general, because the reattempt to connect (by calling
  SetupConnection) could return a different connection object (this was a
  serious bug before).  The connectSend will allocate a new inportfn inside
  the component's dynamic region, creating a memory leak.  Possible fixes:

  - allocate the inportfn's outside the dynamic region.  Can we stick them
    in the unique region and then free them?
  - do not close the connection on a failure (i.e. don't call
    DoneWithConnection), but rather reuse the existing object, and just
    physically attempt to reconnect.  This requires breaking some
    functionality out of doConnect, and specializing the PendingConnect case
    in GenHandler (amongst other changes potentially).  On a retry, then, we
    would not redo connectSend, but would rather reuse the existing
    inportfn.

- change currentConfig and currentTestConfig in configure.cyc to be unique
- reduce uses of aprintf
- When a connection is closed, we unregister it with the underlying link,
  resulting in a Set::delete, and then register the new connection using
  Set::insert.  Two possible solutions are to (1) use some kind of
  allocation-free set, or (2) try to reuse the connection object so as not
  to register and unregister it.  We propose to do the latter above to fix
  a related problem.

Testing:
  - monitoring stuff
  - ewouldblock/partial writes
  - reconfigurations

======================================================================

- Must be a bug in the b/w calc in tracereceiver; there are some abrupt
  jumps that make no sense.

- Make sure I'm dealing with reset connections properly when reconfiguring.
  Not sure this is the case now.

- To allow fair queuing, think about processing the fd's in the fd_set in
  some kind of random order.  This way, when two connections are sharing a
  congested link and are draining their queues, one connection doesn't
  always get preference.

Allow new configuration to run concurrently with the old one.  To do:

DONE:
1) So that the new packets will influence the processing of the old ones, we
   need to "share" the queue for a given link.  In particular, each
   connection has its own queue, but the length and drop policy of that
   queue is determined by the associated link:

   a) each sendinfo will have a reference to a link structure, and the link
      will point to all of the sendinfos' queues.  The link can implement a
      queuelen() function that combines the lengths of all of the other
      queues.

   b) when I go to queue a packet, I check the link's queue length.  If it's
      full then I drop a packet as follows: starting at the top priority,
      iterate through all of the queues associated with the link, starting
      with the queue associated with the current sendinfo (which attempted
      to do the send).  Keep going to lower and lower priority until a
      packet can be dropped.

   -) when I implement this, I should just ditch the other dropping code
      that's there now (i.e. the non-priority based stuff).

TODO:
2) Just as before, the existing connection will get flushed, using flush
   packets.  But at the same time, the new configuration will start to send
   packets through the network.

   a) we know the old configuration has completed when all of its send
      connections have been closed.  We need to clear all of the connections
      and free the configuration at this point.  This implies that we are
      able to keep track of which connections belong to the old
      configuration, and which components belong to it as well.  Probably
      need a separate global for the old components and connections.

      Could have it so that each time a send conn is closed, we update this
      global; then we don't need the haveSendConns() function.

   b) Because we are running things concurrently, the new configuration will
      attempt to receive from/connect to the user data source/sink while the
      old config is already connected to them.  This is desirable: the new
      configuration will queue up packets while waiting for the connection
      to take place.  For this to work, the user send comps for the old and
      new configurations need to be associated with different underlying
      links, so that their queues are not related.

   c) when a send comp gets a flush and a recv comp receives one, the
      connections should be permanently closed, and not restarted.  This
      way, the new configuration's recv's and send's will connect to the
      user apps.  We should get *much* more parallelism this way, since the
      mpgsender should almost immediately reconnect to the network and start
      sending packets.

   d) need to think about what should happen if another reconfig comes in;
      it seems like we don't want to allow three configurations at once!
      Possibilities: 

      a) reject it.  Seems simplest.
      b) delay it, following the same scheme that we do currently.  This
         could be the right answer long-term, but might be hard to do now.
         Also could create stability problems?

----------------------------------------------------------------------
- run diamond experiments again; see if outcome is similar to bowtie.

- what is the effect of having a larger buffer window on the tracereceiver?
  It would be better to have an offline calculation here; I think there is
  enough information otherwise in the file (i.e. frame timestamp and size)
  to do this if I wanted to write a script.

  *** 2 second window seems to smooth things out.

- lots of problems on reconfigurations:
  - It's time-consuming, at least sometimes.  Moreover, there is a large
    delay on the receiver side until good performance is seen.  This is
    probably just due to the time it takes to move the packets.

    - Do a ping---what is the latency through M/N?
    - think about the reconfig proto at a high level; what are the factors
      driving reconfiguration time?

  - It's unstable.
    - I have packets all over the board on various reconfigs
    - Sometimes there are big gaps.
      - is the b/w thing working right?  Seems like the gap would
        imply the b/w dropping.

    TODO:
    - look at LS output files and compare the delays seen here with those in
      the output files.  Look also at delays in sender files.
    - this could just be a function of needing buffering at the receiver.
      But before I go about changing things, I should make sure.

  - talk about/explore fundamental tradeoff:
    - in flushing the queues following a reconfig, there will be fewer
      dropped packets, but it will take longer to observe the new schedule
      quality.

      Note that in general we can't just reuse the connection/share the
      queue because the types of data might change.  This would be future
      work.

- adjust tracereceiver to do buffering, so that output is not so sporadic.
  - need things to be timing-based, so the buffer is drained.
  - configure the size of the buffer (perhaps in terms of time to wait)

- make global scheduler more stable, so there are fewer reconfigs (make it
  more resilient to changes)
  x allow reconfigs only every X seconds?
  x how to adjust the bandwidth creeping stuff so that the various
    edges are more synchronized (and we avoid a cascade of updates)?
  
