Dynamic Region issues
=====================
Need to free the dynamic region when we close a connection from the old
configuration.  Don't want to do it from the current configuration though!

Porting issues
==============
There's the artificial problem that pattern matching on a unique pointer
fails, since it creates an alias.  We can't get around this for streambuffs
because we use tag_t's.  This inflates our numbers on how hard things
might be to use.  Or, it might present an accurate picture.

  Thought: recode things to not embed the databufs?  Then I should be able
  to change things more easily.

  streambuff: 296 LOC, 16 swaps, 8 alias calls (+1 inferred)
  streambuff heap: 240 LOC

    How to characterize the diffs?

There is the valid point that reference-counting code is not interchangeable
with different regions.  That is, generic library functions that use
reference-counted stuff don't have to change.  But code that actually
makes aliases, drops them, etc. can't change as easily.

Another note: we make good use of swap in the array-queue stuff.  There is a
matching function that finds the first element of a queue satisfying a given
predicate; when it removes that element, it needs to percolate null to the
end of the queue.  Therefore, we use swap to move the remaining elements
down.
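A minimal C sketch of that removal (names hypothetical; plain assignments
stand in for Cyclone's swap, which exchanges a value with a unique pointer
without creating an alias):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array queue: slots [0, len) hold live elements. */
typedef struct { int elems[8]; size_t len; } queue_t;

static int is_even(int x) { return x % 2 == 0; }

/* Remove the first element satisfying pred, then percolate the hole to
   the end with pairwise swaps, as in the array-queue matching function. */
static int queue_remove_first(queue_t *q, int (*pred)(int), int *out) {
    for (size_t i = 0; i < q->len; i++) {
        if (pred(q->elems[i])) {
            *out = q->elems[i];
            /* slide the successors down one slot, swap by swap */
            for (size_t j = i; j + 1 < q->len; j++) {
                int tmp = q->elems[j];
                q->elems[j] = q->elems[j + 1];
                q->elems[j + 1] = tmp;
            }
            q->len--;
            return 1;
        }
    }
    return 0;
}
```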

Experiments to perform
======================

TTCP for TCP & UDP, M/N server just forwarding
- Need to make sure that we aren't underutilizing the router (unlikely)

  1) What's the maximum throughput MediaNet can sustain w/ and w/o GC
  2) What's the memory footprint/requirements?

- For the TCP case
  - need to exert backpressure
    - for the Send() computils, need to add "backpressure"
      XML attribute, and update accordingly.

- For the UDP case
  - putting in timers allows all packets to get sent.  However, this
    is not very deterministic, since sometimes a timer will yield
    full bandwidth and sometimes not.  Perhaps we must pick a timer
    that gives full bandwidth 9 times out of 10, or so?

----------------------------------------------------------------------

Timers:

Packet arrival:
- take the current time
- read the packet
  - if it's not complete, reset the timer to NULL
  - cache the first 4 bytes into a global unsigned int
- forward the packet

Packet send:
- assuming CT_Send, the queue is not empty, the retval is OK, the first
  four bytes match, and the timer is not null, take the time and subtract.
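The arrival/send steps above could look roughly like this in C (a sketch;
the struct and function names are made up, and 0 stands in for a NULL
timer):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection timer state: armed on packet arrival,
   read and disarmed on a matching send. */
typedef struct {
    uint64_t arrival_usec;   /* 0 means the timer is NULL (disarmed) */
    uint32_t first_bytes;    /* cached first 4 bytes of the packet */
} pkt_timer_t;

/* Packet arrival: take the current time and cache the head bytes;
   if the read was not complete, reset the timer to NULL. */
static void on_arrival(pkt_timer_t *t, uint64_t now_usec,
                       uint32_t head, int complete) {
    if (!complete) { t->arrival_usec = 0; return; }
    t->arrival_usec = now_usec;
    t->first_bytes = head;
}

/* Packet send: if the timer is armed and the first four bytes match,
   return the elapsed time and disarm; otherwise return -1. */
static int64_t on_send(pkt_timer_t *t, uint64_t now_usec, uint32_t head) {
    if (t->arrival_usec == 0 || head != t->first_bytes) return -1;
    int64_t delta = (int64_t)(now_usec - t->arrival_usec);
    t->arrival_usec = 0;
    return delta;
}
```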

----------------------------------------------------------------------

Trying to get the right streambuff definition:

1) I'd like to avoid two levels of indirection for the data buffers, so that
   I can use alias.

    |  (1)
    v
   +------------+
   | streambuff |
   |------------+
   |  |  | ...  |
   +--+--+------+
    |  (2)
    v  
   +---------+
   | databuf |
   | buf     |
   +---------+

- Ubufs, Hbufs are in a tagged union, so I have to go through the
  tagged union to get at the bufs part.  If (1) is not unique, I can just
  take the address of this thing.  If (1) is unique, I can alias it, and
  then take the address.

  WAIT---I can't take the address of a tagged union field; this could lead
  to an unsoundness, since I might then change the contents of the union.  I
  can only take the address of the union itself.

- I have to use pattern matching to get at the values, since the
  size is in an existential.  This may consume things---so I either need
  to alias (possible), or figure out how to fix this.

- Assuming that pointer (1) is non-unique, I can't alias through it to
  pointer (2).  I could solve this by 

  a) forcing (1) to be unique, and setting the noconsume attribute.  But
     this will mess up the pattern matching stuff?

  b) using swap on (2) each time I try to grab one.  But this seems
     painfully expensive, once for each buffer.  However, the alternative is
     to allow another level of indirection that I could alias through, so in
     that case I'm trading an extra dereference for reducing the number of
      swaps.  In the case that I have only one buffer, this will be a loss.
     Not sure where the break-even point is.

  c) will making it const somehow help?  It would seem not, since I can't
     assume (2) is unique.  It would be interesting to update the flow
     analysis so that if I do a tagcheck, then I can assume the argument is
     unique (and can thus be aliased in a path).

----------------------------------------------------------------------

- Bugs
  - Fix configuration---MPEGtraceopt.xml is not inserting a prio config on
    intermediate nodes when dropping frames for some reason.
  - BWCREEP is happening twice when there are two user configs:

20.079186 RECONFIG 0.97-0.97 pc1.pc2.pc3
20.079186   reason: 
22.220703 BWUPDATE 192.168.2.3 183489.33 183489.33 214054.11
22.220703 BWCREEP 192.168.2.3 188994.01
22.220703 BWCREEP 192.168.2.3 194663.83

  - also, it appears that reports are coming in clumped together, so that
    the GS reconfigs multiple times quickly.  Want to prevent this, and also
    make sure that we're overriding configurations properly during an update
    of the LS.

- Need to think of ways to get the global scheduler to reconfig
  as infrequently as possible:

  - only allow reconfigs every N seconds.
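The "every N seconds" throttle is simple to state in code; here is a
hedged sketch (names are hypothetical, and 0 is used as "never
reconfigured yet"):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical gate: allow a reconfig only if at least
   min_interval_sec have elapsed since the last one we allowed. */
typedef struct {
    uint64_t last_sec;          /* time of last allowed reconfig; 0 = never */
    uint64_t min_interval_sec;  /* the "N" in "every N seconds" */
} reconfig_gate_t;

static int reconfig_allowed(reconfig_gate_t *g, uint64_t now_sec) {
    if (g->last_sec != 0 && now_sec - g->last_sec < g->min_interval_sec)
        return 0;               /* too soon; coalesce with the prior one */
    g->last_sec = now_sec;
    return 1;
}
```

This also suggests one answer to the clumped-reports problem: reconfigs
denied by the gate would be coalesced into the next allowed one.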

- Oops---one problem is that if I get a reconfig almost immediately
  following a prior one, I could have packets queued on pending
  connections that will get dropped.

----------------------------------------------------------------------
9/30/02

Goals:

1) Strengthen simple configuration measurements.

2) User configuration:

   a) Two video sources (robots?)
   b) Base station/distributor.  Think of it as a van that hosts the
      wireless network the robots use.
   c) User applications:
      i)   PDA's connected via wireless (3G anyone?)---CPU+B/W limited
      ii)  Workstations @CNN via a satellite link---B/W limited?
      iii) LAN for analysts---resource rich.
      iv)  Viewers in the distributor itself---resource rich.

   Want to run a test that shows we can have multiple users/video sources.
   Illustrate output at one or more of these places under congestion to show
   adaptivity and/or flow re-routing.

   Incorporate DSL/Cable into this picture for more network setup?

   The paper will show a picture, some basic results, and a description of
   what works/didn't work.  Might want to talk about total time to
   reconfigure (here and in microbenchmarks).

----------------------------------------------------------------------

Share TCP connections between nodes to avoid teardown/setup on reconfigs:

1) GS notates its send/recv connections with id's, rather than ports.  One
   question is how these id's are assigned, although the feeling is that
   they are related to the flows between user-specified configurations.

2) Packets have an id, in addition to a length.  Each node will have a
   single recv that receives packets from multiple incoming connections, and
   additionally a send that goes to different nodes.  In addition, they each
   have a demultiplexing and multiplexing table, respectively.  The former
   will map input fd/packet id pairs to output ports (i.e. media-net
   components), and the latter will map input port/packet id pairs to output
   fd's.  In both cases, the output tag could change, so the table will
   indicate that as well.  In fact, it may be enough not to know what the
   input location is (whether port or fd): the id on the packet should be
   enough.

3) Queuing will be shared in this case.  This is problematic.  One idea is
   to not have a fixed-length queue, but a time-based queue.  That is, only
   keep packets that have been in the queue less than some time T.

   Also, need a way to share fairly between different user flows.  Not sure
   if the time-based thing will work this out.

4) Reconfigurations need to update the tables on the send/recv connections
   to deal with the new configuration, in parallel with the old one.  Synch
   issues here maybe?

The result should be that we never have to tear down connections on
reconfigs, which should hopefully help performance.
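The demultiplexing table in 2) might be sketched as follows (a guess at
the shape, not a design; the entry fields and names are hypothetical,
including the rewritten-id field for the "output tag could change" case):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demultiplexing table for the shared-connection scheme:
   maps a packet id to an output port (i.e. a media-net component),
   with a possibly-rewritten id for the outgoing packet. */
typedef struct { int in_id; int out_port; int out_id; } demux_entry_t;
typedef struct { demux_entry_t entries[16]; size_t n; } demux_table_t;

/* Look up a packet id; on a hit, report the output port and the id the
   packet should carry on the way out.  Returns 0 on a miss.  Note that
   the lookup keys only on the packet id, per the observation that the
   input location (port or fd) may not be needed. */
static int demux_lookup(const demux_table_t *t, int in_id,
                        int *out_port, int *out_id) {
    for (size_t i = 0; i < t->n; i++) {
        if (t->entries[i].in_id == in_id) {
            *out_port = t->entries[i].out_port;
            *out_id = t->entries[i].out_id;
            return 1;
        }
    }
    return 0;
}
```

The multiplexing table (port/id pairs to output fd's) would be the
mirror image.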

----------------------------------------------------------------------

- on one run, the mpgsender took a while (2 secs) to connect to pc1.  Not
  sure where the fault was here.  This suggests that closing and
  reconnecting is less preferable than sharing existing connections.  Would
  need to develop a way to do this:

  *) make the names of intermediate connections indicate their semantics.
  That is, we need to keep track of a) who the upstream sender is, and b)
  who the downstream receiver is.  In both of these cases, the path should
  be from/to the user component, including intermediate hops.  This is going
  to be important so that when we change paths, we make sure that we don't
  mistakenly keep bogus connections alive.  Have to think about this more.

  *) have a way to save connections on a reconfig, and then hook them up
  with connections in the new configuration.  Would need to deal with
  closing connections no longer relevant, and migrating connections still in
  use.

  *) in the case we can't keep the connections alive, we could migrate the
  queue.  That is, if the user component downstream is the same, and the
  upstream path is the same, then we will need to reconnect, but we can do
  so by passing over the current queue.  How would this combine with
  flushing semantics?

- think about sharing the queue between configurations?  Then there won't be
  a big pause while the existing queue is flushed.  The hard part is syncing
  up the existing connections to the new ones; we have to worry about the
  send buffer.  On the other hand, normal close semantics ensures that all
  pending data is sent on the close.  If I were careful about only closing
  senders and not receivers, then I should be OK, right?

  That is, on a changeover, close all sending connections immediately, but
  not the receives.  When all connections are closed, allow the changeover.
  When the new configuration comes up, along with a pending send connection
  having the same name, associate the queue with it.  The problem is that
  the name might be different, ugh!  That is, we could be going down a
  different path, hence a different name.  Have to think about this some
  more.

- can see little humps in the data now, which I think is due to the pauses
  that occur at reconfig time.  At least, that's where the humps seem to
  occur.

- tracereceiver: can simulate buffering by extending the window for the
  bandwidth calculation.

- Now I send a SIGURG to the sender, who sends the flush packet, and then
  waits for the connection to close. Things then continue as before.  It
  appears that the problem is that somewhere down the line the flush packet
  is getting lost.  In particular, on the middle node of the line, the send
  connection is not being closed, implying that the third node is not
  closing its connection, either because it does not receive the flush
  packet, or because something else is wrong.  This only happens
  intermittently, unfortunately.

Research questions
------------------

- type system for CMN operations.  A simple system would simply be ensuring
  that input and outputs are well-typed (i.e. MPEG-stream, etc.).  A more
  advanced and interesting system would include enough information to
  support optimization.  For example:

  - whether components can be safely reordered (see Brian Smith's stuff).
  - whether components are stateful, and thus whether they can be relocated,
    and what the cost of relocation would be.
  - the costs of the operations, for use by the scheduler.  For example, we
    already have size attributes; what about computation and delay as well?
  - using type inference to reduce the annotation burden of the user.
  - a principled view of weaving for combining user and movie
    specifications.  What is the least amount of semantic work we need to do
    for this?

- Combining global and local scheduling, for these reasons:

  - reduce monitoring overhead
  - react more quickly to local phenomena
  - react effectively to global phenomena

  How do we change our specifications to capture that they will be locally
  scheduled?  At the moment, the requirements are fairly exact.  We'd want
  some kind of range to specify allowed operating conditions.  This range
  should then hint at the local schedulers as to how often to perform
  monitoring.

Bandwidth estimation
--------------------

inputs:
  last max reading (max,maxT)
  last min reading (min,minT)
  last estimate reading (est,estT)

The strongest information we have about the bandwidth is the max reading,
since it indicates when we couldn't send any more.  The second useful
reading is the amount we did send (min), and finally the estimate.

So we have some formula for the bandwidth:

Given p,q,r, we have

b/w = p*max + q*min + r*est;

These must be weighted by time since the last reading.  At time maxT, we
should have 

b/w = max

Therefore, we want something like

(1-M)*max + M*est

so that when the delta is 0, we get the max reading.  We also need to
incorporate the min reading so that if the max reading wasn't too long ago,
we give it more credence, but as M approaches 1, we give full weight to the
estimate:

(1-M)max + M(M*est + (1-M)min)

We can here see that when M goes to 1, only the estimate is taken, but until
then, some of the minimum reading is also taken into account.  This assumes
that the estimate will always be >= max and min.

Also would like to scale the minimum reading so that it's given more weight?
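The blend above is easy to check as a pure function.  A sketch, where M
in [0,1] is the (assumed) decay weight derived from the time since the
max reading: M = 0 right at maxT, M -> 1 as the reading ages:

```c
#include <assert.h>

/* b/w = (1-M)*max + M*(M*est + (1-M)*min)
   At M = 0 this is exactly the max reading; at M = 1 it is exactly the
   estimate; in between, some of the min reading is mixed in. */
static double bw_estimate(double m, double max, double min, double est) {
    return (1.0 - m) * max + m * (m * est + (1.0 - m) * min);
}
```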


Reconfigurations
----------------

  Problems with packet sending and reconfiguration:

    1) Any stored data on the sender that is preserved between
       configurations could screw up the receiver if the data format
       changes. Stored data could be in

       a) In transit between configurations**
       b) The socket send buffer in the kernel**
       c) The send queue in the application

       ** only applies if the socket is preserved across the reconfiguration
  
       Both b) and c) can be solved by dumping this data at reconfiguration
       time.  There is the problem of queueing in general, however (i.e. if
       some component was doing its own queueing and its state was
       preserved).  Problem a) is much harder to solve without some kind of
       versioning on the data formats.  That is, packets could have a
       version id on them that is checked by receivePacket, and tossed if of
       the wrong version.

    2) Conversely, throwing away stored data could also wreak havoc.  For
       example, the CT_Recv could have read a partial packet and be waiting
       for the rest.  If we reconfigure the sender to throw away the partial
       packet, and the receiver is not reconfigured fast enough, it could
       get bogus data.

  It might be possible to delay reconfigurations so as to synchronize the
  sender and the receiver.  That is, we could say that if the sender is
  sending packetized objects, then if the queuesize > 0, and the first
  element is a CharBuffer (implying a partial send), we must delay
  reconfiguring until we can send that CharBuffer.



Packetizing things
------------------

- A number of things to try:
 - i see your mpegsender already calculates the time between frame sends
    based on the framerate in the sequence header. a further improvement
    for each frame is: figure out how many packets it will take to send the
    frame, then divide your usec_frame_delay into that many smaller
    segments, and space out the frame fragments equally. this makes for
    smoother playout as the director can interleave receiving packets and
    writing to the client a little better

    I could see this helping when
    1) The medianet server is just doing forwarding.  That is, it treats the
       incoming packets as buffers and forwards them along as it receives
       them.  This should impose less overhead on IP because it won't have
       to break up packets into smaller pieces, and there will be less
       burstiness in the traffic stream, which would reduce the impact on
       our non-blocking I/O infrastructure.

    I don't see this helping when the MediaNet server is actually operating
    on the MPEG frames.  This is because it ends up having to read the whole
    frame in to process it anyway.

    On the other hand, we could figure out how to deal with partial packets
    so that the frame dropper could decide to drop the packet based on the
    header (just the first 20 bytes or so I think), and then the subsequent
    fragments could be dropped.  This will take some thinking; it sounds
    like it will require a component to do fragment handling.

    In order to take advantage of 1) above, we could have the mpgsender not
    packetize things and send buffers instead.  The packetizing would take
    place as a MediaNet component before the frame dropper.  Perhaps this
    would give us a way to deal with partial packets, as I'm thinking
    above.  What would be the cost of this approach?

    How it would work:
      Component state:
        char [] buf holds unprocessed frames
          (how big to make this?  It seems like we want to also make the
          receive buffer for the socket correspondingly smaller so as to
	  not effectively double our buffering size.)

    1) Component gets a charbuffer as input.
    2) It checks its own buffer cache first to see if there is a piece of
       a frame waiting.  If so, it searches through the buffer it has until
       it finds a start-marker, and then chops off the remainder for later
       processing.  It then combines the buf[] piece with this new piece
       into a PacketVector and sends it along.
    3) Now whatever it has from the input is either a full frame or a
       partial frame.  It checks the first byte to ensure it is a marker,
       and then searches for the end of the frame (i.e. the next start
       marker).  If it doesn't find one, then it buffers what it has.  If it
       does find one, it chops out the piece of the frame and sends it along
       as a Packet.  It then repeats this process 3) until only a partial
       frame remains.
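The core of step 3) is the marker scan.  A sketch with a hypothetical
one-byte start marker (real MPEG start codes are the four-byte sequence
00 00 01 xx; a single byte keeps the sketch short):

```c
#include <assert.h>
#include <stddef.h>

#define START_MARKER 0xB8   /* hypothetical one-byte frame marker */

/* Given a buffer that begins at a marker, find the next marker: the
   bytes before it form one complete frame, and the remainder is a
   partial frame to keep buffered for the next input.  Returns the
   frame length, or -1 if the buffer holds only a partial frame (or
   doesn't start at a marker). */
static long chop_frame(const unsigned char *buf, size_t len) {
    if (len == 0 || buf[0] != START_MARKER) return -1;
    for (size_t i = 1; i < len; i++)
        if (buf[i] == START_MARKER)
            return (long)i;      /* frame is buf[0..i) */
    return -1;                   /* only a partial frame so far */
}
```

Step 3) would call this in a loop, emitting a Packet per complete frame
and buffering the tail.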

    This approach seems poor since I'm going to be doing a lot of allocation
    and copying to break up the various pieces.  Alternatively, I could
    create a buffer abstraction that has a length field so that I can
    conceptually break things up, but then I have lots of worries about
    sharing, and I'm not sure how those will match up with an improved
    buffer management scheme using refcounts.  Perhaps refcounts will
    actually be what saves me?  I'll have to think about this more.

    Fundamentally, the sender knows something (the frame boundaries), and we
    should take advantage of this rather than forcing MediaNet to figure it
    out.  Therefore, we really want to be able to deal with partial
    packets for the greatest flexibility.

    How would this work?

    1) Add a PartialPacket(char ?,int) stream type that indicates a buffer
       and the valid length of that buffer (i.e. the remainder has not yet
       been read in).


Detecting problems
------------------

- Lots of disk access slows down the throughput (running updatedb
  while trying to receive data knocked me down from 80+ KB/s to 55 KB/s).

- Getting EWOULDBLOCK's or partial writes signifies the receiver is
  not acking the packets fast enough. Either the receiver is too overloaded
  to receive those packets, or the network is congested.  The scheduler
  will have to figure out which one is the case:

  - if the network is congested, it will need to reduce the traffic 
    stream or reroute it.
  - if the application is overwhelmed, it can reduce the traffic
    stream, or reduce the amount of computation required.
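The detection rule itself is small; a sketch as a pure classifier over a
nonblocking send() result (function name hypothetical):

```c
#include <assert.h>
#include <errno.h>

/* Classify the result of a nonblocking send(): returns 1 when the
   result signals backpressure from the receiver (EWOULDBLOCK/EAGAIN or
   a partial write), else 0.  Distinguishing network congestion from an
   overwhelmed receiver is left to the scheduler, as noted above. */
static int send_backpressure(long sent, long wanted, int err) {
    if (sent < 0)
        return err == EWOULDBLOCK || err == EAGAIN;
    return sent < wanted;        /* partial write */
}
```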

Allocation scheme
-----------------

Want to avoid the GC allocator as much as we can, so as not to pay the
cost of GC.  Possible reference counting scheme

1) Allocate a packet buffer, somehow (either from a buffer pool or,
   less preferably, by calling malloc).  It should have reference count 1.
   Hand it off to the component chain; for each component we pass it to, up
   the reference count, then subtract the initial count and lose the
   reference.

2) For each component that uses the buffer, keep track of the reference
   counts in the same way.  If the component drops the buffer, it needs to
   subtract the count.  Similarly do this with the write code in loop.cyc.
   If the count reaches zero, return the buffer to the generic pool.

3) In addition, add a priority field for packet queueing (this is what
   skbuffs do).  This way, when we get overloaded we can avoid dropping
   I-frames, P-frames, etc.  The priority field is initially uniform (say
   0), and then components can change it.  Therefore, we could 

     a) create a routine that checks the frame type and assigns the priority
     b) create a special component that detects frame types from packets
        and calls this routine.
     c) additionally call the routine from dropMPEGSeq and restoreMPEGSeq
        components to reproduce the priority.
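A sketch of the scheme in 1)-3) (struct and names hypothetical; a flag
stands in for actually returning the buffer to the pool):

```c
#include <assert.h>

/* Hypothetical refcounted packet buffer: allocated with count 1,
   incref'd for each component that holds it, returned to the generic
   pool when the count reaches zero.  The priority field is the
   skbuff-style addition from 3). */
typedef struct {
    int refcount;
    int priority;   /* initially uniform (0); components may change it */
    int in_pool;    /* stands in for "returned to the buffer pool" */
    char data[1500];
} pktbuf_t;

static void pktbuf_init(pktbuf_t *b) {
    b->refcount = 1;            /* allocator's initial count */
    b->priority = 0;
    b->in_pool = 0;
}
static void pktbuf_incref(pktbuf_t *b) { b->refcount++; }
static void pktbuf_decref(pktbuf_t *b) {
    if (--b->refcount == 0)
        b->in_pool = 1;          /* return to the generic pool */
}
```

The hand-off pattern from 1) then reads: incref once per component the
buffer is passed to, then decref to drop the allocator's initial count.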

NOTE: we really don't have a solid model for dealing with buffer sharing.
If we were to pass a buffer to two different components, the results could
be different depending on evaluation order (e.g. consider if we add a seqnum
in one direction and compress the buffer in the other), and definitely
different if we allow concurrency.  Therefore, having some kind of count is
necessary anyway.  In particular, if you do something other than just pass
the buffer along and the count >= 2, you must make a copy and lose your
count.  Perhaps I should be looking at skbuffs as a model of what to do.
This might kill two birds with one stone as far as the OS stuff goes.

Really this boils down to aliasing.  We want to control the aliases on the
buffer so as to be able to share it, and deallocate it (by returning it to
the buffer pool).  If we know there are aliases (i.e. the refcount > 1),
then we need to copy the buffer to change it.

Linearity is not what we want in general since we can't have "read-only"
aliases.  However, we need refcounts in this case, since we need to know
when to free the buffer, and to make sure we don't free it more than once.
Linearity would make this easier.  OTOH, if we know we have a linear chain,
it would be nice if the type-system could help us out.  We could have linear
components, which only connect to linear components.  It would be even
better if components could "preserve linearity" which is to say that if an
incoming pointer is linear, then it stays linear as a result of executing
the function (i.e. it is not aliased or freed).  Would this work with
existentials (i.e. closures) ?

If we don't have this programming language support, we run the risk of
returning the buffer to the buffer pool when it is still in use, and thus
some other part of the application will start using it.  It also fails to
let us share memory between things of different types, whereas reference
counting works for memory in general (I think; the implementation may not).

How would this work with freeable regions?  I don't remember the proposal
well enough to apply it.  We would want to be able to a) know that a region
has aliases, so it shouldn't be freed, or b) attempt to free it and have the
freeing fail if there are aliases.  We want both cases to be cheap to
check.  We could then have a packet per dynamic region.  Of course, this
presumes that dynamic region creation is cheap.  Perhaps we really just want
reference counting support?  Look at skbuffs ...

Performance analysis
--------------------

Looks like we get roughly 1/4 of the TCP throughput through bridged MediaNet
that we get via TTCP alone.  In particular I see about 70 KB/s TTCP through
medianet, but about 225 KB/s with TTCP alone.  Using gprof, it appears that
a lot of time is being spent in the garbage collector.  Therefore I'm
thinking I need to find ways to reduce calls to the GC.  The problem is
that, in general, I can't use region allocation because packets are used
asynchronously, so I need to find other ways to reduce GC pressure.

Also find that the mpgsender app is getting throttled by the lack of
throughput of the medianet server on the same host.  This is not good, as we
want the medianet server to detect the bottleneck so it can report it to the
global scheduler.

======================================================================

Hurt by null-term strings
-------------------------

1) Can't resize them very well.  That is, we want to have some kind of
   realloc that works for both null-terminated and non-null-terminated
   strings.  I guess we can use realloc_str for the former, at the moment.

2) It either is a buffer or it isn't, except when const.  It would be nice
   to be able to treat stuff as a buffer temporarily, and then reverify the
   null-termination property.  Need to track aliasing, though.  How hard
   could this be?  Couldn't we add a "noalias" attribute to a pointer so
   that we can tell it isn't aliased?  We could be very conservative about
   this---make sure you don't store it in a non-local.

3) One place some kind of polymorphism would help is add_to_buf.  I want to
   be able to grow a buffer that is null terminated, and preserve that
   invariant, but also do the same if it is not null-terminated, without
   having to rewrite the function.  What I'd really like is a function that
   says:

     if it's null-terminated, it will stay that way
     if not, no worries.

   I'm thinking this is the polymorphism that Greg is talking about.
   I.e. the "possibly-null-terminated" type.

======================================================================

On-the-fly component upgrades

Currently we throw away the current components and replace them with new
ones.  Refinements:

1) When a replacement component has the same name and operation as the old
   component, then the state needs to be propagated between them.  Thus need
   a component method for grabbing state and initializing with grabbed
   state.  Only the op name needs to be the same; attributes are captured
   in the state that performs the initialization.

   Need to distinguish between the state in the current component (like
   the cached sequence header, or the queue of messages), and its
   initialization attributes.  During updates, the initialization 
   attributes would change while the state would remain.

   We might like to relax the constraint that the op must be the same, and
   instead have a family of components that can initialize using the other's
   state.  Not sure how this would work exactly.  Instead maybe we should
   make the op name correspond with the needed state, and have behavior
   vary based on attributes.

   How will the types for this work out?

2) Rather than create a new component using the old state, just alter the
   current component with the new attributes.
  
3) There is also state associated with a stream that we may need.  In
   particular, the sequence header for the MPEG stream.  At the moment, this
   is needed by the "restore" component for calculating the size of the
   dummy frame.  If the restore component is removed entirely at some point,
   but then reinserted later, it will need to know the sequence header
   information.  A number of possibilities:

   a) We allow the data source to be queried out-of-band to acquire the
      information needed to obtain the dead frame.  It appears we only need
      to know the frame size for this to work.  Where should the query
      go?  Perhaps the global scheduler should know this information
      and just tell the component.


======================================================================
We have a problem with transformations that change the "type" of streams.
Either the components have to be stream-polymorphic (i.e. don't look at its
data), or there needs to be some runtime information that confirms the
type.  This runtime information needs to accompany the data when it's
marshalled so that it can be reconstructed.

We'd like the "types" of in and outports to reveal when this typing
information is present.  For example:

	      SeqAdd			FrameDrop
MPEG frame seq ---> (int,MPEG frame) seq ---> (int,MPEG frame) partial seq 

FrameRestoreSeq		SeqStrip
---> (int,MPEG frame) seq ---> MPEG frame seq 


However, we would like it for the FrameDrop component to work regardless of
whether there is a sequence number attached or not.  That is, it should be
abstract with respect to acquiring the frame data that it needs to drop the
frame.  This way, we could do frame restoring by timer, rather than by
sequence number:

	      FrameDrop			FrameRestoreTimer
MPEG frame seq ---> MPEG frame partial seq ---> MPEG frame seq

This representation independence would imply that the data needs to come
packaged with its own methods; i.e. it is an object.  We could think of a
declarative subtyping relationship using multiple inheritance:

MPEGframe    Seq
   |        /
MPEGframeSeq

Here, MPEGframe would define some method "getFrameType", and Seq would
define a method "getSeq."  However, we want "implementation-polymorphism."
That is, components should have types like:

SeqAdd: `a -> `a implements Seq
FrameDrop: MPEGframe -> MPEGframe option
FrameRestoreSeq: MPEGframe implements Seq -> MPEGframe implements Seq list
FrameRestoreTimer: MPEGframe -> MPEGframe list
SeqStrip: `a implements Seq -> `a

There is an "implements" subtyping relationship whereby you have

`a implements I <= `a

The key is that FrameDrop does

We have to have serious representation independence for this to work;
perhaps basically method lookup by name.  Plus, the on-wire marshalling cost
might be high.


======================================================================

big questions:

What are we going to demo?  The goal is to have a cool demo by April.
Furthermore we'd like to get a paper out the door by then.  What is a
deadline we should set for ourselves?  How are we going to set up the
implementation so we can make progress?  Can we get another hacker involved
if Robbert can't help too much with the coding part?

What do we want to say/show?  What is our primary contribution over other
systems that we want to illustrate?

We should come up with some compelling computation, code up a specification
in mediaNet XML specs, and then work to get it implemented.

Seems to me that we want to work on some kind of "hierarchy of schedulers"
idea that capitalizes on the local/global distinction.  What can this be?
Certainly we will want to think about:

1) Scalability
   Impediments:
   a) monitoring overhead
   b) message exchange overhead (i.e. between global/local scheduler)
   c) disconnected operation/lost messages.  What happens if the local guys
   can't communicate with the global scheduler?  Perhaps we should think of
   its directions as value-added?  That is, the local scheduler should be
   able to deal intelligently when it's on its own.

2) Responsiveness
   Need to be able to react to changes reasonably quickly

3) Generality
   How far can we take it beyond just media streams (i.e. JBI) ?

Whatever it is, we need to develop some infrastructure:

1) A way to generate metrics at the local host to send back information to
   the global scheduler.  Ideas:
   a) missed deadlines.  The local scheduler can keep track of deadlines
   that it misses and send back statistics.  These could be parameterized by
   stream.
   b) CPU/bandwidth info.  We could either keep track of stuff inside the
   scheduled process in terms of its own bandwidth, or we could have some
   outside "sniffer" process that keeps track of things and sends the
   information.  Either way we want monitoring to be low overhead.  How to
   do this?
   c) application-specific metrics.  For example, we could keep track of
   some kind of playback quality and report that back.

2) A means to communicate between the global/local schedulers that is low
   overhead.  
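As a sketch of idea 1(a), a per-stream missed-deadline counter that the local
scheduler updates and periodically reports back might look like the following.
All names and fields here are hypothetical, not from the MediaNet sources:

```c
#include <assert.h>

/* Hypothetical per-stream statistics record kept by the local
   scheduler; field names are illustrative only. */
struct stream_stats {
  int stream_id;
  unsigned deadlines_total;   /* deadlines scheduled so far */
  unsigned deadlines_missed;  /* deadlines the scheduler missed */
};

/* Record one scheduling decision; called once per deadline. */
void stats_record(struct stream_stats *s, int missed) {
  s->deadlines_total++;
  if (missed) s->deadlines_missed++;
}

/* Miss rate in percent, suitable for reporting to the global
   scheduler in a compact message. */
unsigned stats_miss_pct(const struct stream_stats *s) {
  if (s->deadlines_total == 0) return 0;
  return 100 * s->deadlines_missed / s->deadlines_total;
}
```

Keeping only counters like these keeps the monitoring overhead low: updating
is two integer operations, and the report is a few words per stream.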

================>   Finally---what's the plan?!???!!


Fixes

Refinements of configurations:

1) Each SEND/RECV configuration should have attributes associated with it.
   In particular:
   a) connection-type:  This could be either TCP, UDP, or HTTP.
   b) flow-type: This could be PACKET or STREAM.  Basically the former
      indicates that each received packet is a significant entity (i.e.
      a frame), whereas the latter indicates that significant entities may
      consist of one or more packets.
   c) Would like some kind of qualifier that also indicates packets
      may have sequence numbers.

2) Inputs/outputs of components should have an associated type that
   describes the data that they expect.  In particular, we now define
   MPEG as this data type.

   Furthermore, these inputs and outputs should also indicate a
   flow-type, as defined above.  When flow-type is PACKET, then each
   invocation of the scheduler/inport will be provided a full frame.
   Otherwise, it could be partial data, and the component itself
   needs to worry about constructing frames from input data.

With these refinements, a certain level of consistency checking can be
performed to ensure that components line up properly.  We can also define
the following components:

1) MPEG-framer: this takes an MPEG STREAM as input and transforms it
   into an MPEG PACKET output.
2) MPEG-dropper: takes an MPEG PACKET flow as input and returns a
   packet flow as output.

   Question: how to deal with rate issues?  Perhaps they will take
   care of themselves, as the inputs will already be at the proper
   rate, and the dropped frames will just not be sent.

   Question: what do deadlines in the middle of the network really buy
   you if the source is already sending data at the correct rate?  It
   seems like it will be wasted work, and intermediate nodes should just
   forward packets as quickly as they can.  Therefore we need to make
   forwarding as cheap as possible.  I.e. no copies, fast path, etc.

---

I feel like I really need to reorganize the code.  It seems pretty baroque.
Stuff to do

1) Can I replace the use of existentials with xtunions instead?  Can
   I use regular parametric polymorphism?
2) I like the organization of the throttleproxy and httpreceiver code.
   Can I apply this to the media-net server?
3) Change the scheduling stuff to be cleaner.  Think about how
   to be more passive and use timeouts less.  That is, I want to be
   smarter about invoking a component on a regular basis when there is
   no work to be done, because e.g. packets haven't arrived.

   That said, there are clearly periodic events that we want to deal with.

4) Deal with sends better.  Right now, unfinished sends don't work
   very well.  Furthermore, HTTP responses are not well integrated
   into the top-level loop.  These could both be set up in some kind
   of send queue which is serviced in the top-level loop.  As an
   optimization, we could do the send locally, and if it didn't complete,
   then stick the rest in the send queue.

   How can we eliminate the amount of copying that has to happen here?

5) Can we set things up in an event-queue organization?  How would this
   help?

===========================================================================

Web notes:
With media player, when getting the file it does:

128.84.99.23 - - [03/Dec/2001:17:34:38 -0500] "GET /f14.mpg HTTP/1.1" 200 62473
128.84.99.23 - - [03/Dec/2001:17:34:42 -0500] "GET /f14.mpg HTTP/1.1" 206 2878607

Need to figure out what's going on here, and perhaps look at the actual
connection data.  Perhaps what's happening is that the media player closes
the connection before all of the data is received and then tries to get the
rest from within the middle of the file ... ?

Another try gives me exactly the same information.  Perhaps the GET request
is specifically asking for only a part of the file, rather than all of it,
and that's what's resulting in the complaint ... ?





Situations:

1) May imagine receivers joining an existing stream at any time.  Therefore
need to cache the sequence header so that the receiver can properly set
itself up to receive the remaining frames.

===========================================================================

Right now, each component has a computation of some kind and a deadline
associated with it.  The deadlines (based on rate) are independent
of the stream and the computation, and the frames are assumed to be the same
size.

We have to refine these assumptions as follows:

1) MPEG streams have a fixed number of plausible "alternate" rates based
   just on frame dropping.  These rates depend on the initial rate and the
   pattern in the stream.  They are:

   a) Full steam (i.e. 30 fps, 24 fps, etc.).  The full rate needs to be
   communicated by the source to the global scheduler.
   b) No B frames.
   c) No P frames.
   d) Fewer I frames.

   For example, if we have the stream
   IBBPBBPBBPBBPBB ...
   where R is the full-steam rate, then we have possible rates:

   R, R/3, R/(16*n) for n>=1 (i.e. 30 fps, 10 fps, 1.8 fps, 0.9 fps, ...)

2) We can set rates in between these natural rates by doing re-encoding of
   the stream as I-frames (I think).  In the worst case, we could re-encode
   every frame, at a cost of both computation and bandwidth.  Instead, we
   would prefer to reuse as many of the existing frames as possible, and do
   selective re-encoding.

3) Each frame is not the same size, nor is the size of each frame type
   constant.  For example, here are some stats:

   F14 video:
     I: avg 13k
     P: avg 6.8k
     B: avg 3.2k

4) There is stream metadata that cannot ever be dropped.  For MPEG, this
   includes so-called GOP headers.  In our system, these headers can either
   accompany the first frame in the pattern (they are only 8 bytes long), or
   they can be sent separately.  If they are sent separately, they must not
   be dropped, say by a generic rate controller.
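To make the rate and size arithmetic above concrete, here is a small C sketch
(illustrative helpers, not MediaNet code) that derives the drop-based
alternate rates from the frame-type counts in one pattern repetition, and the
average bytes per repetition from the F14 figures (taking "13k" as 13000
bytes, etc.):

```c
#include <assert.h>
#include <string.h>

/* Alternate rates obtainable by frame dropping, derived from the
   frame-type counts in one pattern repetition. */
void alt_rates(const char *pattern, double full_rate,
               double *no_b, double *no_bp) {
  int n = (int)strlen(pattern), i = 0, p = 0;
  for (int k = 0; k < n; k++) {
    if (pattern[k] == 'I') i++;
    else if (pattern[k] == 'P') p++;
  }
  *no_b  = full_rate * (i + p) / n;  /* drop all B frames */
  *no_bp = full_rate * i / n;        /* drop B and P frames */
}

/* Average bytes per pattern repetition, using the F14 per-frame-type
   averages above. */
double gop_bytes(const char *pattern, int drop_b) {
  double total = 0;
  for (const char *c = pattern; *c; c++) {
    if (*c == 'I') total += 13000;
    else if (*c == 'P') total += 6800;
    else if (*c == 'B' && !drop_b) total += 3200;
  }
  return total;
}
```

For the 15-frame pattern IBBPBBPBBPBBPBB at 30 fps this gives 10 fps with no
B frames and 2 fps with I frames only, and dropping the B frames cuts the
bytes per repetition from 72.2k to 40.2k, i.e. roughly 44% of the bandwidth.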

As a result of these details, we can look at network computations a bit
differently:

1) Rather than associate a rate with an operation, we associate it with a
   stream, so that compute-node operations become functions from streams to
   streams (where the rate of the input and output streams can be
   different).  As such, we can implement a generic or specialized "rate
   change" computation explicitly.

2) We can look at two sorts of polymorphism concerning stream computations:
   a) stream polymorphism.  Here, the contents of the stream are not
      important to the computation.  A generic rate controller that dropped
      every other frame would be stream polymorphic.
   b) rate polymorphism.  Here, computations may be
      stream-contents-specific, but rate-independent.  For example, a color
      to b/w transcoder for MPEG streams would be rate-independent (same
      rate in as the rate out).
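A stream-polymorphic component never inspects frame contents, so one
implementation serves any stream type.  A minimal sketch of a generic
"drop every other frame" rate controller, with illustrative types (the real
components operate on MediaNet's internal buffers):

```c
#include <assert.h>

/* Opaque frame: the controller never looks inside `data`. */
typedef struct { void *data; int len; } frame_t;

/* Keep every other frame, compacting in place; return the new count.
   Output rate is half the input rate regardless of contents, so this
   component is stream polymorphic. */
int drop_alternate(frame_t *frames, int n) {
  int out = 0;
  for (int k = 0; k < n; k += 2)
    frames[out++] = frames[k];
  return out;
}
```

By contrast, a rate-polymorphic component (e.g. the color-to-b/w transcoder)
would have to parse the frame payloads, but would never change the count.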

With regard to MPEG:

1) The scheduler must be aware of the stream pattern, as well as the
   approximate sizes of the frames, and it needs to be notified if either
   changes.

2) We can approximate an arbitrary user rate request by using two rate
   controllers.   First, we use a "drop-only" controller at an arbitrary
   point in the network, to save bandwidth.  Then, at the end-host, we can
   do re-encoding to drop the last bit.  However, this seems unlikely to be
   helpful, since we imagine that resources are plentiful at the end host.

--------------------------------------------------------------------------

Go through the stream and grab out each of the pieces.  Seems like we can,
at each decision point, try to parse the plausible things until we get a
hit, then do a "tell" to figure out how big the piece is, then wrap up the piece
as a packet and send it out.  The drawback is that we then pay the parsing
cost (it sticks it in a struct, etc.); I wonder what might work better.
Need to tag frames based on their importance to the stream:
- those that cannot be dropped.  This means we can't stick them in a buffer
  that could be overwritten if we have a smaller rate than the sender.
  Seems like we could queue those things that can't be dropped, and then at
  each iteration flush the queue.
- those that can be dropped.
  - There is a prioritization scheme here.  We want to drop B frames before
    P frames before I frames.  Again, we don't want stuff with lower
    priorities to overwrite stuff with higher priority.  Perhaps the idea is
    to be smart about extracting the frame at the deadline---check the
    higher priority queue first then, if nothing is there, check the lower
    priority and so on.

Suggests a generic way of dealing with stream frames:
- tag with drop/nodrop.  The nodrop frames are stuck in one queue that is
  flushed on each frame
- tag with priority for the drop frames.  Stick these in a priority queue,
  and at each point dequeue and send the first frame and dump the rest.
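The two-queue scheme above can be sketched in a few lines of C.  This is a
toy model, with fixed sizes and integer frame ids standing in for real
frames: nodrop frames go in a FIFO that is flushed at every deadline, and
each droppable frame lands in a per-priority slot, so at the deadline we emit
the best droppable frame and dump the rest:

```c
#include <assert.h>
#include <string.h>

#define NPRIO 3                       /* 0 = I (highest), 1 = P, 2 = B */

typedef struct {
  int nodrop[16]; int nd_len;         /* FIFO of nodrop frame ids */
  int slot[NPRIO]; int full[NPRIO];   /* latest droppable id per priority */
} fqueue_t;

void fq_put_nodrop(fqueue_t *q, int id) { q->nodrop[q->nd_len++] = id; }

/* A later frame at the same priority overwrites the slot, but a lower
   priority never overwrites a higher one (they use different slots). */
void fq_put_drop(fqueue_t *q, int prio, int id) {
  q->slot[prio] = id; q->full[prio] = 1;
}

/* At each deadline: emit all nodrop frames, then the single
   highest-priority droppable frame; everything else is dumped.
   Returns the number of frames emitted into `out`. */
int fq_flush(fqueue_t *q, int *out) {
  int n = 0;
  for (int k = 0; k < q->nd_len; k++) out[n++] = q->nodrop[k];
  q->nd_len = 0;
  for (int p = 0; p < NPRIO; p++)
    if (q->full[p]) { out[n++] = q->slot[p]; break; }
  memset(q->full, 0, sizeof q->full);
  return n;
}
```

Note this sketch deliberately ignores the causal-ordering problem discussed
below (a GOP header must precede any of its frames that survive).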

Issue---need to deal with temporal ordering of nodrop/drop frames so that
the stuff we send out preserves the order of the original stream.  How to
create a causal ordering between frames so that the flushing at each
deadline can be fast?

  GOP I P P I B GOP I P P I
  <----------------->

  nodrop: GOP, GOP
  prio  : I I I I P P B

  Ick.  Don't want to send the first GOP header if we have dropped all of
  its frames.  However, we do want to send it if we send one of its frames
  (i.e. if the window went only up to that header).  Thus we need to be even
  smarter about encoding causality.

Question---do we attempt to make some general scheme that may or may not be
appropriate for other stream formats (i.e. AVI), or do we just specialize
the MPEG stuff?  Looks like, at first glance, AVI is just a series of
frames, without any headers or anything in between; this arrangement works
for the regular frame rate stuff.  Is this true for all codecs (what is a
codec?) ?

Issue---we can't decode B frames immediately upon receiving them, since we
have to wait for downstream frames.  This implies that we need to buffer at
the client, introducing an extra delay.  Is this acceptable?  We might send
batches of frames as a single frame at a quicker rate, and then introduce
more lag in other frames.  Thus the aggregate rate would be correct, but the
per-frame rate would jitter.  The client app would have to properly delay
display in this case as well.

--------------------------------------------------------------------------

At the end, we have a list of fd's and their associated components.  Will
need to initialize the fd table for loop with this info.

How to do send and receive components:

. Have a list of file descriptors that we are checking for input.
  . List of descriptors that are associated with Recv components
  . List of listen sockets waiting for connections from remote servers;
    each is also associated with a receive component.
  . List of connection sockets waiting for remote servers; each
    is associated with a Send component.
  . HTTP listen socket (always open)
  . HTTP connection sockets.

. Following select().
  . If a Recv socket has data available, read it in.  If a full
    frame has been received, call the associated Recv component's inport
    function.  (Optimization: if the Recv component is not scheduled, and
    there is only one outport, call the downstream component
    directly).
  . If a Listen socket has data available, accept the connection.
    Then create an association between the connection fd and the
    Recv component waiting on it, and stick the new fd in the
    Recv list.  Close the listen socket and remove it from the list.
  . If a connection socket has a connection available, accept the
    connection.  Then update the send component's file descriptor
    to contain the connection's fd.  This will effectively
    activate the component.
  . If the HTTP listen socket has data available, accept the connection
    and store it in the list.
  . If the HTTP conn socket has data, read it in and store it.  When
    the connection is actually closed (meaning all the data has been totally
    sent), parse the XML data and re-do the configuration.

So ...

. Following configuration we need to return back
  . A list of recv (actually "forward") components along with
    port numbers to wait for connections.
  . A list of uninitialized send components along with
    host/port pairs to set up connections with, and a pointer to the
    address in the send component where the set-up file descriptor
    should be stored.
    
--- Loop Pseudocode ---

tick = gettime;
while (1) {
  nextdl = scheduleComponents(tick);
  while (tick < nextdl) {
    select(); // non-blocking
    if (fd's have data) {
      invoke each fd's handler:
        recvhandler
        listenhandler
        connhandler
        httplistenhandler
        httpconnhandler
    }
    tick = gettime;
  }
}
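The non-blocking select() in the inner loop amounts to a busy wait.  An
alternative is to block in select() for exactly the time remaining until the
next deadline.  A sketch of the timeout computation (the names and the
microsecond clock are mine, not from the code):

```c
#include <assert.h>
#include <sys/time.h>

/* Convert "time left until the next deadline" into the struct timeval
   that select() expects.  A deadline in the past yields a zero timeout,
   i.e. a non-blocking poll. */
struct timeval timeout_until(long long now_us, long long deadline_us) {
  struct timeval tv = {0, 0};
  if (deadline_us > now_us) {
    long long d = deadline_us - now_us;
    tv.tv_sec  = d / 1000000;
    tv.tv_usec = d % 1000000;
  }
  return tv;
}
```

The loop body would then be `select(maxfd + 1, &rfds, 0, 0, &tv)` with
`tv = timeout_until(tick, nextdl)`, waking up either on fd activity or
exactly at the deadline.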

----

(recv port.rate ---> send host.port.rate) :

if (recv.rate == N && send.rate == N) ||
   recv.rate == 0 || send.rate == 0 then
  create a "Queue component" to go in between them at rate N

if recv.rate == N && send.rate == M then
  create a "Queue component" for each at the differing rates.
  (can do this in general)
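The rules above amount to a small decision function; a sketch (treating rate
0 as the "unspecified" wildcard, which is my reading of the rule):

```c
#include <assert.h>

/* Number of intermediate Queue components needed between a recv at
   rate r and a send at rate s: equal (or wildcard-0) rates share one
   queue; differing nonzero rates get one queue each. */
int queues_needed(int r, int s) {
  if (r == s || r == 0 || s == 0) return 1;
  return 2;
}
```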

