Multicamera networks are becoming increasingly complex, covering larger sensing areas in order to capture activities and behaviors that evolve over long spatial and temporal windows. This necessitates novel methods for processing the information sensed by the network and visualizing it for an end user. In this paper, we describe a system for modeling and on-demand visualization of the activities of groups of humans. Using prior knowledge of the 3D structure of the scene along with camera calibration, the system localizes humans as they navigate the scene. Activities of interest are detected by matching the multiview observations against models of these activities learned a priori. The trajectory and activity index of each individual summarize the dynamic content of the scene. These are used to render the scene with virtual 3D human models that mimic the observed activities of real humans. In particular, the rendering framework is designed to drive large displays with a cluster of GPUs and to reduce cognitive dissonance by rendering realistic weather effects and illumination. We envision this system being used for immersive visualization as well as for summarizing videos that capture group behavior.
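
As a concrete illustration of the localization step described above, the minimal sketch below back-projects a detected person's foot pixel onto a known ground plane (z = 0) using standard pinhole-camera calibration. The calibration values and the function name `localize_on_ground` are hypothetical placeholders, not the paper's actual implementation; the person detector and multiview fusion are omitted.

```python
import numpy as np

# Hypothetical calibration for one camera in the network.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # intrinsics (pixels)
R = np.eye(3)                           # extrinsic rotation (world -> camera)
t = np.array([0.0, 0.0, 5.0])           # extrinsic translation (metres)

def localize_on_ground(pixel):
    """Back-project an image pixel (e.g., a detected person's feet)
    onto the ground plane z = 0 using the known calibration."""
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_world = R.T @ ray_cam            # viewing ray in world frame
    cam_center = -R.T @ t                # camera centre in world frame
    lam = -cam_center[2] / ray_world[2]  # intersect ray with z = 0
    return cam_center + lam * ray_world  # 3D position on the ground

# A foot detection at pixel (400, 300) maps to a world position
# on the ground plane; repeating this per frame yields a trajectory.
print(localize_on_ground((400.0, 300.0)))
```

Accumulating such ground-plane positions over time yields the per-person trajectories that, together with the activity index, drive the virtual-human rendering.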