You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format. However, this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the Department of Computer Science of the University of Maryland at College Park under terms that include this permission. All other rights are reserved by the author(s).
On the communication-storage minimization for a class of secure. Radha Poovendran. July 2000.
Developing cryptographic key management protocols that have scalability in terms of the key storage as well as key update communication is an important problem in many secure multicast applications~\cite{rb1,dea,wgl}. Wong {\em et al.}~\cite{wgl} and Wallner {\em et al.}~\cite{dea} independently presented the first set of key distribution models where the key update communication grows as ${\cal O}(\log N)$ for a group of size $N$. However, the storage requirement of these models was ${\cal O}(N)$. Recently~\cite{cmn}, a new model based on clustering of the group members was proposed in order to lower the key storage while maintaining the update communication growth as ${\cal O}(\log N)$. For the new model, by considering the product of the storage and the communication as the cost function, the optimal cluster size $M$ was conjectured to be $M= {\cal O}(\log N)$. In this paper, we show that the optimal value of the cluster size can be computed without the product function due to the monotonicity of the storage with respect to the cluster size. We show that the optimal cluster size selection of the model in~\cite{cmn} can be formulated as a constrained optimization problem, and then transform it to a fixed point equation of the form $M - \lambda \log_e M = (\beta_2 - \lambda)\log_e N$, where $\beta_2, \lambda$ are model parameters. We first show that the largest root of this equation is the optimal solution, and then compute it by two different techniques. We then show that the first order approximation of the solution is of the form $M \approx (\beta_2 -\lambda)\log_e N + \lambda \log_e \log_e N$, leading to $M \approx (\beta_2 - \lambda) \log_e N$ for large values of $N$.
We make a case for the use of the estimate $M = (\beta_2 -\lambda) \log_e N + \lambda \log_e \log_e N$ instead of $M = \log_e N$ by showing that even for group sizes up to $2^{32}$, the value $M = \log_e N + \lambda \log_e \log_e N$ provides a significantly lower value of key storage compared to the value $M = \log_e N$. We also show that the best estimate of $M$ using the product function in~\cite{cmn} does not exceed $M = \nu \log_e N$ for a constant $\nu$. (Also cross-referenced as UMIACS-TR-2000-58) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
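The fixed point equation in the abstract above can be explored numerically. The following is a minimal sketch, not the report's own method (the abstract does not say which two techniques the author uses): it iterates the map $M \mapsto (\beta_2 - \lambda)\log_e N + \lambda \log_e M$ from a point to the right of the largest root and compares the result with the first order approximation. The function names and the parameter values $\beta_2 = 2$, $\lambda = 1$ are assumptions chosen only for illustration, and $\lambda > 0$ is assumed.

```python
import math


def optimal_cluster_size(n, beta2, lam, tol=1e-12, max_iter=10_000):
    """Largest root M of M - lam*ln(M) = (beta2 - lam)*ln(n), assuming lam > 0."""
    c = (beta2 - lam) * math.log(n)

    def f(m):
        return m - lam * math.log(m)

    # f has its minimum at M = lam, so no real root exists below that value.
    if c < f(lam):
        raise ValueError("no real root for these parameters")

    # Start at or to the right of the largest root: f is increasing for M >= lam.
    m = max(lam, 1.0)
    while f(m) < c:
        m *= 2.0

    # Fixed-point iteration; from the right it decreases monotonically to the root.
    for _ in range(max_iter):
        nxt = c + lam * math.log(m)
        if abs(nxt - m) < tol:
            return nxt
        m = nxt
    return m


def first_order_estimate(n, beta2, lam):
    """First order approximation quoted in the abstract."""
    log_n = math.log(n)
    return (beta2 - lam) * log_n + lam * math.log(log_n)
```

Calling `optimal_cluster_size(2**32, 2.0, 1.0)` and `first_order_estimate(2**32, 2.0, 1.0)` gives two values that can be compared directly for a chosen group size.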
Performance and Analysis of Saddle Point Preconditioners for the. Howard C. Elman. David J. Silvester. Andrew J. Wathen. July 2000.
We examine the convergence characteristics of iterative methods based on a new preconditioning operator for solving the linear systems arising from discretization and linearization of the steady-state Navier-Stokes equations. With a combination of analytic and empirical results, we study the effects of fundamental parameters on convergence. We demonstrate that the preconditioned problem has an eigenvalue distribution consisting of a tightly clustered set together with a small number of outliers. The structure of these distributions is independent of the discretization mesh size, but the cardinality of the set of outliers increases slowly as the viscosity becomes smaller. These characteristics are directly correlated with the convergence properties of iterative solvers. (Also cross-referenced as UMIACS-TR-2000-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Rate Windows for Efficient Network and I/O Throttling. Kyung D. Ryu. Jeffrey K. Hollingsworth. Peter J. Keleher. July 2000.
This paper proposes and evaluates a new mechanism for I/O and network rate policing. The goal of the proposed system is to provide a simple, yet effective way to enforce resource limits on target classes of jobs in a system. The basic approach is useful for several types of systems, including running background jobs on idle workstations and providing resource limits on network-intensive applications such as virtual web server hosting. Our approach is quite simple: we use a sliding window average of recent events to compute the average rate for a target resource. The assigned limit is enforced by forcing application processes to sleep when they issue requests that would bring their resource utilization out of the allowable profile. Our experimental results show that we are able to provide the target resource limitations within a few percent, and do so with no measurable slowdown of the overall system. (Also cross-referenced as UMIACS-TR-2000-53) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
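The sliding window idea in the Rate Windows abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `RateWindow`, the method `delay_for`, and the units (an abstract "amount" per second) are hypothetical. Instead of actually sleeping, the sketch returns how long the caller would have to sleep so that the request stays within the allowed average rate, and it assumes the caller does sleep for that long before issuing the next request.

```python
from collections import deque


class RateWindow:
    """Sliding-window rate policing sketch (hypothetical API)."""

    def __init__(self, limit, window):
        self.limit = limit        # allowed amount per second (assumed unit)
        self.window = window      # window length in seconds
        self.events = deque()     # (timestamp, amount), oldest first
        self.total = 0.0          # sum of amounts currently in the window

    def _evict(self, now):
        # Drop events that have slid out of the window (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, amount = self.events.popleft()
            self.total -= amount

    def delay_for(self, now, amount):
        """Return the sleep time needed before a request of `amount` may run."""
        self._evict(now)
        budget = self.limit * self.window
        start = now
        # Wait for old events to expire until the request fits in the budget.
        # A request larger than the whole budget runs once the window is empty.
        while self.events and self.total + amount > budget:
            ts, old_amount = self.events.popleft()
            self.total -= old_amount
            start = max(start, ts + self.window)
        self.events.append((start, amount))
        self.total += amount
        return start - now
```

A caller would invoke `delay_for(current_time, request_size)` before each I/O or network request and sleep for the returned number of seconds.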
Image Restoration through Subimages and Confidence Images. James G. Nagy. Dianne P. O'Leary. July 2000.
Some very effective but expensive image reconstruction algorithms cannot be applied to large images because of their cost. In this work, we first show how to apply such algorithms to subimages, giving improved reconstruction of regions of interest. Our second contribution is to construct confidence intervals for pixel values, by generalizing a theorem of O'Leary and Rust to allow both upper and lower bounds on variables. All current algorithms for image deblurring or deconvolution output an image. This provides an estimated value for each pixel in the image. What is lacking is an estimate of the statistical confidence that we can have in those pixel values or in the features they form in the image. There are two obstacles in determining confidence intervals for pixel values: first, the process is computationally quite intensive, and second, there has been no proposal for providing the results in a visually useful way. In this work we overcome the first of those limitations and use a recently developed algorithm called {\sf Twinkle} to overcome the second. We demonstrate the usefulness of these techniques on astronomical and motion-blurred images. (Also cross-referenced as UMIACS-TR-2000-52) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Displaying Confidence Images. James G. Nagy. Dianne P. O'Leary. July 2000.
Algorithms for computing images result in an estimate of an image. The image may result from deblurring a measured image, from deconvolving a set of measurements, or from computing an image by modeling physical processes such as the weather. These computations provide an estimated value for each pixel in the image. What is lacking, however, is an estimate of the statistical confidence that we can have in those pixel values or in the features they form. In this work we discuss novel ways to display confidence information, using an algorithm called {\sf Twinkle}, in order to give the viewer valuable visual insight into uncertainties. The technique is useful whether the confidence information is in the form of a confidence interval or a distribution of possible values. We demonstrate how to display confidence information in a variety of applications: weather forecasts, intensity of a star, and rating a potential tumor in a diagnostic image. (Also cross-referenced as UMIACS-TR-2000-51) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Analysis of Smoothing Effects of Upwinding Strategies for the. Howard C. Elman. Alison Ramage. June 2000.
Using a technique for constructing analytic expressions for discrete solutions to the convection-diffusion equation, we examine and characterise the effects of upwinding strategies on solution quality. In particular, for grid-aligned flow and discretisation based on bilinear finite elements with streamline upwinding, we show precisely how the amount of upwinding included in the discrete operator affects solution oscillations and accuracy when boundary layers are present. In addition, we show that the same analytic techniques provide insight into other discretisations, such as a finite difference method that incorporates streamline diffusion, and the isotropic artificial diffusion method. (Also cross-referenced as UMIACS-TR-2000-50) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
SHOP and M-SHOP: Planning with Ordered Task Decomposition. Dana Nau. Yue Cao. Amnon Lotem. Hector Munoz-Avila. June 2000.
SHOP (Simple Hierarchical Ordered Planner) and M-SHOP (Multi-task-list SHOP) are planning algorithms with the following characteristics.
* SHOP and M-SHOP plan for tasks in the same order that they will later be executed. This avoids some task-interaction issues that arise in other HTN planners, making the planning algorithms relatively simple. This also makes it easy to prove soundness and completeness results.
* Since SHOP and M-SHOP know the complete world-state at each step of the planning process, they can use highly expressive domain representations. For example, they can do planning problems that require Horn-clause inferencing, complex numeric computations, and calls to external programs.
* In our tests, SHOP and M-SHOP were several orders of magnitude faster than Blackbox, IPP, and UMCP, and were several times as fast as TLplan.
* The approach is powerful enough to be used in complex real-world planning problems. For example, we are using a Java implementation of SHOP as part of the HICAP plan-authoring system for Noncombatant Evacuation Operations (NEOs).
In this paper, we describe SHOP and M-SHOP, present soundness and completeness results for them, and compare them experimentally to Blackbox, IPP, TLplan, and UMCP. The results suggest that planners that generate totally ordered plans starting from the initial state can "scale up" to complex planning problems better than planners that use partially ordered plans. Department of Computer Science, University of Maryland,
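To illustrate what planning for tasks "in the same order that they will later be executed" can look like, here is a minimal sketch of ordered task decomposition. It is not SHOP itself: the `plan` function, the toy travel domain, and all operator and method names are hypothetical, and real SHOP domains are far more expressive. The point is only that the planner always knows the complete current state when it decomposes the next task.

```python
def plan(state, tasks, operators, methods):
    """Depth-first, left-to-right decomposition of an ordered task list."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in operators:
        # Primitive task: apply the operator to the fully known current state.
        new_state = operators[name](state, *args)
        if new_state is None:
            return None
        tail = plan(new_state, rest, operators, methods)
        return None if tail is None else [head] + tail
    # Compound task: try each method in order, backtracking on failure.
    for method in methods.get(name, []):
        subtasks = method(state, *args)
        if subtasks is None:
            continue
        result = plan(state, subtasks + rest, operators, methods)
        if result is not None:
            return result
    return None


# Toy domain (assumed for illustration only).
def walk(state, a, b):
    return {**state, "loc": b} if state["loc"] == a else None


def ride_taxi(state, a, b):
    fare = 1.5 + 0.5 * state["dist"][(a, b)]
    if state["loc"] == a and state["cash"] >= fare:
        return {**state, "loc": b, "cash": state["cash"] - fare}
    return None


def travel_by_foot(state, a, b):
    return [("walk", a, b)] if state["dist"][(a, b)] <= 2 else None


def travel_by_taxi(state, a, b):
    return [("ride_taxi", a, b)]


OPERATORS = {"walk": walk, "ride_taxi": ride_taxi}
METHODS = {"travel": [travel_by_foot, travel_by_taxi]}
```

Because operators are applied as soon as they are chosen, each method can inspect concrete values such as the remaining cash, which is the kind of numeric reasoning the abstract attributes to knowing the complete world-state.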
NTCIR CLIR Experiments at the University of Maryland. Douglas W. Oard. Jianqiang Wang. June 2000.
This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that use of the first translation in the edict dictionary is comparable with the use of every translation. Japanese term segmentation posed no unusual problems, which contrasts sharply with results previously obtained for cross-language retrieval between Chinese and English. (Also cross-referenced as UMIACS-TR-2000-47, LAMP-TR-054) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
TREC-8 Experiments at Maryland: CLIR, QA and Routing. Douglas W. Oard. Jianqiang Wang. Dekang Lin. Ian Soboroff. June 2000.
The University of Maryland team participated in four aspects of TREC-8: the ad hoc retrieval task, the main task in the cross-language retrieval (CLIR) track, the question answering track, and the routing task in the filtering track. The CLIR method was based on Pirkola's method for Dictionary-based Query Translation, using freely available dictionaries. Broad-coverage parsing and rule-based matching were used for question answering. Routing was performed using Latent Semantic Indexing in profile space. (Also cross-referenced as UMIACS-TR-2000-46, LAMP-TR-053) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Structured Translation for Cross-Language Information Retrieval. Ruth Sperer. Douglas W. Oard. June 2000.
The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process, with the system first determining the intended meaning of a query term and then selecting translations appropriate to that meaning that might appear in the document collection. An implementation of structured translation based on automatic dictionary clustering is described and evaluated by using Chinese queries to retrieve English documents. Structured translation achieved an average precision that was statistically indistinguishable from Pirkola's technique for very short queries, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval effectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications. (Also cross-referenced as UMIACS-TR-2000-45, LAMP-TR-052) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Mining the Web for Bilingual Text. P. Resnik. June 2000.
STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end-product is an automatically acquired parallel corpus comprising 2491 English-French document pairs, approximately 1.5 million words per language. (Also cross-referenced as UMIACS-TR-2000-44) (Also cross-referenced as LAMP-TR-051) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Evaluating Lexicon Coverage for Cross-Language Information Retrieval. G. Levow. D.W. Oard. June 2000.
No abstract available (Also cross-referenced as UMIACS-TR-2000-43) (Also cross-referenced as LAMP-TR-050) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Signal Boosting for Translingual Topic Tracking: Document Expansion and. G. Levow. D.W. Oard. June 2000.
No abstract available (Also cross-referenced as UMIACS-TR-2000-42) (Also cross-referenced as LAMP-TR-049) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Statistical Word-Level Translation Model for Comparable Corpora. Mona Diab. Steve Finch. June 2000.
In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, their corresponding translations' distributional profiles should be close in a comparable corpus. The proposed model is described. A preliminary investigation on intralanguage comparable corpora is laid out. The preliminary results are >92% accurate, suggesting the feasibility of the model. The model needs to undergo some improvements and should be tested cross-linguistically before assessing its significance. (Also cross-referenced as UMIACS-TR-2000-41, LAMP-TR-048) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Philip Resnik. Mona Diab. June 2000.
The way we model semantic similarity is closely tied to our understanding of linguistic representations. We present several models of semantic similarity, based on differing representational assumptions, and investigate their properties via comparison with human ratings of verb similarity. The results offer insight into the bases for human similarity judgments and provide a testbed for further investigation of the interactions among syntactic properties, semantic structure, and semantic content. (Also cross-referenced as UMIACS-TR-2000-40, LAMP-TR-047) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification. Mona Diab. John Schuster. Peter Bock. June 2000.
Quantitative analysis of literary style has heretofore utilized semantic elements, namely word counts. This research attempts to identify quantifiable syntactic elements of style that can be used for author identification. The measurement of syntactic elements utilizes a dictionary with one part of speech per word and looks at phrases delimited by punctuation marks. Different size permutations of words - referred to as grams - are counted within each text. Correlations are measured amongst the gram frequencies of eight texts pertaining to four authors, both contemporary and non-contemporary. The correlations are performed across different gram sizes of words. The same treatment is applied to a target text, the Funeral Elegy text. The approach holds for classifying texts temporally, consistently across the various gram sizes. Yet a finer grained investigation is required to certify the authorship of the Funeral Elegy text. (Also cross-referenced as UMIACS-TR-2000-39, LAMP-TR-046) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
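The counting step described in the abstract above might look roughly like the following sketch. It rests on several assumptions that the abstract does not confirm: grams are taken to be contiguous sequences of part-of-speech tags, the punctuation set is a guess, words missing from the dictionary are tagged "UNK", and the function name `pos_grams` and the tiny dictionary in the usage note are hypothetical.

```python
import re
from collections import Counter


def pos_grams(text, pos_of, n):
    """Count n-grams of part-of-speech tags within punctuation-delimited phrases.

    `pos_of` maps a lowercase word to its single part of speech.
    """
    counts = Counter()
    # Split the text into phrases at punctuation marks (assumed delimiter set).
    for phrase in re.split(r"[.,;:!?]", text):
        tags = [pos_of.get(word.lower(), "UNK") for word in phrase.split()]
        # Slide a window of length n over the tag sequence of this phrase only.
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts
```

For example, with `pos_of = {"the": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}`, the call `pos_grams("The cat sat, the dog ran.", pos_of, 2)` counts tag pairs separately inside each of the two phrases, so no gram spans the comma. Gram frequency vectors built this way for several texts could then be correlated, as the abstract describes.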
Quantifying and Interpreting the Effect of Intelligent Information. Terry P. Riopka. Mona Diab. Peter Bock. June 2000.
A genetic algorithm is simulated using human beings as "chromosomes" in a preliminary study intended to quantify and interpret the effect of intelligent information exchange on genetic algorithm performance. Two factors are varied: the amount of information supplied to the cohort and the type of data manipulation allowed during the exchange. A human simulated genetic algorithm is run for each combination of factors as well as a machine simulation for comparison. Qualitative analysis of recorded conversations indicates extensive use of memory and development of block biases during genetic algorithm evolution. Informal analysis shows that genetic algorithm simulations using complex data manipulations combined with exact knowledge of string fitnesses seem to out-perform a standard machine implementation for the given optimization fitness function. Interestingly, polar combinations: simple data manipulation/minimum information and complex data manipulation/maximum information simulations seem to out-perform other combinations. (Also cross-referenced as UMIACS-TR-2000-38, LAMP-TR-045) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Chinese-English Semantic Resource Construction. Bonnie J. Dorr. Gina-Anne Levow. Dekang Lin. Scott Thomas. June 2000.
We describe an approach to large-scale construction of a semantic lexicon for Chinese verbs. We leverage off of three existing resources--a classification of English verbs called EVCA (English Verb Classes and Alternations) [Levin, 1993], a Chinese conceptual database called HowNet [Zhendong, 1988c, Zhendong, 1988b] (http://www.how-net.com), and a large machine-readable dictionary called Optilex. The resulting lexicon is used for determining appropriate word senses in applications such as machine translation and cross-language information retrieval. (Also cross-referenced as UMIACS-TR-2000-27) (Also cross-referenced as LAMP-TR-044) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Construction of Chinese-English Semantic Hierarchy for Information. Gina-Anne Levow. Bonnie Dorr. Dekang Lin. June 2000.
This paper describes an approach to large-scale construction of a semantic hierarchy for Chinese verbs. Leveraging off of an existing Chinese conceptual database called HowNet and a Levin-based English verb classification, we use thematic-role information to create links between Chinese concepts and English classes. The resulting hierarchy is used for multilingual lexicons in an English-Chinese cross-language information retrieval application. We demonstrate a structured syntax interface that exploits this large-scale hierarchy and its linkages to WordNet for English-Chinese cross-language information retrieval. (Also cross-referenced as UMIACS-TR-2000-36) (Also cross-referenced as LAMP-TR-043) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
oxyGen: A Language Independent Linearization Engine. Nizar Habash. May 2000.
This paper describes a language independent linearization engine, oxyGen. This system compiles target language grammars into programs that take feature graphs as inputs and generate word lattices that can be passed along to the statistical extraction module of the generation system Nitrogen. The grammars are written using a flexible and powerful language, oxyL, that has the power of a programming language but focuses on natural language realization. This engine has been used successfully in creating an English linearization program that is currently used as part of a Chinese-English machine translation system. (Also cross-referenced as UMIACS-TR-2000-35) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hashing Moving Objects. Zhexuan Song. Nick Roussopoulos. June 2000.
In real-life applications, objects are referenced both spatially and temporally. Objects that continuously change their location are called moving objects. With the development of wireless communication and positioning technology, it becomes necessary to store and index those objects in a database. Due to the complexity of the problem, many purely spatial index structures are unable to index large volumes of moving objects in a database. In this paper, we propose a new idea based on a hashing technique. Since it is impossible to re-index all the objects after each time period, we store the objects in buckets. When an object moves within a bucket, the database does not make any change. By using this technique, the number of database updates is greatly reduced, which makes the indexing procedure feasible. Then, we extend the previous system structure by introducing a filter layer between the position information collectors and the database. Four different methods based on the new system structure are also presented. Performance experiments were performed to evaluate different aspects of our indexing techniques, and the conclusions are included in the paper. Department of Computer Science, University of Maryland
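The bucket idea in the abstract above can be sketched as follows. This is an illustration only: the abstract does not specify the hash function or bucket shape, so the uniform square grid, the class name `BucketIndex`, and the `report` method are all assumptions. The sketch shows how position reports that stay inside the current bucket cause no index update.

```python
class BucketIndex:
    """Hash moving objects into grid buckets; update only on bucket changes."""

    def __init__(self, cell_size):
        self.cell_size = cell_size  # side length of a square bucket (assumed shape)
        self.bucket_of = {}         # object id -> bucket key
        self.members = {}           # bucket key -> set of object ids
        self.updates = 0            # number of index updates actually performed

    def _key(self, x, y):
        # Hash a position to the grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def report(self, obj_id, x, y):
        """Process a position report; return True if the index was updated."""
        new_key = self._key(x, y)
        old_key = self.bucket_of.get(obj_id)
        if new_key == old_key:
            return False            # moved within its bucket: nothing to change
        if old_key is not None:
            self.members[old_key].discard(obj_id)
        self.members.setdefault(new_key, set()).add(obj_id)
        self.bucket_of[obj_id] = new_key
        self.updates += 1
        return True
```

A range query would then inspect only the buckets that overlap the query region, at the cost of knowing each object's position only to bucket granularity.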
Broadening Access to Large Online Databases by Generalizing Query. E. Tanin. C. Plaisant. B. Shneiderman. May 2000.
Companies, government agencies, and other types of organizations are making their large databases available to the world over the Internet. Current database front-ends do not give users information about the distribution of data. This leads many users to waste time and network resources posing queries that have either zero-hit or mega-hit result sets. Query previews form a novel visual approach for browsing large databases. Query previews supply data distribution information about the database that is being searched and give continuous feedback about the size of the result set for the query as it is being formed. On the other hand, query previews use only a few pre-selected attributes of the database. The distribution information is displayed only on these attributes. Unfortunately, many databases are formed of numerous relations and attributes. This paper introduces a generalization of query previews. We allow users to browse all of the relations and attributes of a database using a hierarchical browser. Any of the attributes can be used to display the distribution information, making query previews applicable to many public online databases. (Also cross-referenced as UMIACS-TR-2000-32) (Also cross-referenced as HCIL-TR-2000-14) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Fisheye Menus. B. B. Bederson. May 2000.
We introduce "fisheye menus" which apply traditional fisheye graphical visualization techniques to linear menus. This provides for an efficient mechanism to select items from long menus, which are becoming more common as menus are used to select data items in, for example, e-commerce applications. Fisheye menus dynamically change the size of menu items to provide a focus area around the mouse pointer. This makes it possible to present the entire menu on a single screen without requiring buttons, scrollbars, or hierarchies. A pilot study with 10 users compared user preference of fisheye menus with traditional pull-down menus that use scrolling arrows, scrollbars, and hierarchies. Users preferred the fisheye menus for browsing tasks, and hierarchical menus for goal-directed tasks. (Also cross-referenced as UMIACS-TR-2000-31) (Also cross-referenced as HCIL-TR-2000-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
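As a rough illustration of changing item sizes around a focus, the sketch below assigns a font size to each menu item based on its distance from the item under the pointer. The linear falloff, the default sizes, and the function name `fisheye_sizes` are assumptions for illustration; the abstract does not give the distortion function that fisheye menus actually use.

```python
def fisheye_sizes(n_items, focus, max_size=12.0, min_size=4.0,
                  focus_len=3, step=1.0):
    """Return a font size per menu item, largest around index `focus`.

    Items within `focus_len` of the focus get `max_size`; beyond that the
    size shrinks by `step` per item until it reaches `min_size`.
    """
    sizes = []
    for i in range(n_items):
        distance = abs(i - focus)
        if distance <= focus_len:
            size = max_size
        else:
            size = max(min_size, max_size - (distance - focus_len) * step)
        sizes.append(size)
    return sizes
```

Recomputing the sizes whenever the pointer moves to a different item gives the dynamic focus area described in the abstract; whether the whole menu then fits on one screen depends on the chosen minimum size and the number of items.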
Jazz: An Extensible Zoomable User Interface Graphics ToolKit in Java. B. B. Bederson. J. Meyer. L. Good. May 2000.
In this paper we investigate the use of scene graphs as a general approach for implementing two-dimensional (2D) graphical applications, and in particular Zoomable User Interfaces (ZUIs). Scene graphs are typically found in three-dimensional (3D) graphics packages such as Sun's Java3D and SGI's OpenInventor. They have not been widely adopted by 2D graphical user interface toolkits. To explore the effectiveness of scene graph techniques, we have developed Jazz, a general-purpose 2D scene graph toolkit. Jazz is implemented in Java using Java2D, and runs on all platforms that support Java 2. This paper describes Jazz and the lessons we learned using Jazz for ZUIs. It also discusses how 2D scene graphs can be applied to other application areas. (Also cross-referenced as UMIACS-TR-2000-30) (Also cross-referenced as HCIL-TR-2000-13) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
User Modeling for Information Access Based on Implicit Feedback. J. Kim. D. W. Oard. K. Romanik. May 2000.
User modeling can be used in information filtering and retrieval systems to improve the representation of a user's information needs. User models can be constructed by hand, or learned automatically based on feedback provided by the user about the relevance of documents that they have examined. By observing user behavior, it is possible to infer implicit feedback without requiring explicit relevance judgments. Previous studies based on Internet discussion groups (USENET news) have shown reading time to be a useful source of implicit feedback for predicting a user's preferences. The study reported in this paper extends that work by providing a framework for considering alternative sources of implicit feedback, examining whether reading time is useful for predicting a user's preferences for academic and professional journal articles, and exploring whether retention behavior can usefully augment the information that reading time provides. Two user studies were conducted in which undergraduate students examined articles and abstracts related to the telecommunications and pharmaceutical industries. The results showed that reading time could be used to predict the user's assessment of relevance, although reading times for journal articles and technical abstracts are longer than those reported for USENET news documents. Observation of printing events, a type of retention behavior, was found to provide additional useful evidence about relevance beyond that which could be inferred from reading time. The paper concludes with a brief discussion of the implications of the reported results. (Also cross-referenced as UMIACS-TR-2000-29) (Also cross-referenced as HCIL-TR-2000-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Navigational Issues in the Design of On-Line Self-Administered. K. L. Norman. Z. Friedman. K. Norman. R. Stevenson. May 2000.
Answering questions on surveys involves the access of internal cognitive knowledge structures, the retrieval of records from external data-bases, and the navigation of items on the computer interface. In this study a number of alternative designs for on-line questionnaire presentation were investigated. A long heterogeneous survey was partitioned in four ways: whole/form-based, semantic/section-based, screen/page-based, and single item-based. Questionnaires were presented with or without an index, which resulted in eight versions. Times for initial completion of the questionnaire were recorded as well as subjective assessments. Neither initial completion times nor subjective assessments differed among the eight versions due to the highly linear navigation of the survey structures. Respondents were also asked to revisit 16 questions based on only the topic of the question or on the topic and the question number and to change their answers. Revision times reflected ease of finding items in the structure of the survey and the use of an index to the sections of the questionnaire. University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
DataCutter and A Client Interface for the Storage Resource Broker with. Tahsin Kurc. Michael Beynon. Alan Sussman. Joel Saltz. May 2000.
The continuing increase in the capabilities of high performance computers and continued decreases in the cost of secondary and tertiary storage systems are making it increasingly feasible to generate and archive very large (e.g. petabyte and larger) datasets. Applications are also increasingly likely to make use of archived data obtained by different types of sensors. Such sensors include imaging devices deployed on satellites and aircraft, microscopy related imagery and radiology related imagery. Simulation or sensor datasets generated or acquired by one group may need to be accessed over a wide-area network by other groups. Datasets frequently describe data associated with collections of very large structured or unstructured grids where each grid point is associated with several variables. Applications frequently need only to obtain portions of a dataset. Required data may correspond to a particular region in a multidimensional space. The application may need to access all data associated with a multidimensional region or it may need only certain variable values at a subsampled set of spatial locations. In addition, in some cases, applications may require data products obtained by aggregating data in one way or another. For instance, a user might require time or space averaged data. This document describes the design of a middleware infrastructure, called DataCutter, that enables subsetting and user-defined filtering of multi-dimensional datasets stored in archival storage systems across a wide-area network. We also describe a client API for Storage Resource Broker (SRB) clients, which allows SRB clients to carry out subsetting and filtering of datasets stored through the SRB. This API uses a prototype implementation of the DataCutter indexing and filtering services. (Also cross-referenced as UMIACS-TR-2000-26) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The periodic polytope and its applications to a scheduling problem - A. K. Subramani. A. Agrawala. May 2000.
Parameter variability and the existence of complex constraints between tasks are assured features of real-time scheduling. {\em Periodicity} of task sets is an additional feature that needs to be accommodated. Traditional scheduling models ignore the complexities involved in real-time scheduling by making simplistic assumptions about task interactions. In this paper, we present a model that captures the issues that we deem central to real-time scheduling in periodic task sets and demonstrate the existence of efficient and easily implementable algorithms for addressing schedulability queries in this model. Our model is very general and applicable to diverse areas ranging from real-time process scheduling in operating systems and avionics to manufacturing and traffic control. (Also cross-referenced as UMIACS-TR-2000-25) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Extending User Understanding of Federal Statistics in Tables. Gary Marchionini. Carol Hert. Liz Liddy. Ben Shneiderman. May 2000.
This paper describes progress toward improving user interfaces for US Federal government statistics that are presented in tables. Based on studies of user behaviors and needs related to statistical tables, we describe interfaces to assist diverse users with a range of statistical literacy to explore, find, understand, and use US Federal government statistics. (HCIL-TR-2000-08) (Also cross-referenced as UMIACS-TR-2000-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Direct Annotation: A Drag-and-Drop Strategy for Labeling Photos. B. Shneiderman. H. Kang. April 2000.
Annotating photos is such a time-consuming, tedious and error-prone data entry task that it discourages most owners of personal photo libraries. By allowing users to drag labels such as personal names from a scrolling list and drop them on a photo, we believe we can make the task faster, easier and more appealing. Since the names are entered in a database, searching for all photos of a friend or family member is dramatically simplified. We describe the user interface design and the database schema to support direct annotation, as implemented in our PhotoFinder prototype. (HCIL-2000-06) (Also cross-referenced as UMIACS-TR-2000-23) University of Maryland Institute for Advanced Computer Studies, Human-Computer Interaction Laboratory, University of Maryland, Department of Computer Science, University of Maryland,
Snap-Together Visualization: A User Interface for Coordinating. C. North. B. Shneiderman. April 2000.
Multiple coordinated visualizations enable users to rapidly explore complex information. However, users often need unforeseen combinations of coordinated visualizations that are appropriate for their data. Snap-Together Visualization enables data users to rapidly and dynamically mix and match visualizations and coordinations to construct custom exploration interfaces without programming. Snap's conceptual model is based on the relational database model. Users load relations into visualizations then coordinate them based on the relational joins between them. Users can create different types of coordinations such as: brushing, drill down, overview and detail view, and synchronized scrolling. Visualization developers can make their independent visualizations snap-able with a simple API. Evaluation of Snap revealed benefits, cognitive issues, and usability concerns. Data savvy users were very capable and thrilled to rapidly construct powerful coordinated visualizations. A snapped overview and detail-view coordination improved user performance by 30-80%, depending on task. (Also cross-referenced as UMIACS-TR-2000-22) University of Maryland Institute for Advanced Computer Studies, Human-Computer Interaction Laboratory, University of Maryland, Department of Computer Science, University of Maryland,
An Arnoldi--Schur Algorithm for Large Eigenproblems. G. W. Stewart. April 2000.
Sorensen's iteratively restarted Arnoldi algorithm is one of the most successful and flexible methods for finding a few eigenpairs of a large matrix. However, the need to preserve the structure of the Arnoldi decomposition, on which the algorithm is based, restricts the range of transformations that can be performed on it. In consequence, it is difficult to deflate converged Ritz vectors from the decomposition. Moreover, the potential forward instability of the implicit QR algorithm can cause unwanted Ritz vectors to persist in the computation. In this paper we introduce a generalized Arnoldi decomposition that solves both problems in a natural and efficient manner. (Also cross-referenced as UMIACS-TR-2000-21) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Buffer Merging --- A Powerful Technique for Reducing Memory. P. K. Murthy. S. S. Bhattacharyya. April 2000.
In this paper, we develop a new technique called buffer merging for reducing memory requirements of synchronous dataflow (SDF) specifications. SDF has proven to be an attractive model for specifying DSP systems, and is used in many commercial tools like DSPCanvas, SPW, and COSSAP. Good synthesis from an SDF specification depends crucially on scheduling, and memory is an important metric for generating efficient schedules. Previous techniques on memory minimization have either not considered buffer sharing at all, or have done so at a fairly coarse level (the meaning of this will be made more precise in the paper). In this paper, we develop a buffer overlaying strategy that works at the level of an input/output edge pair of an actor. It works by algebraically encapsulating the lifetimes of the tokens on the input/output edge pair, and determines the maximum amount of the input buffer space that can be reused by the output. We develop the mathematical basis for performing merging operations, and develop several algorithms and heuristics for using the merging technique for generating efficient implementations. We show improvements of up to 54% over previous techniques. (Also cross-referenced as UMIACS-TR-2000-20) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Support for Speculative Update Propagation and Mobility in Deno. Ugur Cetintemel. Peter J. Keleher. Michael Franklin. November 1999.
This paper presents the transactional framework of Deno, an object replication system specifically designed for use in mobile and weakly-connected environments. Deno uses weighted voting for availability and pair-wise, epidemic information flow for flexibility. This combination allows the protocols to operate with less than full connectivity, to easily adapt to changes in group membership, and to make few assumptions about the underlying network topology. These features are all crucial to providing effective support for mobile and weakly-connected platforms. Deno has been implemented and runs on top of Linux and Windows NT/CE platforms. We use the Deno prototype to characterize the performance of two versions of Deno's protocol. The first version enables globally serializable execution of update transactions. The second supports a weaker consistency level that still guarantees transactionally consistent access to replicated data. The results show that our protocols either outperform or perform comparably to existing approaches, while achieving higher availability. Further, we show that the incremental cost of providing global serializability in this environment is low. Finally, we show that commit delays can be significantly decreased by allowing votes to be cast, and votes and updates to be disseminated, speculatively. (Also cross-referenced as UMIACS-TR-99-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
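The weighted-voting idea mentioned in this abstract can be illustrated with a small sketch. The rule below (a candidate update is decided once no rival could overtake it, even if all uncast weight went to that rival) is a simplified plurality rule assumed for illustration; the function name `committed_candidate` and the normalisation of total weight to 1.0 are hypothetical and are not taken from the report.

```python
from typing import Dict, Optional


def committed_candidate(votes: Dict[str, float], total: float = 1.0) -> Optional[str]:
    """Return the candidate whose win is already certain, or None.

    votes maps a candidate update id to the voting weight cast for it so far.
    total is the overall voting weight in the system (assumed to be 1.0).
    """
    if not votes:
        return None
    # Weight held by replicas that have not voted yet.
    uncast = total - sum(votes.values())
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    leader, lead_weight = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Worst case: every uncast vote goes to the strongest rival
    # (or to a rival that has not been seen yet).
    if lead_weight > runner_up + uncast:
        return leader
    return None
```

With weights `{"a": 0.6, "b": 0.1}` the leader can no longer be overtaken, so the sketch returns `"a"`; with `{"a": 0.4, "b": 0.3}` the outcome is still open and it returns `None`.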
Large-Scale Construction of a Chinese-English Semantic Hierarchy. Bonnie J. Dorr. Gina-Anne Levow. Dekang Lin. June 2000.
This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a semantic hierarchy for Chinese verbs, using thematic-role information to create links between Chinese concepts and English classes. We then present an approach to compensating for gaps in the existing resources. The resulting hierarchy is used for a multilingual lexicon for Chinese-English machine translation and cross-language information retrieval applications. (Also cross-referenced as UMIACS-TR-2000-17) (Also cross-referenced as LAMP-TR-040) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Parametric Polytope and its applications to a Scheduling Problem. K. Subramani. A. Agrawala. March 2000.
An important feature in Real-time systems is {\em parameter impreciseness} i.e. the inability to accurately determine certain parameter values. The most common such parameter is {\em task execution time}. A second feature is the presence of complex relationships between tasks that constrain their execution. Traditional models do not accommodate either feature completely: (a) Variable execution times are modeled through a fixed value ( {\em worst-case} ), and (b) Relationships are limited to those that can be represented by precedence graphs. We present a task model that effectively captures {\em variable task execution time}, while simultaneously permitting arbitrary linear relationships between tasks. Our model finds applications in diverse areas such as real-time task scheduling, compiler scheduling, real-time database scheduling and machine control. This paper focuses primarily on the computational complexity of answering queries posed in our model; in particular we demonstrate the existence of constraint classes that make the scheduling problem {\em hard.} (Also cross-referenced as UMIACS-TR-2000-16) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Characterisation of Oscillations in the Discrete Two-Dimensional. Howard C. Elman. Alison Ramage. March 2000.
It is well known that discrete solutions to the convection-diffusion equation contain nonphysical oscillations when boundary layers are present but not resolved by the discretisation. However, except for one-dimensional problems, there is little analysis of this phenomenon. In this paper, we present an analysis of the two-dimensional problem with constant flow aligned with the grid, based on a Fourier decomposition of the discrete solution. For Galerkin bilinear finite element discretisations, we derive closed form expressions for the Fourier coefficients, showing them to be weighted sums of certain functions which are oscillatory when the mesh P\'{e}clet number is large. The oscillatory functions, which are determined as solutions to a set of three-term recurrences, are then used to characterise the oscillations of the discrete solution in terms of the mesh P\'{e}clet number and boundary conditions of the problem. (Also cross-referenced as UMIACS-TR-2000-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Static Polytope and its applications to a scheduling problem. K. Subramani. A. Agrawala. March 2000.
In the design of real-time systems, it is often the case that certain process parameters ( such as {\em execution time} ) are not known precisely. The challenge in real-time system design is to develop techniques that efficiently meet the requirements of impreciseness. Traditional models tend to simplify the issue of impreciseness by assuming {\em worst-case} times. This assumption is unrealistic and at the same time, may cause certain constraints to be violated at run-time. In this paper, we shall study the problem of scheduling a set of ordered, non-preemptive processes under non-constant execution times. Typical applications for variable execution time scheduling include process scheduling in Real-time Operating Systems such as Maruti, compiler scheduling, database transaction scheduling and automated machine control. An important feature of application areas such as robotics is the interaction between execution times of various processes. We explicitly model this interaction through the representation of execution time vectors as points in convex sets. We present both sequential and parallel algorithms for determining the existence of a static schedule. (Also cross-referenced as UMIACS-TR-2000-14) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Dual interpretation of Standard Constraints in Parametric Scheduling. K. Subramani. A. Agrawala. March 2000.
The problem of parametric scheduling in hard real-time systems (in the presence of linear relative constraints between the start and execution times of tasks) was posed in the literature. In an earlier paper, a polynomial time algorithm is presented for the case when the constraints are restricted to be standard (defined in the paper) and the execution time vectors belong to an axis-parallel hyper-rectangle. In this paper, we extend these results in two directions. We first present a polynomial time algorithm for the case when the execution time vectors belong to arbitrary convex domains. We then show that the set of standard constraints can be extended to include arbitrary network constraints. Our insights into the problem occur primarily as a result of studying the dual polytope of the constraint system. (Also cross-referenced as UMIACS-TR-2000-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Bolshoi - A Modeling Spreadsheet (Improving Usability of Complex. William C. Cheng. Leana Golubchik. March 2000.
Spreadsheet programs are very popular financial modeling tools because they allow users to juggle numbers and formulas with a powerful yet intuitive and easy to understand user interface; also, they often are equipped with sophisticated numerical analysis packages for data analysis and powerful presentation utilities for visualizing results. Computer systems performance and reliability modeling tools of today, on the other hand, have un-intuitive user interfaces and are difficult to learn and use. In this work, we propose to design, build, and evaluate Bolshoi, a modeling spreadsheet, with the goal of putting modeling tools comfortably in the hands of non-expert users. In this proposal, we address management of complexity that exists in performance and reliability analysis of real computer and communication systems. Specifically, we propose to do so through the design and development of an advanced modeling tool. Our tool will provide two important functions: (1) a proper interface for building models that will allow system designers not just to define their models, but visualize them in various ways and (2) easy plug-in of existing and future advanced solution techniques. We call this tool Bolshoi, a Modeling Spreadsheet, because it has a spreadsheet-type interface as detailed below. Performance evaluation of real systems is complex, suffers from scalability problems (or the so-called ``state explosion'' problem) and in many cases requires advanced computational techniques. Often, advanced computational techniques are based on exploitation of ``special structure'' in the models (the primary way to deal with state explosion besides getting a bigger machine). With large and complex models, these special structures are very expensive to expose automatically as it involves searching through a combinatorial number of permutations. 
Proper visualization of models can greatly assist in the discovery of these special structures so that state space reduction techniques can be applied. Discovery of special structure regularly contributes to many orders of magnitude in computational efficiency. Furthermore, models are often defined over infinite state spaces. We believe that a spreadsheet paradigm is ideal for visualizing such models. Without proper modeling tools, much effort and money is wasted by the computer industry, and moreover, the probability of a successful outcome is low. Thus, a good tool is crucial to advances in the state of the art in performance modeling as well as to successful design of systems in the industry. Every system designer should be able to integrate the use of a performance modeling tool into his/her design process. He/she should be able to easily ask ``what-if'' type questions, explore possible design choices, and make decisions based on quantitative results rather than ``gut feeling''. We believe that a modeling spreadsheet is the right abstraction for such tasks, and furthermore, to the best of our knowledge this abstraction has not been exploited for performance evaluation tool purposes. We believe that the approach proposed here will have a significant impact on future performance tool designs as well as make significant strides in wide-spread use of performance evaluation techniques among computer and communication system designers. Furthermore, a modeling tool that does not require expert-level methodology knowledge is also an excellent undergraduate-level and graduate-level educational tool. Opportunities for hands-on experience with modeling and performance evaluation as well as the ability to add new techniques to the tool greatly improve the educational experience of students and their future ability to apply what they have learned in class to design of real computer and communication systems. 
(Also cross-referenced as UMIACS-TR-2000-10) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Contention-conscious transaction ordering in embedded multiprocessors. Mukul Khandelia. Shuvra S. Bhattacharyya. March 2000.
This paper explores the problem of efficiently ordering interprocessor communication operations in statically-scheduled multiprocessors for iterative dataflow graphs. In most digital signal processing applications, the throughput of the system is significantly affected by communication costs. By explicitly modeling these costs within an effective graph-theoretic analysis framework, we show that ordered transaction schedules can significantly outperform self-timed schedules even when synchronization costs are low. However, we also show that when communication latencies are non-negligible, finding an optimal transaction order given a static schedule is an NP-complete problem, and that this intractability holds both under iterative and non-iterative execution. We develop new heuristics for finding efficient transaction orders, and perform an experimental comparison to gauge the performance of these heuristics. (Also cross-referenced as UMIACS-TR-2000-09) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Information Dynamics: An Information-Centric Approach to System Design. Ashok K. Agrawala. Ronald L. Larsen. Douglas Szajda. January 2000.
Acquisition, distribution, management, and analysis of information are the fundamental purposes behind most complex constructed systems and infrastructures, and yet a process centric approach is fundamental to the design and implementation of such systems. Since information is the essential commodity in these endeavors, we believe that an effective design should take into account the fundamental properties of information: its characteristics, its fusion, its distillation, etc. Information Dynamics is an attempt to bring a degree of rigor to the understanding of the nature of information itself and how it is used in pursuit of system objectives. (Also cross-referenced as UMIACS-2000-08) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Designing StoryRooms: Interactive Storytelling Spaces for Children. Houman Alborzi. Allison Druin. Jaime Montemayor. Lisa Sherman. Gustav Taxén. Jack Best. Joe Hammer. Alex Kruskal. Abby Lal. Thomas Plaisant Schwenn. Lauren Sumida. Rebecca Wagner. Jim Hendler. February 2000.
Limited access to space, costly props, and complicated authoring technologies are among the many reasons why children can rarely enjoy the experience of authoring room-sized interactive stories. Typically in these kinds of environments, children are restricted to being story participants, rather than story authors. Therefore, we have begun the development of "StoryRooms," room-sized immersive storytelling experiences for children. With the use of low-tech and high-tech storytelling elements, children can author physical storytelling experiences to share with other children. In the paper that follows, we will describe our design philosophy, design process with children, the current technology implementation and example StoryRooms. (Also cross-referenced as UMIACS-TR-2000-06) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory,
MOCHA: A Self-Extensible Database Middleware System for Distributed. Manuel Rodriguez-Martinez. Nick Roussopoulos. January 2000.
This paper describes MOCHA, a new self-extensible database middleware system designed to interconnect data sources distributed over a computer network. MOCHA is designed to scale to large environments and is based on the idea that some of the user-defined functionality in the system should be deployed by the middleware itself. This is realized by shipping Java code implementing either advanced data types or tailored query operators to remote data sources and have it executed remotely. Optimized query plans push the evaluation of powerful data-reducing operators to the data source sites while executing data-inflating operators near the client's site. The Volume Reduction Factor is a new and more explicit metric introduced in this paper to select the best site to execute query operators and is shown to be more accurate than the standard selectivity factor alone. MOCHA has been implemented in Java and runs on top of Informix and Oracle. We present the architecture of MOCHA, the ideas behind it, and a performance study using data and queries from the Sequoia 2000 Benchmark. The results of this study demonstrate that MOCHA not only provides a flexible and scalable framework for distributed query processing but also substantially improves query performance in contrast to existing middleware solutions. (Also cross-referenced as UMIACS-TR-2000-05) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
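The placement idea in this abstract (data-reducing operators run at the data source, data-inflating operators run near the client) can be sketched as follows. The abstract does not define the Volume Reduction Factor; the sketch assumes it is the ratio of the data volume an operator produces to the data volume it consumes, and the function names and the threshold of 1.0 are hypothetical choices for illustration.

```python
def volume_reduction_factor(bytes_out: float, bytes_in: float) -> float:
    """Assumed definition: output volume divided by input volume."""
    if bytes_in <= 0:
        raise ValueError("bytes_in must be positive")
    return bytes_out / bytes_in


def choose_site(bytes_out: float, bytes_in: float) -> str:
    """Pick where to run an operator so that less data crosses the network."""
    vrf = volume_reduction_factor(bytes_out, bytes_in)
    # VRF < 1: the operator shrinks the data, so run it at the data source
    # and ship the smaller result. Otherwise ship the input and run it
    # near the client.
    return "data_source" if vrf < 1.0 else "client"
```

For example, an aggregate that turns 1000 bytes into 10 bytes would be placed at the data source, while an operator that expands 1000 bytes into 4000 bytes would be placed at the client.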
Design of a Framework for Data-Intensive Wide-Area Applications. Michael D. Beynon. Tahsin Kurc. Alan Sussman. Joel Saltz. February 2000.
Applications that use collections of very large, distributed datasets have become an increasingly important part of science and engineering. With high performance wide-area networks becoming more pervasive, there is interest in making collective use of distributed computational and data resources. Recent work has converged to the notion of the Grid, which attempts to uniformly present a heterogeneous collection of distributed resources. Current Grid research covers many areas from low level infrastructure issues to high level application concerns. However, providing support for efficient exploration and processing of very large scientific datasets stored in distributed archival storage systems remains a challenging research issue. We have initiated an effort that focuses on developing efficient data-intensive applications in a Grid environment. In this paper, we present a framework, called filter-stream programming, that represents the processing units of a data-intensive application as a set of filters, which are designed to be efficient in their use of memory and scratch space. We describe a prototype infrastructure that supports execution of applications using the proposed framework. We present the implementation of two applications using the filter-stream programming framework, and discuss experimental results demonstrating the effects of heterogeneous resources on application performance. (Also cross-referenced as UMIACS-TR-2000-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
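As a rough illustration of the filter-stream idea described in this abstract, the sketch below chains three filters that pass data to each other in small chunks, so no stage holds the whole dataset. It uses plain Python generators on a single machine; the filter names and the chunked-list representation are assumptions for illustration and do not reflect the prototype infrastructure's actual API.

```python
from typing import Iterable, Iterator, List


def read_chunks(data: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Source filter: emit the dataset as fixed-size chunks."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def select_range(stream: Iterable[List[int]], lo: int, hi: int) -> Iterator[List[int]]:
    """Subsetting filter: keep only values in [lo, hi], drop empty chunks."""
    for chunk in stream:
        kept = [v for v in chunk if lo <= v <= hi]
        if kept:
            yield kept


def sum_values(stream: Iterable[List[int]]) -> int:
    """Aggregation filter: consume the stream and return one total."""
    total = 0
    for chunk in stream:
        total += sum(chunk)
    return total


def run_pipeline(data: List[int], chunk_size: int, lo: int, hi: int) -> int:
    """Connect the filters with streams: read -> select -> aggregate."""
    return sum_values(select_range(read_chunks(data, chunk_size), lo, hi))
```

Each filter only sees one chunk at a time, which is the property that keeps memory and scratch-space use bounded in this toy version.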
A Clustering Scheme for Hierarchical Routing in Wireless Networks. Suman Banerjee. Samir Khuller. February 2000.
In this paper we present a clustering scheme to create hierarchies for wireless networks. A cluster is defined as a subset of vertices, whose induced graph is connected. In addition, a cluster is required to obey certain constraints that are useful for hierarchical routing. While all these constraints cannot be met simultaneously for general graphs, we show how for wireless network topologies, such a clustering can be obtained. We also present simulation results from a distributed implementation of this scheme to demonstrate its convergence and stability properties. Department of Computer Science, University of Maryland,
Optimizing Retrieval and Processing of Multi-dimensional Scientific. Chialin Chang. Tahsin Kurc. Alan Sussman. Joel Saltz. February 2000.
Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models to predict the relative performance of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output dataset to be a regular $d$-dimensional array. We present an experimental evaluation of these models for various synthetic datasets and for several driving applications on a 128-node IBM SP. (Also cross-referenced as UMIACS-TR-2000-03) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
IMPACTing SHOP: Foundations for integrating HTN Planning and. Hector Munoz-Avila. Juergen Dix. Dana S. Nau. Yue Cao. February 2000.
In this paper we describe a formalism for integrating the SHOP HTN planning system with the IMPACT multi-agent environment. Our formalism provides an agentized adaptation of the SHOP planning algorithm that takes advantage of IMPACT's capabilities for interacting with external agents, performing mixed symbolic/numeric computations, and making queries to distributed, heterogeneous information sources (such as arbitrary legacy and/or specialized data structures or external databases). We show that this agentized version of SHOP will preserve soundness and completeness if certain conditions are met. (This technical report is the updated version of CS-TR-4085) (Also cross-referenced as UMIACS-TR-2000-02) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On the Eigensystems of Graded Matrices. G. W. Stewart. January 2000.
Informally a graded matrix is one whose elements show a systematic decrease or increase as one passes across the matrix. It is well known that graded matrices often have small eigenvalues that are determined to high relative accuracy. Similarly, the eigenvectors can have small components that are nonetheless well determined. In this paper, we give approximations to the eigenvalues and eigenvectors of a graded matrix in terms of a base matrix that show how these phenomena come about. This approach provides condition numbers for eigenvalues and individual components of the eigenvectors. The results are applied to derive related results for the singular value decomposition. (Also cross-referenced as UMIACS-TR-2000-01) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Generalization of Saad's Theorem on Rayleigh-Ritz. G. W. Stewart. December 1999.
Let $(\lambda,x)$ be an eigenpair of the Hermitian matrix $A$ of order $n$ and let $(\mu,u)$ be a Ritz pair from a subspace $\clk$ of $\comp^{n}$. Saad has given a simple inequality bounding $\sin\angle(x,u)$ in terms of $\sin\angle(x,\clk)$. In this note we show that this inequality can be extended to an equally simple inequality for eigenspaces of non-Hermitian matrices. (Also cross-referenced as UMIACS-TR-99-78) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
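For orientation, the Hermitian inequality referred to in this abstract is usually stated in the following form. The abstract itself does not reproduce it, so the symbols $\gamma$, $\delta$, $P_{\mathcal{K}}$ and $\mu_j$ are notation assumed here, with $\mathcal{K}$ standing for the subspace written $\clk$ above.

```latex
\[
  \sin\angle(x,u) \;\le\; \sin\angle(x,\mathcal{K})\,
  \sqrt{1+\frac{\gamma^{2}}{\delta^{2}}},
  \qquad
  \gamma = \bigl\| P_{\mathcal{K}}\, A\, (I - P_{\mathcal{K}}) \bigr\|_{2},
  \qquad
  \delta = \min_{\mu_j \neq \mu} \lvert \lambda - \mu_j \rvert ,
\]
```

Here $P_{\mathcal{K}}$ is the orthogonal projector onto $\mathcal{K}$ and the $\mu_j$ are the Ritz values other than $\mu$, so the bound is informative only when $\lambda$ is well separated from the remaining Ritz values.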
Probabilistic Object Bases. Thomas Eiter. James Lu. Thomas Lukasiewicz. V.S. Subrahmanian. November 1999.
There are many applications where an object oriented data model is a good way of representing and querying data. However, current object database systems are unable to handle the case of objects whose attributes are uncertain. In this paper, extending previous pioneering work by Kornatzky and Shimony, we develop an extension of the relational algebra to the case of object bases with uncertainty. We propose concepts of consistency for such object bases, together with an NP-completeness result, and classes of probabilistic object bases for which consistency is polynomially checkable. In addition, as certain operations involve conjunctions and disjunctions of events, and as the probability of conjunctive and disjunctive events depends both on the probabilities of the primitive events involved as well as on what is known (if anything) about the relationship between the events, we show how all our algebraic operations may be performed under arbitrary probabilistic conjunction and disjunction strategies. We also develop a host of equivalence results in our algebra, which may be used as rewrite rules for query optimization. Last but not least, we have developed a prototype probabilistic object base server using the VisiBroker ORB on top of ObjectStore. We describe experiments to assess the efficiency of different possible rewrite rules. (Also cross-referenced as UMIACS-TR-99-77) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Designing Storytelling Technologies to Encourage Collaboration Between. Steve Benford. Benjamin B. Bederson. Karl-Petter Åkesson. Victor Bayon. Allison Druin. Pär Hansson. Juan Pablo Hourcade. Rob Ingram. Helen Neale. Claire O’Malley. Kristian T. Simsarian. Danaë Stanton. Yngve Sundblad. Gustav Taxén. November 1999.
We describe the iterative design of two collaborative storytelling technologies for young children, KidPad and the Klump. We focus on the idea of designing interfaces to subtly encourage collaboration so that children are invited to discover the added benefits of working together. This idea has been motivated by our experiences of using early versions of our technologies in schools in Sweden and the UK. We compare the approach of encouraging collaboration with other approaches to synchronizing shared interfaces. We describe how we have revised the technologies to encourage collaboration and to reflect design suggestions made by the children themselves. (Also cross-referenced as UMIACS-TR-99-76) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Single Display Groupware. Benjamin B. Bederson. Jason Stewart. Allison Druin. November 1999.
We discuss a model for supporting collaborative work between people that are physically close to each other. We call this model Single Display Groupware (SDG). In this paper, we describe the model, comparing it to more traditional remote collaboration. We describe the requirements that SDG places on computer technology, and our understanding of the benefits and costs of SDG systems. Finally, we describe a prototype SDG system that we built and the results of a usability test we ran with 60 elementary school children. Through participant observation, video analysis, program instrumentation, and an informal survey, we discovered that the SDG approach to collaboration has strong potential. Children overwhelmingly prefer two mice to one mouse when collaborating with other children. We identified several collaborative styles including a dominant partner, independent simultaneous use, a mentor/mentee relationship, and active collaboration. (Also cross-referenced as UMIACS-TR-99-75) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
IMPACTing SHOP: Foundations for integrating HTN Planning and. Hector Munoz-Avila. Juergen Dix. Dana S. Nau. Yue Cao. November 1999.
AI planning systems typically require that the state of the world be locally accessible. We call this the centralized state requirement. Furthermore, the state is described in a special representation language, mostly related to first-order logic. We refer to this as the uniform representation requirement. Relevant data from other sources must therefore be translated by hand into this language, stored in main memory and cannot be accessed automatically or as needed. These requirements, however, do not hold in many real-world domains. Information about the state may be distributed in several locations, each of which may have its own representation language. We address this problem by using a recently developed architecture for a Multi-Agent System, IMPACT, and its code-call mechanism. Within IMPACT queries and requests to arbitrary legacy and/or specialized data structures or external databases may be executed. We show in this paper how to combine the basic algorithm of a very efficient HTN planner, SHOP, with the code-call mechanism of IMPACT. This opens the way for SHOP to access real-world data and to base the planning process on external databases. We show that SHOP is sound and complete w.r.t. this extended data access. This technical report has been updated and revised and is available full-text/online as CS-TR-4100. (Also cross-referenced as UMIACS-TR-99-74) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Parameterized Modeling and Scheduling of Dataflow Graphs. Bishnupriya Bhattacharya. Shuvra S. Bhattacharyya. November 1999.
Dataflow has proven to be an attractive computational model for programming DSP applications. A restricted version of dataflow, called Synchronous Dataflow (SDF), is particularly well-suited for modeling a large class of signal processing applications, as it offers strong formal properties and compile-time predictability. However, the SDF model does not allow data-dependent flow of control or dynamically varying communication patterns between functional modules. This results in limited expressive power. Consequently, a variety of extensions to SDF have been developed, where the objective is to provide increased expressive power, while maintaining a significant part of the compile-time predictability of SDF. In this report, we propose a parameterized dataflow framework that can be applied as a meta-modeling technique to an arbitrary dataflow model that satisfies certain requirements, to further increase its expressive power. For clarity, we focus on synchronous dataflow, and develop the precise semantics of parameterized synchronous dataflow (PSDF). We propose a formal framework for the PSDF model, and introduce the concept of local synchrony, which is a condition that must be satisfied for consistent execution of PSDF specifications. From our experience, it appears that the PSDF model significantly increases the expressive power of pure SDF, while maintaining many of the desirable properties of SDF, like low-overhead scheduling (geared towards software synthesis in embedded systems). We develop techniques for implementing the operational semantics of PSDF that allow efficient quasi-static scheduling of a class of PSDF specifications. University of Maryland Institute for Advanced Computer Studies, Department of Electrical Engineering, University of Maryland, Department of Computer Science, University of Maryland,
A Convex Optimization Approach for Addressing Storage-Communication. Radha Poovendran. November 1999.
In Eurocrypt'99, Canetti, Malkin, and Nissim [1] presented a new tree based key distribution algorithm that required sublinear storage of keys while preserving logarithmic update communication as functions of the group size. The results in [1] are known to be the first results presenting sub-linear storage among the family of tree based key distribution schemes. The question of whether this storage was the optimal possible value while keeping the communication logarithmic was posed as a problem. We show that the storage-communication tradeoff can be formulated as a convex optimization problem in terms of the size of the minimal storage parameter defined in [1]. In particular, we show that the optimal solution is parameterizable by the ratio of the communication and storage costs, the degree of the tree, and the group size. Using this design triplet, we show that not only the results in [1] but also the results of the basic scheme of Wallner, Harder, and Agee [2] can be derived as specific Pareto optimal points for specific choices of the triplet. We also present an exact design procedure for feasibility testing and for constructing an optimal key distribution tree of the type in [1]. We also show that if the communication and the storage are equally weighted, then the optimal value for storage and communication grows as the square root of the group size, a value noted in [1]. Department of Computer Science, University of Maryland,
Scheduling Jobs Before Shut Down. Vincenzo Liberatore. December 1999.
Distributed systems execute background or alternative jobs while waiting for data or requests to arrive from another processor. In those cases, the following shut-down scheduling problem arises: given a set of jobs of known processing time, schedule them on m machines so as to maximize the total weight of jobs completed before an initially unknown deadline. We will present optimally competitive deterministic and randomized algorithms for shut-down scheduling. Our deterministic algorithm is parameterized by the number of machines m. Its competitive ratio increases as the number of machines decreases, but it is optimal for any given choice of m. Such a family of deterministic algorithms can be translated into a family of randomized algorithms that use progressively less randomization and that are optimal for the given amount of randomization. Hence, we establish a precise trade-off between amount of randomization and competitive ratios.
(Also cross-referenced as UMIACS-TR-99-72) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
SHOE: A Knowledge Representation Language for Internet Applications. Jeff Heflin. James Hendler. Sean Luke. October 1999.
It is our contention that the World Wide Web poses challenges to knowledge representation systems that fundamentally change the way we should design KR languages. In this paper, we describe the Simple HTML Ontology Extensions (SHOE), a KR language which allows web pages to be annotated with semantics. We present a formalism for the language and discuss the features which make it well suited for the Web. We describe the syntax and semantics of this language, and discuss the differences from traditional KR systems that make it more suited to modern web applications. We also describe some generic tools for using the language and demonstrate its capabilities by describing two prototype systems that use it. We also discuss some future tools currently being developed for the language. The language, tools, and details of the applications are all available on the World Wide Web at http://www.cs.umd.edu/projects/plus/SHOE. (Also cross-referenced as UMIACS-TR-99-71) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Security Infrastructure for Mobile Transactional Systems. Peter J. Keleher. Bobby Bhattacharjee. Kuo-Tung Kuo. Ugur Cetintemel. April 2000.
In this paper, we present an infrastructure for providing secure transactional support for mobile databases. Our infrastructure protects against external threats - malicious actions by nodes not authorized to access the data. The major contribution of this paper, however, is to classify and present algorithms to protect against internal security threats. Internal threats are malicious actions by authenticated nodes that misrepresent protocol specific information. We quantify the cost of our security mechanisms in the context of Deno: a system that supports object replication in a transactional framework for mobile and weakly-connected environments. Our results show that protecting against internal threats comes at a cost, but the marginal cost for protecting against larger cliques of malicious insiders is low. However, even with all the security mechanisms in place, our system commits updates over 50% faster than systems that depend on the Read-one Write-all commit protocol. Lastly, we present results from a probabilistic version of our algorithm that has several orders of magnitude lower computation cost than the traditional public-key based schemes. (Also cross-referenced as UMIACS-TR-2000-19) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
ViPEr-HiSS: A Case for Storage Design Tools. Leana Golubchik. Joseph Dunnick. Jeffrey K. Hollingsworth. October 1999.
The viability of large-scale multimedia applications depends on the performance of storage systems. Providing cost-effective access to vast amounts of video, image, audio, and text data requires (a) proper configuration of storage hierarchies as well as (b) efficient resource management techniques at all levels of the storage hierarchy. The resulting complexities of the hardware/software co-design in turn contribute to difficulties in making accurate predictions about performance, scalability, and cost-effectiveness of a storage system. Moreover, poor decisions at design time can be costly and problematic to correct in later stages of development. Hence, measurement of systems after they have been developed is not a desirable approach to predicting their performance. What is needed is the ability to evaluate the system's design while there are still opportunities to make corrections to fundamental design flaws. In this paper we describe the framework of ViPEr-HiSS, a tool which facilitates design, development, and subsequent performance evaluation of designs of multimedia storage hierarchies by providing mechanisms for relatively easy experimentation with (a) system configurations as well as (b) application- and media-aware resource management techniques. (Also cross-referenced as UMIACS-TR-99-69) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
PETS: A Personal Teller of Stories. Jaime Montemayor. Allison Druin. Jim Hendler. November 1999.
Let us start by reading a story written by a seven year old child, entitled Michelle. "There once was a robot named Michelle. She was new in the neighborhood. She was HAPPY when she first came, thinking she would make friends. But it was the opposite. Other robots threw rocks and sticks. She was SAD. Now no one liked her. One day she was walking down a street, a huge busy one, when another robot named Rob came up and ask [sic] if she wanted to have a friend. She was SCARED at first but then realized that she was HAPPY. The other robots were ANGRY but knew that they had learned their lesson. Michelle and Rob lived HAPPILY ever after. No one noticed the dents from rocks that stayed on Michelle." (Druin, Research notes, August 1998) This is just one of many stories that children have written with the help of PETS (Druin et al. 1999a). The author of Michelle did not just write this moving story; she is also an integral member of the team that built our robots. As you read on, PETS will be further described. Our motivations behind building such an interactive robotic pet will also be discussed. In addition, the process of how we made this robotic technology with our team of adults and six children will be introduced. And with this, we will present cooperative inquiry (Druin 1999a), the methodology that we embrace as we discover insights about technology, education, science, engineering, and art. Finally, this chapter will close with reflections on what was learned from our on-going research effort. (Also cross-referenced as UMIACS-TR-99-67) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Efficient Preconditioning of the Linearized Navier-Stokes Equations. David Silvester. Howard Elman. David Kay. Andrew Wathen. October 1999.
We outline a new class of robust and efficient methods for solving subproblems that arise in the linearization and operator splitting of Navier-Stokes equations. We describe a very general strategy for preconditioning that has two basic building blocks: a multigrid V-cycle for the scalar convection-diffusion operator, and a multigrid V-cycle for a pressure Poisson operator. We present numerical experiments illustrating that a simple implementation of our approach leads to an effective and robust solver strategy in that the convergence rate is independent of the grid and the time-step, and only deteriorates very slowly as the Reynolds number is increased. (Also cross-referenced as UMIACS-TR-99-66) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Active Logics: A Unified Formal Approach to Episodic Reasoning. Jennifer Elgot-Drapkin. Sarit Kraus. Michael Miller. Madhura Nirkhe. Donald Perlis. October 1999.
Artificial intelligence research falls roughly into two categories: formal and implementational. This division is not completely firm: there are implementational studies based on (formal or informal) theories (e.g., CYC, SOAR, OSCAR), and there are theories framed with an eye toward implementability (e.g., predicate circumscription). Nevertheless, formal/theoretical work tends to focus on very narrow problems (and even on very special cases of very narrow problems) while trying to get them ``right'' in a very strict sense, while implementational work tends to aim at fairly broad ranges of behavior but often at the expense of any kind of overall conceptually unifying framework that informs understanding. It is sometimes urged that this gap is intrinsic to the topic: intelligence is not a unitary thing for which there will be a unifying theory, but rather a ``society'' of subintelligences whose overall behavior cannot be reduced to useful characterizing and predictive principles. Here we describe a formal architecture that is more closely tied to implementational constraints than is usual for formalisms, and which has been used to solve a number of commonsense problems in a unified manner. In particular, we address the issue of formal, integrated, and longitudinal reasoning: inferentially-modeled behavior that incorporates a fairly wide variety of types of commonsense reasoning within the context of a single extended episode of activity requiring keeping track of ongoing progress, and altering plans and beliefs accordingly. Instead of aiming at optimal solutions to isolated, well-specified and temporally narrow problems, we focus on satisficing solutions to under-specified and temporally-extended problems, much closer to real-world needs. We believe that such a focus is required for AI to arrive at truly intelligent mechanisms with the ability to behave effectively over considerably longer time periods and range of circumstances than is common in AI today. 
While this will surely lead to less elegant formalisms, it also surely is requisite if AI is to get fully out of the blocks-world and into the real world. (Also cross-referenced as UMIACS-TR-99-65) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On Orthogonalization in the Inverse Power Method. G. W. Stewart. September 1999.
When the inverse power method is used to compute eigenvectors of a symmetric matrix corresponding to close eigenvalues, the computed eigenvectors may not be orthogonal. The cure for the problem is to orthogonalize the vectors using the Gram-Schmidt algorithm. In this note it is shown that the orthogonalization process does not cause the quality of the eigenvectors to deteriorate. (Also cross-referenced as UMIACS-TR-99-64) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
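The effect described in this abstract can be illustrated with a small sketch. It is not the analysis from the note: the 3x3 test matrix, the shifts, the starting vector, and all function names below are assumptions chosen for illustration. Two eigenvectors for close eigenvalues are computed by a few steps of shifted inverse iteration, and the second is then orthogonalized against the first by one Gram-Schmidt step.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def inverse_iteration(a, shift, x0, iters=3):
    """Repeatedly solve (a - shift*I) y = x and normalize."""
    n = len(a)
    shifted = [[a[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = normalize(x0)
    for _ in range(iters):
        x = normalize(solve(shifted, x))
    return x

def orthogonalize(v, basis):
    """One Gram-Schmidt pass of v against the unit vectors in basis."""
    for q in basis:
        c = dot(q, v)
        v = [x - c * y for x, y in zip(v, q)]
    return normalize(v)

def residual(a, v):
    """Norm of a v - rho v, where rho is the Rayleigh quotient of v."""
    av = [dot(row, v) for row in a]
    rho = dot(v, av)
    return math.sqrt(sum((x - rho * y) ** 2 for x, y in zip(av, v)))

# Assumed test problem: symmetric, with eigenvalues 1 - EPS, 1 + EPS, and 2.
EPS = 1e-6
A = [[2.0, 0.0, 0.0], [0.0, 1.0, EPS], [0.0, EPS, 1.0]]
x0 = [1.0, 1.0, 0.5]
v1 = inverse_iteration(A, 1.0 - 1.1 * EPS, x0)  # shift near 1 - EPS
v2 = inverse_iteration(A, 1.0 + 1.1 * EPS, x0)  # shift near 1 + EPS
q2 = orthogonalize(v2, [v1])
```

Comparing `dot(v1, v2)` with `dot(v1, q2)` shows the loss and recovery of orthogonality, and `residual(A, q2)` gives one simple measure of the quality of the orthogonalized vector.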
Evolving a Set of Techniques for OO Inspections. Forrest Shull. Guilherme H. Travassos. Jeffrey Carver. Victor R. Basili. October 1999.
Inspecting OO designs is an important way of ensuring the quality of software under development. When high-level design activities are finished, the design documents can be inspected to verify whether they are consistent among themselves and whether the software requirements were correctly and completely captured. This paper discusses some issues regarding the definition and application of reading techniques (i.e. procedural guidelines that can be given to inspectors) to inspect high-level OO design documents. An initial set of OO Reading Techniques and their experimental evaluation is described. A method for evaluating the reading techniques in more detail, i.e. Observational Techniques, is then presented, and experiences with its use are discussed. Through these discussions, we show how the reading techniques have evolved in response to empirical evidence (both qualitative and quantitative) regarding their use in practice. The complete and current set of techniques can be found in the appendices. (Also cross-referenced as UMIACS-TR-99-63) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Generating Efficient Stack Code for Java. Tatiana Shpeisman. Mustafa Tikir. October 1999.
Optimizing Java byte code is complicated by the fact that it uses a stack-based execution model. Changing the intermediate representation from a stack-based to a register-based one brings the problem of Java byte code optimization into the well-studied domain of compiler optimizations for register-based code. In this paper we describe a technique to convert register-based code into Java byte code. The code generation techniques developed for stack-based computers are not directly applicable to this problem, as the comparative cost of the local memory and stack manipulation instructions in the JVM is quite different from that in stack-based computers. Naive verbose translation of register-based code into Java byte code produces code with many redundant store and load instructions. The tool that we have developed removes 90-100% of the stores to the local (i.e., non-global) variables. It produces Java byte code that is slightly faster and shorter than the original byte code even when no optimizations except for register allocation are performed on the register-based code. Department of Computer Science, University of Maryland,
Secure Agents. Piero Bonatti. Sarit Kraus. V.S. Subrahmanian. October 1999.
With the rapid proliferation of software agents, there comes an increased need for agents to ensure that they do not provide data and/or services to unauthorized users. We first develop an abstract definition of what it means for an agent to preserve data/action security. Most often, this requires an agent to have knowledge that is impossible to acquire --- hence, we then develop approximate security checks that take into account the fact that an agent usually has incomplete/approximate beliefs about other agents. We develop two types of security checks --- static ones that can be checked prior to deploying the agent, and dynamic ones that are executed at run time. We prove that a number of these problems are undecidable, but under certain conditions, they are decidable and (our definition of) security can be guaranteed. Finally, we propose a language within which the developer of an agent can specify her security needs, and present provably correct algorithms for static/dynamic security verification. (Also cross-referenced as UMIACS-TR-99-62) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Automatic Deployment of Application-Specific Metadata and Code in MOCHA. Manuel Rodriguez. Nick Roussopoulos. December 1999.
Database middleware systems require the deployment of application-specific data types and query operators to the servers and clients in the system. Existing middleware solutions rely on developers and system administrators to port and manually install all this application-specific functionality to all sites in the system. This approach cannot scale to an environment in which there are hundreds of data sources, such as those accessed by the Web and even more custom-tailored applications, since the complexity and the cost involved in maintaining a code base system-wide are enormous. This paper describes a novel metadata-driven framework designed to automate the deployment of all application-specific functionality used by a middleware system. We used Java and XML to implement this framework in MOCHA, a middleware system developed at the University of Maryland. We first present the kind of services, metadata elements and software tools used in MOCHA to automate code deployment. Then, we describe how the features of MOCHA simplify the administration and reduce the management cost of a middleware system in a large scale environment. (Also cross-referenced as UMIACS-TR-99-61) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Benefits of Simultaneous over Sequential Menus as Task. H. Hochheiser. N. Kositsyna. G. Ville. B. Shneiderman. September 1999.
To date, experimental comparisons of menu layouts have concentrated on variants of hierarchical structures of sequentially presented menus. Simultaneous menus - layouts which present multiple active menus on a screen at the same time - are an alternative arrangement that may be useful in many web design situations. This paper describes an experiment involving a between-subject comparison of simultaneous menus and their traditional sequential counterparts. Twenty experienced web users used either simultaneous or sequential menus in a standard web browser to answer questions based on US Census data. For novice users performing simple tasks the simplicity of sequential menus appears to be helpful, but for most tasks and most users there is good evidence to believe that simultaneous menus speed performance and improve satisfaction. Design improvements can amplify the benefits of simultaneous menu layouts. (Also cross-referenced as UMIACS-TR-99-60) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Negative Cycle Detection in Dynamic Graphs. Nitin Chandrachoodan. Shuvra S. Bhattacharyya. K.J. Ray Liu. September 1999.
We examine the problem of detecting negative cycles in a dynamic graph, which is a fundamental problem that arises in electronic design automation and systems theory. Previous approaches used for this have tried to modify Dijkstra's algorithm since it is the fastest known Single-Source Shortest Path algorithm. We introduce the concept of {\em batch mode} negative cycle detection, in which a graph changes over time, and negative cycle detection needs to be done periodically. Such scenarios arise, for example, during iterative design space exploration for hardware and software synthesis. We present an algorithm for this problem, based on the Bellman-Ford algorithm, which outperforms previous approaches. We also show that this technique leads to very fast algorithms for the computation of the maximum-cycle mean (MCM) of a graph, especially for a certain form of {\em sparse graph}. Such sparseness often occurs in practice, as demonstrated for example by the ISCAS 89/93 benchmarks. We present experimental results that demonstrate the advantages of our batch-processing techniques, and illustrate their application to design-space exploration by developing an automated local-search technique for multiple-voltage scheduling of iterative data-flow graphs. (Also cross-referenced as UMIACS-TR-99-59) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
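As background for this abstract, the following is a minimal sketch of the textbook Bellman-Ford negative cycle check, re-run from scratch after a batch of edge changes. It is not the batch-mode algorithm of the report; the function name and the example graph are assumptions for illustration only.

```python
def has_negative_cycle(num_vertices, edges):
    """Textbook Bellman-Ford check on a list of (u, v, weight) edges.

    Starting every distance at 0 acts like a virtual source joined to
    every vertex by a zero-weight edge, so cycles anywhere are found.
    """
    dist = [0] * num_vertices
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    # If any edge can still be relaxed, a negative cycle exists.
    return any(dist[u] + w < dist[v] for u, v, w in edges)

# Hypothetical batch of updates: one edge weight changes between checks.
graph = [(0, 1, 1), (1, 2, -3), (2, 0, 3)]   # cycle weight +1
before = has_negative_cycle(3, graph)
graph[2] = (2, 0, 1)                         # cycle weight becomes -1
after = has_negative_cycle(3, graph)
```

Re-running the full check after every batch is the naive baseline; the abstract's point is that a batch-aware algorithm can do better than this.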
Software synthesis and code generation for signal processing systems. S. S. Bhattacharyya. R. Leupers. P. Marwedel. September 1999.
No abstract submitted (Also cross-referenced as UMIACS-TR-99-57) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The CBP parameter --- a useful annotation to aid SDF compilers. S. S. Bhattacharyya. P. K. Murthy. September 1999.
The role of software is becoming increasingly important in the implementation of DSP applications. As this trend intensifies, and the complexity of applications escalates, we are seeing an increased need for automated tools to aid in the development of DSP software. This paper reviews the state of the art in programming language and compiler technology for DSP software implementation. In particular, we review techniques for high level, block-diagram-based modeling of DSP applications; the translation of block diagram specifications into efficient C programs using global, target-independent optimization techniques; and the compilation of C programs into streamlined machine code for programmable DSP processors, using architecture-specific and retargetable back-end optimizations. In our review, we also point out some important directions for further investigation. (Also cross-referenced as UMIACS-TR-99-56) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
XMT-M: A Scalable Decentralized Processor. Efraim Berkovich. Joseph Nuzman. Manoj Franklin. Bruce Jacob. Uzi Vishkin. September 1999.
A defining challenge for research in computer science and engineering has been the ongoing quest for reducing the completion time of a single computation task. Even outside the parallel processing communities, there is little doubt that the key to further progress in this quest is to do parallel processing of some kind. A recently proposed parallel processing framework that spans the entire spectrum from (parallel) algorithms to architecture to implementation is the explicit multi-threading (XMT) framework. This framework provides: (i) simple and natural parallel algorithms for essentially every general-purpose application, including notoriously difficult irregular integer applications, and (ii) a multi-threaded programming model for these algorithms which allows an ``independence-of-order'' semantics: every thread can proceed at its own speed, independent of other concurrent threads. To the extent possible, the XMT framework uses established ideas in parallel processing. This paper presents XMT-M, a microarchitecture implementation of the XMT model that is possible with current technology. XMT-M offers an engineering design point that addresses four concerns: buildability, programmability, performance, and scalability. The XMT-M hardware is geared to execute multiple threads in parallel on a single chip: relying on very few new gadgets, it can execute parallel threads without busy-waits! Existing code can be run on XMT-M as a single thread without any modifications, thereby providing backward compatibility for commercial acceptance. Simulation-based studies of XMT-M demonstrate considerable improvements in performance relative to the best serial processor even for small, and therefore practical, input sizes. (Also cross-referenced as UMIACS-TR-99-55) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Cost Models for Query Processing Strategies in the Active Data. Chialin Chang. September 1999.
Exploring and analyzing large volumes of data play an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we describe analytical models to predict the average computation, I/O and communication operation counts of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output dataset to be a regular d-dimensional array. We validate these models for various synthetic datasets and for several driving applications. (Also cross-referenced as UMIACS-TR-99-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hashing Technique: A New Index Method for High Dimensional Data. Zhexuan Song. Nick Roussopoulos. September 1999.
When the dimension is high, sequential scan processing becomes more efficient than most index-based query methods. In this paper, we propose a new index method for high-dimensional data spaces. This method is based on a hashing technique. The basic idea is: first find a hashing function which puts the given d-dimensional space data into d'-dimensional buckets where d' << d. Then, we use existing index techniques to manage those buckets. We later define some properties of a good hashing function and give four hashing functions. To demonstrate the efficiency of our idea, we experimentally compared our algorithms with sequential scan and the Pyramid-Technique. The results demonstrate that this method outperforms the others for skewed data sets. It always beats the sequential scan, using only half of the elapsed time for range queries. However, if the data has a uniform distribution, the Pyramid-Technique is still the best method. Department of Computer Science, University of Maryland,
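The general idea in this abstract (hash d-dimensional points into d'-dimensional buckets, then index the buckets) can be sketched as follows. The hashing function used here, averaging consecutive groups of coordinates, is a hypothetical stand-in and is not one of the four functions from the report. A plain dictionary replaces the "existing index techniques", coordinates are assumed to lie in [0, 1], and d is assumed to be a multiple of d'.

```python
from collections import defaultdict

def project(point, d_low):
    """Hypothetical hash: average consecutive coordinate groups (d -> d_low)."""
    size = len(point) // d_low  # assumes len(point) is a multiple of d_low
    return tuple(sum(point[i * size:(i + 1) * size]) / size
                 for i in range(d_low))

def cell(value, cells):
    """Grid cell of a value in [0, 1] along one low-dimensional axis."""
    return min(int(value * cells), cells - 1)

class HashedIndex:
    def __init__(self, points, d_low, cells=4):
        self.d_low, self.cells = d_low, cells
        self.buckets = defaultdict(list)
        for p in points:
            key = tuple(cell(v, cells) for v in project(p, d_low))
            self.buckets[key].append(p)

    def range_query(self, lo, hi):
        """Return all points p with lo <= p <= hi in every coordinate."""
        # Averaging is monotone, so a point inside the query box hashes
        # to a bucket between the buckets of the two box corners.
        lo_key = [cell(v, self.cells) for v in project(lo, self.d_low)]
        hi_key = [cell(v, self.cells) for v in project(hi, self.d_low)]
        result = []
        for key, pts in self.buckets.items():
            if all(a <= k <= b for a, k, b in zip(lo_key, key, hi_key)):
                # Candidate bucket: filter its points exactly.
                result.extend(p for p in pts
                              if all(l <= x <= h
                                     for l, x, h in zip(lo, p, hi)))
        return result

points = [(0.1, 0.2, 0.3, 0.4), (0.9, 0.8, 0.7, 0.6),
          (0.5, 0.5, 0.5, 0.5), (0.15, 0.25, 0.9, 0.95)]
index = HashedIndex(points, d_low=2)
lo, hi = (0.0, 0.0, 0.0, 0.0), (0.6, 0.6, 0.6, 0.6)
hits = index.range_query(lo, hi)
```

Only buckets between the two hashed corners are scanned, so the exact filter runs on a subset of the data. How much is pruned depends on the hashing function and the data distribution.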
The Role of Children in the Design Technology. Allison Druin. September 1999.
Children play games, chat with friends, tell stories, study history or math, and today this can all be done supported by new technologies. From the Internet to multimedia authoring tools, technology is changing the way children live and learn. As these new technologies become ever more critical to our children's lives, we need to be sure these technologies support children in ways that make sense for them as young learners, explorers, and avid technology users. This may seem of obvious importance, because for almost 20 years the HCI community has pursued new ways to understand users of technology. However, with children as users, it has been difficult to bring them into the design process. Children go to school for most of their days; there are existing power structures, biases, and assumptions between adults and children to get beyond; and children, especially young ones, have difficulty verbalizing their thoughts. For all of these reasons, a child's role in the design of new technology has historically been minimized. Based upon a survey of the literature and my own research experiences with children, this paper defines a framework for understanding the various roles children can have in the design process, and how these roles can impact technologies that are created. (Also cross-referenced as UMIACS-TR-99-53) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Temporal Agent Programs. J. Dix. S. Kraus. V.S. Subrahmanian. September 1999.
The ``agent program'' framework introduced by Eiter, Subrahmanian and Pick (\textbf{Artificial Intelligence, 108(1-2), 1999}), supports developing agents on top of arbitrary legacy code. Such agents are continuously engaged in an \emph{``event occurs, think, act, event occurs''} cycle. However, this framework has two major limitations: (1) all actions are assumed to have no duration, and (2) all actions are taken now, but cannot be \emph{scheduled for the future}. In this paper, we present the concept of a ``temporal agent program'' (\tap for short) and show that using {\tap}s, it is possible to build agents on top of legacy code that can reason about the past and about the future, and that can make temporal commitments for the future now. We develop a formal semantics for such agents, extending the concept of a status set proposed by Eiter et al., and develop algorithms to compute the status sets associated with temporal agent programs. Last, but not least, we show how {\tap}s support classical negotiation methods (as well as some new ones) and classical auction methods (as well as some new ones). (Also cross-referenced as UMIACS-TR-99-51) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Probabilistic Agent Programs. Juergen Dix. Mirco Nanni. VS Subrahmanian. September 1999.
Agents are small programs that autonomously take actions based on changes in their environment or ``state.'' Over the last few years, there has been an increasing number of efforts to build agents that can interact and/or collaborate with other agents. In one of these efforts, Eiter, Subrahmanian and Pick (AIJ, 108(1-2), pages 179-255) have shown how agents may be built on top of legacy code. However, their framework assumes that agent states are completely determined, and there is no uncertainty in an agent's state. Thus, their framework allows an agent developer to specify how his agents will react when the agent is 100\% sure about what is true/false in the world state. In this paper, we propose the concept of a \emph{probabilistic agent program} and show how, given an arbitrary program written in any imperative language, we may build a declarative ``probabilistic'' agent program on top of it which supports decision making in the presence of uncertainty. We provide two alternative semantics for probabilistic agent programs. We show that the second semantics, though more epistemically appealing, is more complex to compute. We provide sound and complete algorithms to compute the semantics of \emph{positive} agent programs. (Also cross-referenced as UMIACS-TR-99-50) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Meta Agent Programs. Juergen Dix. V.S. Subrahmanian. George Pick. September 1999.
There are numerous applications where an agent \aga needs to reason about the beliefs of another agent, as well as about the actions that other agents may take. In Eiter/Subrahmanian/Pick the concept of an agent program is introduced, and a language within which the operating principles of an agent can be declaratively encoded on top of imperative data structures is defined. In this paper we first introduce certain belief data structures that an agent needs to maintain. Then we introduce the concept of a \emph{Meta Agent Program} (\map), that extends the framework of Eiter/Subrahmanian/Pick, so as to allow agents to perform metareasoning. We build a formal semantics for \map{s}, and show how this semantics supports not just beliefs agent a may have about agent b's state, but also beliefs about agent b's beliefs about agent c's actions, beliefs about b's beliefs about agent c's state, and so on. Finally, we provide a translation that takes any \map as input and converts it into an agent program such that there is a one-one correspondence between the semantics of the \map and the semantics of the resulting agent program. This correspondence allows an implementation of \map{s} to be built on top of an implementation of agent programs. Also cross-referenced as UMIACS-TR-99-49 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Nonmonotonic Reasoning: Towards efficient calculi and implementations. Juergen Dix. Ulrich Furbach. Ilkka Niemelae. September 1999.
In this paper we do not want to give a detailed overview of the various formalizations of nonmonotonic reasoning that have evolved (those can be found in various textbooks), but we want to give an overview of the main computational techniques and methods leading to implementations of nonmonotonic reasoning. We first introduce the main nonmonotonic logics: \emph{Default Logic}, \emph{Circumscription} and \emph{Autoepistemic Logic}. We also consider the abstract approach of Kraus, Lehmann and Magidor to associate with any reasoning system an \emph{abstract consequence relation}. Then we investigate universal methods for computing in general nonmonotonic logics. We do this with a special eye on the underlying complexity and show how this leads to automated theorem proving in such logics. Finding efficient computation mechanisms for the logics introduced in the former section is the aim of the next Section. There we consider techniques that originated from automated reasoning in first-order predicate calculus. We depict how these techniques can be applied for disjunctive logic programming with programs with variables but only limited use of negation. In particular, we handle \ie{GCWA} as a basis for nonmonotonic negation therein. We then give a declarative overview on nonmonotonicity in logic programming. We introduce (nonmonotonic) semantics of logic programs with negation and disjunction, notably the well-founded and the stable semantics and their extensions to programs containing disjunction--- they constitute the most important semantics and are in close relation to the logics introduced in the next Section. While we considered in a former section techniques that can be successfully applied for programs with variables and only limited use of negation, we also treat propositional programs with full negation and disjunction. In particular, we provide implementations of \mbox{D-WFS}\Index{D-WFS} and \ie{D-STABLE} in polynomial space.
We end with a section where we consider the problem of finding good benchmarks to test and compare nonmonotonic systems against. Also cross-referenced as UMIACS-TR-99-48 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Explaining Updates by minimal sums. Juergen Dix. Karl Schlechta. September 1999.
Human reasoning about developments of the world always involves an assumption of \emph{inertia}. We discuss two approaches for formalizing such an assumption, based on the concept of an \emph{explanation}: \emph{(1)} there is a general preference relation given on the set of all explanations, \emph{(2)} there is a notion of a \emph{distance} between models and explanations are \emph{preferred} if their sum of distances is minimal. We show exactly under which conditions the converse is true as well and therefore both approaches are equivalent modulo these conditions. Our main result is a general representation theorem in the spirit of Kraus, Lehmann and Magidor. Also cross-referenced as UMIACS-TR-99-47 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A General Theory of Confluent rewriting Systems for Logic Programming. Juergen Dix. Mauricio Osorio. September 1999.
Recently, Brass and Dix showed (\emph{Journal of Automated Reasoning} \textbf{20(1)}, 1998) that the wellfounded semantics WFS can be defined as a confluent calculus of transformation rules. This led not only to a simple extension to disjunctive programs (\emph{Journal of Logic Programming} \textbf{38(3)}, 1999), but also to a new computation of the wellfounded semantics which is \emph{linear} for a broad class of programs. We take this approach as a starting point and generalize it considerably by developing a general theory of \emph{Confluent LP-Systems} $\cfs$. Such a system $\cfs$ is a rewriting system on the set of all logic programs over a fixed signature $\Lang$ and it induces in a natural way a canonical semantics. Moreover, we show four important applications of this theory: \emph{(1) most of the well-known semantics are induced by confluent LP-systems}, \emph{(2) there are many more transformation rules that lead to confluent LP-systems}, \emph{(3) semantics induced by such systems can be used to model aggregation}, \emph{(4) the new systems can be used to construct interesting counterexamples to some conjectures about the space of well-behaved semantics}. Also cross-referenced as UMIACS-TR-99-46 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Striping Doesn't Scale: How to Achieve Scalability. ChengFu Chou. Leana Golubchik. John C.S. Lui. September 1999.
Multimedia applications place high demands for QoS, performance, and reliability on storage servers and communication networks. These, often stringent, requirements make design of cost-effective and scalable continuous media (CM) servers difficult. In particular, the choice of data placement techniques can have a significant effect on the scalability of the CM server and its ability to utilize resources efficiently. In the recent past, a great deal of work has focused on ``wide'' data striping as a technique which ``implicitly'' solves load balancing problems; although, it does suffer from multiple shortcomings. Another approach to dealing with load imbalance problems is replication. The main focus of this paper is a study of scalability characteristics of CM servers as a function of tradeoffs between striping and replication. More specifically, striping is a good approach to load balancing while replication is a good approach to ``isolating'' nodes from being dependent on other system resources. The appropriate compromise between the degree of striping and the degree of replication is key to the design of a scalable CM server. This is the topic of our work. Also cross-referenced as UMIACS-TR-99-45 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On Fault Location in Networks by Passive Testing. Raymond E. Miller. Khaled A. Arisha. August 1999.
In this paper, we employ a variant of the communicating finite state machine (CFSM) model for networks to investigate fault detection and location using passive testing. First, we introduce the concept of passive testing, then we introduce the model with necessary assumptions and justification. Then, the model for the observer process is described and a 3-node case is studied to show how fault location information can be deduced. Extending this result, we propose a multiple node-cut approach for a general network, applying our technique for fault detection and location. An abstraction of a node-cut shows how the 3-node case can be used in the general case. We then illustrate our technique through a simulation of a practical X.25 example. Finally future extensions and potential trends are discussed. Department of Computer Science, University of Maryland,
Universal Usability: Pushing Human-Computer Interaction Research to. Ben Shneiderman. July 1999.
"I feel... an ardent desire to see knowledge so disseminated through the mass of mankind that it may...reach even the extremes of society: beggars and kings." -- Thomas Jefferson, Reply to American Philosophical Society, 1808 In a fair society, all individuals would have equal opportunity to participate in, or benefit from, the use of computer resources regardless of race, sex, religion, age, disability, national origin or other such similar factors. -- ACM Code of Ethics Position Paper for National Science Foundation & European Commission meeting on human-computer interaction research agenda, June 1-4, 1999, Toulouse, France. To be published in book form. Also cross-referenced as UMIACS-TR-99-17 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Supporting Creativity with Advanced Information-Abundant User. Ben Shneiderman. June 1999.
A challenge for human-computer interaction researchers and user interface designers is to construct information technologies that support creativity. This ambitious goal can be attained if designers build on an adequate understanding of creative processes. This paper describes a model of creativity, the four-phase genex framework for generating excellence: - Collect: learn from previous works stored in digital libraries, the web, etc. - Relate: consult with peers and mentors at early, middle and late stages - Create: explore, compose, discover, and evaluate possible solutions - Donate: disseminate the results and contribute to the digital libraries, the web, etc. Within this integrated framework, there are eight activities that require human-computer interaction research and advanced user interface design. This paper concentrates on techniques of information visualization that support creative work by enabling users to find relevant information resources, identify desired items in a set, or discover patterns in a collection. It describes information visualization methods and proposes five questions for the future: generality, integration, perceptual foundations, cognitive principles, and collaboration. Also cross-referenced as UMIACS-TR-99-42 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Improving Locality For Adaptive Irregular Scientific Codes. Hwansoo Han. Chau-Wen Tseng. September 1999.
An important class of scientific codes access memory in an irregular manner. Because irregular access patterns reduce temporal and spatial locality, they tend to underutilize caches, resulting in poor performance. Researchers have shown that consecutively packing data relative to traversal order can significantly reduce cache miss rates by increasing spatial locality. In this paper, we investigate techniques for using partitioning algorithms to improve locality in adaptive irregular codes. We develop parameters to guide both geometric (RCB) and graph partitioning (METIS) algorithms, and develop a new graph partitioning algorithm based on hierarchical clustering (GPART) which achieves good locality with low overhead. We also examine the effectiveness of locality optimizations for adaptive codes, where connection patterns dynamically change at intervals during program execution. We use a simple cost model to guide locality optimizations when access patterns change. Experiments on irregular scientific codes for a variety of meshes show our partitioning algorithms are effective for static and adaptive codes on both sequential and parallel machines. Improved locality also enhances the effectiveness of LocalWrite, a parallelization technique for irregular reductions based on the owner computes rule. Also cross-referenced as UMIACS-TR-99-41 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
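As a rough illustration of the consecutive packing mentioned in the abstract above, the following sketch renumbers the nodes of an irregular edge list in first-touch order, so that data accessed close together in the traversal also sits close together in memory. It is a generic illustration under that assumption, not the RCB, METIS, or GPART algorithms of the report; `consecutive_pack` is a hypothetical name.

```python
def consecutive_pack(num_nodes, edges):
    """Renumber nodes in the order the edge traversal first touches them.

    Returns (perm, new_edges), where perm[old] is the new index of node old
    and new_edges is the edge list rewritten with the new indices."""
    new_index = {}
    for u, v in edges:
        for node in (u, v):
            if node not in new_index:
                new_index[node] = len(new_index)
    # Nodes never touched by any edge are placed after all touched nodes.
    for node in range(num_nodes):
        if node not in new_index:
            new_index[node] = len(new_index)
    perm = [new_index[node] for node in range(num_nodes)]
    new_edges = [(perm[u], perm[v]) for u, v in edges]
    return perm, new_edges


perm, new_edges = consecutive_pack(8, [(5, 2), (2, 7), (7, 5), (0, 3)])
```

After renumbering, the node data array would be permuted with `perm` once, and subsequent traversals of `new_edges` touch mostly adjacent entries.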
Empirical Studies in Parallel Sorting. Evan Golub. May 1998.
I examine different parallel algorithms for sorting in rounds. Most of these algorithms use a graph to indicate the comparisons to be made. The primary difference between the algorithms is how these graphs are chosen. One uses graphs that are shown to exist using non-constructive techniques, several yield constructions of the required graphs, and one uses a randomized algorithm. The constructive algorithms would traditionally be preferred even though the processor requirements are higher. It is shown that the non-constructive algorithms can actually be used by generating the needed graphs using random number generators skewed appropriately. Department of Computer Science, University of Maryland,
Mathematical Modeling of Lateralization and Asymmetries in Cortical Maps. Svetlana Levitan. July 1999.
Recent experimental work in neurobiology has defined asymmetries and lateralization in the topographic maps found in mirror-image regions of the sensorimotor cerebral cortex. However, the mechanisms underlying these asymmetries are currently not established, and in some cases are quite controversial. In order to explore some possible causes of map asymmetry and lateralization, several neural network models of cortical map lateralization and asymmetries based on self-organizing maps are created and studied both computationally and theoretically. Activation levels of the elements in the models are governed by large systems of highly nonlinear ordinary differential equations (ODEs), where coefficients change with time and their changes depend on the activation levels. Special metrics for objective evaluation of simulation results (represented as paired receptive field maps) are introduced and analysed. The behavior of the models is studied when their parameters are varied systematically and also when simulated lesions are introduced into one of the hemispheric regions. Some very sharp transitions and other interesting phenomena have been found computationally. Many of these computationally observed phenomena are explained by theoretical analysis of total hemispheric activation in a simplified model. The connection between a bifurcation point of the system of ODEs and the sharp transition in the model's computational behavior is established. More general understanding of topographic map formation and changes under various conditions is achieved by analysis of activation patterns (i.e., $\omega$-limit sets of the above system of ODEs). This is the first mathematical model to demonstrate spontaneous map lateralization and asymmetries, and it suggests that such models may be generally useful in better understanding the mechanisms of cerebral lateralization. 
The mathematical analysis of the models leads to a better understanding of the mechanisms of self-organization in the topographic maps based on competitive distribution of activation and competitive learning. Also cross-referenced as UMIACS-TR-99-40 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Multigrid Method Enhanced by Krylov Subspace Iteration for Discrete. Howard C. Elman. Oliver G. Ernst. Dianne P. O'Leary. June 1999.
Standard multigrid algorithms have proven ineffective for the solution of discretizations of Helmholtz equations. In this work we modify the standard algorithm by adding GMRES iterations at coarse levels and as an outer iteration. We demonstrate the algorithm's effectiveness through theoretical analysis of a model problem and experimental results. In particular, we show that the combined use of GMRES as a smoother and outer iteration produces an algorithm whose performance depends relatively mildly on wave number and is robust for normalized wave numbers as large as two hundred. For fixed wave numbers, it displays grid-independent convergence rates and has costs proportional to number of unknowns. Also cross-referenced as UMIACS-TR-99-36 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Design of History Mechanisms and their Use in Collaborative Educational Simulations. Catherine Plaisant. Anne Rose. Gary Rubloff. Richard Salter. Ben Shneiderman. May 1999.
Reviewing past events has been useful in many domains. Videotapes and flight data recorders provide invaluable technological help to sports coaches or aviation engineers. Similarly, providing learners with a readable recording of their actions may help them monitor their behavior, reflect on their progress, and experiment with revisions of their experiences. It may also facilitate active collaboration among dispersed learning communities. Learning histories can help students and professionals make more effective use of digital library searching, word processing tasks, computer-assisted design tools, electronic performance support systems, and web navigation. This paper describes the design space and discusses the challenges of implementing learning histories. It presents guidelines for creating effective implementations, and the design tradeoffs between sparse and dense history records. The paper also presents a first implementation of learning histories for a simulation-based engineering learning environment called SimPLE (Simulated Processes in a Learning Environment) for the case of a semiconductor fabrication module, and reports on early user evaluation of learning histories implemented within SimPLE. Also cross-referenced as UMIACS-TR-99-34 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Shared Memory Implementations of Synchronous Dataflow Specifications. Praveen K. Murthy. Shuvra S. Bhattacharyya. June 1999.
There has been a proliferation of block-diagram environments for specifying and prototyping DSP systems. These include tools from academia like Ptolemy [6], and commercial tools like SPW from Cadence Design Systems, and Cossap from Synopsys. The block diagram languages used in these environments are usually based on dataflow semantics because various subsets of dataflow have proven to be good matches for expressing and modeling signal processing systems. In particular, synchronous dataflow (SDF) [14] has been found to be a particularly good match for expressing multirate signal processing systems [5]. One of the key problems that arises during synthesis from an SDF specification is scheduling. Past work on scheduling [3] from SDF has focused on optimization of program memory and buffer memory. However, in [3], no attempt was made to overlay or share buffers. In this paper, we formally tackle the problem of generating optimally compact schedules for SDF graphs that also attempt to minimize buffering memory under the assumption that buffers will be shared. This will result in schedules whose data memory usage is drastically lower than methods in the past have achieved. The method we use is that of lifetime analysis; we develop a model for buffer lifetimes in SDF graphs, and develop scheduling algorithms that attempt to generate schedules that minimize the maximum number of live tokens under the particular buffer lifetime model. We develop several efficient algorithms for extracting the relevant lifetimes from the SDF schedule. We then use the first-fit heuristic for packing arrays efficiently into memory. We report extensive experimental results on applying these techniques to several practical SDF systems, and show improvements that average 50% over previous techniques, with some systems exhibiting up to an 83% improvement over previous techniques.
Also cross-referenced as UMIACS-TR-99-32 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Approximation Algorithms and Heuristics for the Dynamic Storage. Praveen K. Murthy. Shuvra S. Bhattacharyya. June 1999.
In this report, we look at the problem of packing a number of arrays in memory efficiently. This is known as the dynamic storage allocation problem (DSA) and it is known to be NP-complete. We develop some simple, polynomial-time approximation algorithms with the best of them achieving a bound of 4 for a sub-class of DSA instances. We report on an extensive experimental study on the FirstFit heuristic and show that the average-case performance on random instances is within 7% of the optimal value. Also cross-referenced as UMIACS-TR-99-31 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
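For readers unfamiliar with the problem described above, here is a minimal sketch of a first-fit placement for dynamic storage allocation. It assumes each array is given as a hypothetical `(start, end, size)` triple with a half-open lifetime `[start, end)`, and that arrays are processed in order of start time; the report may use a different ordering or tie-breaking rule, so this is an illustration of the general heuristic only.

```python
def first_fit(arrays):
    """Assign a memory offset to each array with the first-fit rule.

    arrays: list of (start, end, size); the lifetime is half-open [start, end).
    Arrays whose lifetimes overlap must not overlap in memory.
    Returns (offsets, total_memory)."""
    placed = []  # (start, end, offset, size) of arrays already assigned
    offsets = [None] * len(arrays)
    # Assumed ordering: by start time (sorted is stable for equal starts).
    order = sorted(range(len(arrays)), key=lambda i: arrays[i][0])
    for i in order:
        start, end, size = arrays[i]
        # Memory intervals held by arrays that are live at the same time.
        busy = sorted(
            (off, off + sz)
            for s, e, off, sz in placed
            if s < end and start < e
        )
        offset = 0
        for busy_lo, busy_hi in busy:
            if offset + size <= busy_lo:
                break  # the gap before this busy interval is large enough
            offset = max(offset, busy_hi)
        offsets[i] = offset
        placed.append((start, end, offset, size))
    total = max((off + arrays[i][2] for i, off in enumerate(offsets)), default=0)
    return offsets, total


offsets, total = first_fit([(0, 4, 3), (2, 6, 2), (4, 8, 3), (6, 10, 1)])
```

In this small instance the four arrays need 9 units if stored separately, while the shared layout fits in 5 units because arrays with disjoint lifetimes reuse the same addresses.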
Symbiosis between Linear Algebra and Optimization. Dianne P. O'Leary. May 1999.
The efficiency and effectiveness of most optimization algorithms hinges on the numerical linear algebra algorithms that they utilize. Effective linear algebra is crucial to their success, and because of this, optimization applications have motivated fundamental advances in numerical linear algebra. This essay will highlight contributions of numerical linear algebra to optimization, as well as some optimization problems encountered within linear algebra that contribute to a symbiotic relationship. Also cross-referenced as UMIACS-TR-99-30 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Querying Very Large Multi-dimensional Datasets in ADR - Extended. Tahsin Kurc. Chialin Chang. Renato Ferreira. Alan Sussman. Joel Saltz. May 1999.
This paper addresses optimizing the execution of range queries into multi-dimensional datasets on distributed memory parallel machines within the Active Data Repository framework. ADR is an infrastructure that integrates storage, retrieval and processing of large multi-dimensional datasets on distributed memory parallel architectures with multiple disks attached to each node. We describe three potential strategies for efficient execution of such queries that employ different tiling and workload partitioning approaches. We evaluate scalability of these strategies for different application scenarios, varying both the number of processors and the input dataset size on a 128 processor IBM SP multicomputer. Also cross-referenced as UMIACS-TR-99-29 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Snap-Together Visualization: Coordinating Multiple Views to Explore. Chris North. Ben Shneiderman. June 1999.
Information visualizations with multiple coordinated views enable users to rapidly explore complex data and discover relationships. However, it is usually difficult for users to find or create the coordinated visualizations they need. Snap-Together Visualization allows users to coordinate multiple views that are customized to their needs. Users query their relational database and load results into desired visualizations. Then they specify coordinations between visualizations for selecting, navigating, or re-querying. Developers can make independent visualization tools 'snap-able' by including a few hooks. Also cross-referenced as UMIACS-TR-99-28 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Pixel Data Access for End-User Programming and Graphical Macros. Richard Potter. Ben Shneiderman. May 1999.
Pixel Data Access is an interprocess communication technique that enables users of graphical user interfaces to automate certain tasks. By accessing the contents of the display buffer, users can search for pixel representations of interface elements, and then initiate actions such as mouse clicks and keyboard entries. While this technique has limitations, it offers users of current systems some unusually powerful features that are especially appealing in the area of end-user programming. Also cross-referenced as UMIACS-TR-99-27 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Architecture and Implementation of a Java Package for Multiple Input. Juan Pablo Hourcade. Benjamin B. Bederson. May 1999.
A major difficulty in writing Single Display Groupware (co-present collaborative) applications is getting input from multiple devices. We introduce MID, a Java package that addresses this problem and offers an architecture to access advanced events through Java. In this paper, we describe the features, architecture and limitations of MID. We also briefly describe an application that uses MID to get input from multiple mice: KidPad. Also cross-referenced as UMIACS-TR-99-26 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Analysis of the Rayleigh--Ritz Method for Approximating. Zhongxiao Jia. G. W. Stewart. May 1999.
This paper concerns the Rayleigh--Ritz method for computing an approximation to an eigenspace $\clx$ of a general matrix $A$ from a subspace $\clw$ that contains an approximation to $\clx$. The method produces a pair $(N, \tilde X)$ that purports to approximate a pair $(L, X)$, where $X$ is a basis for $\clx$ and $AX = XL$. In this paper we consider the convergence of $(N, \tilde X)$ as the sine $\epsilon$ of the angle between $\clx$ and $\clw$ approaches zero. It is shown that under a natural hypothesis\,---\,called the uniform separation condition\,---\,the Ritz pairs $(N, \tilde X)$ converge to the eigenpair $(L, X)$. When one is concerned with eigenvalues and eigenvectors, one can compute certain refined Ritz vectors whose convergence is guaranteed, even when the uniform separation condition is not satisfied. An attractive feature of the analysis is that it does not assume that $A$ has distinct eigenvalues or is diagonalizable. (Also cross-referenced as UMIACS-TR-99-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Jazz: An Extensible 2D+Zooming Graphics Toolkit in Java. Benjamin B. Bederson. Britt McAlister. May 1999.
Jazz is a new general-purpose toolkit that supports applications using zooming object-oriented 2D graphics. It is built entirely in Java using Java2D, and thus runs on all platforms that support Java 2. It supports zooming, internal cameras, and lenses in a similar style to Pad++, but does so in a general purpose manner without a specific focus on zooming. Jazz is primarily a "scenegraph" for 2D graphics that is analogous to Sun's Java3D and SGI's OpenInventor in their support for 3D scenegraphs. This paper describes Jazz and discusses the issues of using a scenegraph for 2D graphics. We discuss the Jazz architecture, and how applications can build on top of it. Also cross-referenced as UMIACS-TR-99-24 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Data Dissemination on the Web: Speculative and Unobtrusive. Vincenzo Liberatore. Brian D. Davison. May 1999.
The Web's rapid growth results in heavier loads on servers and networks and in increased latency experienced while retrieving Web documents. Internet traffic is further complicated by its burstiness, which complicates the design and allocation of network components. Bursty traffic alternates peak periods with lulls. The paper presents a framework that exploits idle periods in data traffic to satisfy future HTTP requests speculatively, opportunistically, and unobtrusively. Our proposal differs from previous schemes in that it is server-initiated and it is explicitly aware of current traffic loads (unobtrusive). This paper highlights several design trade-offs and details two issues: (1) server arbitration among several candidate documents, and (2) client/proxy caching. We present a theoretical analysis of arbitration, and we propose an integrated caching strategy for both requested and disseminated documents. Our approach is validated by extensive simulation on server logs, and substantial performance improvements are observed over pure on-demand strategies. (Also cross-referenced as UMIACS-TR-99-23) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Computation and Uses of the Semidiscrete Matrix Decomposition. Tamara G. Kolda. Dianne P. O'Leary. April 1999.
We derive algorithms for computing a semidiscrete approximation to a matrix in the Frobenius and weighted norms. The approximation is formed as a weighted sum of outer products of vectors whose elements are plus or minus $1$ or $0$, so the storage required by the approximation is quite small. We also present a related algorithm for approximation of a tensor. Applications of the algorithms are presented to data compression, filtering, and information retrieval; and software is provided in C and in Matlab. (Also cross-referenced as UMIACS-TR-99-22) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
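For orientation, the form of the approximation described above can be written out as follows. The symbols $A$, $A_K$, $d_k$, $x_k$, $y_k$, $m$, $n$, and $K$ are notation assumed here for illustration; the abstract itself does not name them.

\[
A \;\approx\; A_K \;=\; \sum_{k=1}^{K} d_k \, x_k y_k^{T},
\qquad d_k \in \mathbb{R}, \quad x_k \in \{-1, 0, 1\}^{m}, \quad y_k \in \{-1, 0, 1\}^{n}.
\]

Each entry of $x_k$ or $y_k$ takes one of three values and so fits in two bits. Under that simple encoding, a $K$-term approximation of an $m \times n$ matrix needs $K$ stored weights plus about $2K(m+n)$ bits, which is the sense in which the storage is small.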
Adaptive Use of Iterative Methods in Predictor-Corrector Interior Point. Weichung Wang. Dianne P. O'Leary. April 1999.
In this work we devise efficient algorithms for finding the search directions for interior point methods applied to linear programming problems. There are two innovations. The first is the use of updating of preconditioners computed for previous barrier parameters. The second is an adaptive automated procedure for determining whether to use a direct or iterative solver, whether to reinitialize or update the preconditioner, and how many updates to apply. These decisions are based on predictions of the cost of using the different solvers to determine the next search direction, given costs in determining earlier directions. We summarize earlier results using a modified version of the OB1-R code of Lustig, Marsten, and Shanno, and we present results from a predictor-corrector code PCx modified to use adaptive iteration. If a direct method is appropriate for the problem, then our procedure chooses it, but when an iterative procedure is helpful, substantial gains in efficiency can be obtained. (Also cross-referenced as UMIACS-TR-99-21) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Kronos: A Java-Based Software System for the Processing and Retrieval. Zengyan Zhang. Joseph JaJa. David A. Bader. Satya Kalluri. Huiping Song. Nazmi Z El Saleous. Eric Vermote. John R. G. Townshend. March 1999.
At regional scales, satellite-based sensors are the primary source of information to study the earth's environment, as they provide the needed dynamic temporal view of the earth's surface. Raw satellite orbit data have to be processed and mapped into a standard projection to produce multitemporal data sets which can then be used for regional or global earth science studies, such as land cover dynamics, global carbon cycle, planetary-scale climate dynamics and deforestation. For a given sensor, different applications may require different processing chains with the same few core steps. Application dependent processing steps include atmospheric correction, spatial and temporal subsetting, and output image projection. However, the data sets that are currently available to the scientific community are generated using a predetermined processing chain in a fixed projection. Generating products that are different from the standard ones can be difficult and will result in at least a re-sampling step and hence some loss of accuracy. In this paper, we describe a software system, Kronos, for the generation of custom-tailored data products from the Advanced Very High Resolution Radiometer (AVHRR) sensor on board the National Oceanic and Atmospheric Administration (NOAA) series of polar orbiting satellites. Kronos allows the generation of a rich set of products that can be easily specified through a Java interface by scientists wishing to carry out earth system modeling or analysis based on Global Area Coverage (GAC) data from the AVHRR sensor. Kronos is based on a flexible methodology and consists of four major components: ingest and preprocessing, indexing and storage, search and processing engine, and a Java interface. We illustrate the power of our methodology by including a few special data products generated by Kronos.
Also cross-referenced as UMIACS-TR-99-19 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Flexible User Profiles for Large Scale Data Delivery. Ugur Cetintemel. Michael J. Franklin. C. Lee Giles. March 1999.
Push-based data delivery requires knowledge of user interests for making scheduling, bandwidth allocation, and routing decisions. Such information is maintained as user profiles. We propose a new incremental algorithm for constructing user profiles based on monitoring and user feedback. In contrast to earlier approaches, which typically represent profiles as a single weighted interest vector, we represent user-profiles using multiple interest clusters, whose number, size, and elements change adaptively based on user access behavior. This flexible approach allows the profile to more accurately represent complex user interests. The approach can be tuned to trade off profile complexity and effectiveness, making it suitable for use in large-scale information filtering applications such as push-based WWW page dissemination. We evaluate the method by experimentally investigating its ability to categorize WWW pages taken from Yahoo! categories. Our results show that the method can provide high retrieval effectiveness with modest profile sizes and can effectively adapt to changes in users' interests. Also cross-referenced as UMIACS-TR-99-18 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Near-Optimal Parameters for Tikhonov and Other Regularization Methods. Dianne P. O'Leary. March 1999.
Choosing the regularization parameter for an ill-posed problem is an art based on good heuristics and prior knowledge of the noise in the observations. In this work we propose choosing the parameter, without a priori information, by approximately minimizing the distance between the true solution to the discrete problem and the family of regularized solutions. We demonstrate the usefulness of this approach for Tikhonov regularization and for an alternate family of solutions. Further, we prove convergence of the regularization parameter to zero as the standard deviation of the noise goes to zero. We also prove that the alternate family produces solutions closer to the true solution than the Tikhonov family when the noise is small enough. Also cross-referenced as UMIACS-TR-99-17 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
OSMA Software Program: Domain Analysis Guidebook. Victor R. Basili. Carolyn Seaman. Roseanne Tesoriero. Marvin V. Zelkowitz. December 1998.
Domain analysis is the process of identifying and organizing knowledge about a class of problems. This guidebook presents a method of performing experience domain analysis in software development organizations. The purpose of the guidebook is to help the reader characterize two given development environments, apply domain analysis to model each, and then apply an evaluation process, based upon the Goal/Metric/Paradigm, to transfer a given development technology from one of the environments to the other. This guidebook describes this process and gives an example of its use within NASA. Also cross-referenced as UMIACS-TR-99-16 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Query Planning for Range Queries with User-defined Aggregation on. Chialin Chang. Tahsin Kurc. Alan Sussman. Joel Saltz. February 1999.
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, the datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space. The processing is usually highly stylized, with the basic processing steps consisting of (1) retrieval of a subset of all available data in the input dataset via a range query, (2) projection of each input data item to one or more output data items, and (3) some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on shared-nothing architectures. In this paper we address query planning and execution strategies for range queries with user-defined processing. We evaluate three potential query planning strategies within the ADR framework under several application scenarios, and present experimental results on the performance of the strategies on a multiprocessor IBM SP2. (Also cross-referenced as UMIACS-TR-99-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Does Zooming Improve Image Browsing?. Tammara T.A. Combs. Benjamin B. Bederson. March 1999.
We describe an image retrieval system we built based on a Zoomable User Interface (ZUI). We also discuss the design, results and analysis of a controlled experiment we performed on the browsing aspects of the system. The experiment resulted in a statistically significant difference in the interaction between number of images (25, 75, 225) and style of browser (2D, ZUI, 3D). The 2D and ZUI browser systems performed equally, and both performed better than the 3D systems. The image browsers tested during the experiment include Cerious Software's Thumbs Plus, TriVista Technology's Simple LandScape and Photo GoRound, and our Zoomable Image Browser based on Pad++. Also cross-referenced as UMIACS-TR-99-14 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
XJoin: Getting Fast Answers From Slow and Bursty Networks. Tolga Urhan. Michael J. Franklin. February 1999.
The combination of increasingly ubiquitous Internet connectivity and advances in heterogeneous and semi-structured databases has the potential to enable database-style querying over data from sources distributed around the world. Traditional query processing techniques, however, fail to deliver acceptable performance in such a scenario for two main reasons: First, they optimize for delivery of the entire query result, while on-line users would typically benefit from receiving initial results as quickly as possible. Second, slow or bursty delivery of data from remote sources can stall query execution, making the already inadequate batch-like behavior even worse. Both of these problems can be addressed using fully pipelined query execution. The symmetric hash join operator supports such pipelining, but it requires all base data and intermediate results to be memory-resident, which is unacceptable for complex queries over large datasets. In this paper we present a multi-threaded extension of the symmetric hash join, called XJoin, that can execute effectively with far less memory. By reactively scheduling background processing, XJoin hides intermittent delays in data arrival to produce more tuples earlier. XJoin includes a very efficient, on-the-fly algorithm for preventing duplicates from being created by its independently running threads. We have implemented the XJoin operator and added it to the PREDATOR Object-Relational DBMS. Using this implementation along with traces obtained by monitoring Internet data delivery, we show that XJoin is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources. (Also cross-referenced as UMIACS-TR-99-13) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
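As background for the abstract above, the following is a minimal sketch of the plain in-memory symmetric hash join that XJoin is described as extending. It is not XJoin itself: it has no threads, no spilling to disk, and no duplicate prevention. The function name, the `('L', row)` / `('R', row)` arrival encoding, and the key functions are all assumptions made for illustration.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key_left, key_right):
    """Pipelined equi-join over interleaved arrivals from two inputs.

    `arrivals` yields ('L', row) or ('R', row) in the order rows arrive
    (a hypothetical encoding). Each row is inserted into its own side's
    hash table and then probes the other side's table, so matches are
    emitted as soon as both partners have arrived.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    key_of = {"L": key_left, "R": key_right}
    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = key_of[side](row)
        tables[side][k].append(row)            # build step
        for match in tables[other].get(k, ()):  # probe step
            # Always emit pairs as (left_row, right_row).
            yield (row, match) if side == "L" else (match, row)
```

Because both hash tables must hold every row seen so far, memory grows with the full input, which is the limitation the abstract cites as the motivation for XJoin.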
Visualizing Digital Library Search Results with Categorical and. Ben Shneiderman. David Feldman. Anne Rose. February 1999.
Digital library search results are usually shown as a textual list, with 10-20 items per page. Viewing several thousand search results at once on a two-dimensional display with continuous variables is a promising alternative. Since these displays can overwhelm some users, we created a simplified two-dimensional display that uses categorical and hierarchical axes, called hieraxes. Users appreciate the meaningful and limited number of terms on each hieraxis. At each grid point of the display we show a cluster of color-coded dots or a bar chart. Users see the entire result set and can then click on labels to move down a level in the hierarchy. Handling broad hierarchies and arranging for imposed hierarchies led to additional design innovations. We applied hieraxes to a digital video library used by middle school teachers and a legal information system. (Also cross-referenced as UMIACS-TR-99-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A New Method to Store and Retrieve Images. Zhexuan Song. Nick Roussopoulos. February 1999.
In this paper, we present a method to speed up querying and retrieving images in a database. First, we change the storage method: the pixels of an image are saved in Hilbert order instead of the row-wise order used in the traditional method. Then, after studying the properties of the Hilbert curve, we give a new algorithm that greatly reduces the number of data segments on the disk. Although we have to retrieve more data than necessary, sequential reading is much faster than random reading, so we obtain about a 10% improvement in total query time, as shown in our simulation experiments. Department of Computer Science, University of Maryland,
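To make the storage idea above concrete, here is a small sketch that maps pixel coordinates to a position along a Hilbert curve and counts how many contiguous runs of storage a rectangular query window touches. It uses the widely known iterative coordinate-to-index conversion, not the authors' algorithm, which the abstract does not spell out. All function names are hypothetical, and the image is assumed to be an n-by-n square with n a power of two.

```python
def hilbert_index(n, x, y):
    """Position of pixel (x, y) along the Hilbert curve of an n-by-n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def row_major_index(n, x, y):
    """Position of pixel (x, y) in the traditional row-wise layout."""
    return y * n + x


def window_offsets(index_fn, n, x0, y0, w, h):
    """Storage positions of every pixel in a w-by-h window at (x0, y0)."""
    return [index_fn(n, x, y)
            for y in range(y0, y0 + h)
            for x in range(x0, x0 + w)]


def count_segments(offsets):
    """Number of maximal runs of consecutive storage positions."""
    ordered = sorted(offsets)
    if not ordered:
        return 0
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)
```

Calling `count_segments(window_offsets(hilbert_index, 8, 0, 0, 4, 4))` and the same with `row_major_index` compares the two layouts for one window. How much the Hilbert layout helps depends on the window's size and alignment, so this sketch illustrates the segment-count notion only and says nothing about the 10% figure reported above.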
Understanding Patterns of User Visits to Web Sites: Interactive. Harry Hochheiser. Ben Shneiderman. February 1999.
HTTP server log files provide Web site operators with substantial detail regarding the visitors to their sites. Interest in interpreting this data has spawned an active market for software packages that summarize and analyze this data, providing histograms, pie graphs, and other charts summarizing usage patterns. While useful, these summaries obscure useful information and restrict users to passive interpretation of static displays. Interactive starfield visualizations can be used to provide users with greater abilities to interpret and explore web log data. By combining two-dimensional displays of thousands of individual access requests, color and size coding for additional attributes, and facilities for zooming and filtering, these visualizations provide capabilities for examining data that exceed those of traditional web log analysis tools. We introduce a series of interactive starfield visualizations, which can be used to explore server data across various dimensions. Possible uses of these visualizations are discussed, and difficulties of data collection, presentation, and interpretation are explored. (Also cross-referenced as UMIACS-99-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Creating Creativity for Everyone: User Interfaces for Supporting. Ben Shneiderman. February 1999.
A challenge for human-computer interaction researchers and user interface designers is to construct information technologies that support creativity. This ambitious goal can be attained by building on an adequate understanding of creative processes. This paper offers the four-phase genex framework for generating excellence:
- Collect: learn from previous works stored in digital libraries
- Relate: consult with peers and mentors at early, middle and late stages
- Create: explore, compose, and evaluate possible solutions
- Donate: disseminate the results and contribute to the digital libraries
Within this integrated framework, this paper proposes eight activities that require human-computer interaction research and advanced user interface design. A scenario about an architect illustrates the process of creative work within a genex environment. (Also cross-referenced as UMIACS-TR-9910) University of Maryland, Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Probabilistic Temporal Databases, I: Algebra. Alex Dekhtyar. Robert Ross. V. S. Subrahmanian. January 1999.
Dyreson and Snodgrass have drawn attention to the fact that in many temporal database applications, there is often uncertainty present about the start time of events, the end time of events, the duration of events, etc. When the granularity of time is small (e.g. milliseconds), a statement such as "Packet p was shipped sometime during the first 5 days of January, 1998" leads to a massive amount of uncertainty ($5 \times 24 \times 60 \times 60 \times 1000$ possibilities). As noted by Zaniolo et al., past attempts to deal with uncertainty in databases have been restricted to relatively small amounts of uncertainty in attributes. Dyreson and Snodgrass have taken an important first step towards solving this problem. In this paper, we first introduce the syntax of Temporal-Probabilistic (TP) relations and then show how they can be converted to an explicit, significantly more space-consuming form called Annotated Relations. We then present a {\em Theoretical Annotated Temporal Algebra} (TATA). Being explicit, TATA is convenient for specifying how the algebraic operations should behave, but is impractical to use because annotated relations are overwhelmingly large. Next, we present a Temporal Probabilistic Algebra (TPA). We show that our definition of the TP-Algebra provides a correct implementation of TATA despite the fact that it operates on implicit, succinct TP-relations instead of the overwhelmingly large annotated relations. Finally, we report on timings for an implementation of the TP-Algebra built on top of ODBC. (Also cross-referenced as UMIACS-TR-99-09) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
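The count quoted in the abstract above can be worked out explicitly. This is plain arithmetic on the figures given there (5 days at millisecond granularity), not a result taken from the report:

\[
5 \times 24 \times 60 \times 60 \times 1000 \;=\; 432{,}000{,}000 \;\approx\; 4.3 \times 10^{8}.
\]

A single statement of this kind therefore admits about 432 million candidate time points, which gives a sense of why an explicit, per-time-point representation becomes impractically large.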
On the Convergence of Ritz Values, Ritz Vectors, and Refined Ritz. Zhongxiao Jia. G. W. Stewart. January 1999.
This paper concerns the Rayleigh--Ritz method for computing an approximation to an eigenpair $(\lambda, x)$ of a non-Hermitian matrix $A$. Given a subspace $\clw$ that contains an approximation to $x$, this method returns an approximation $(\mu, \tilde x)$ to $(\lambda, x)$. We establish four convergence results that hold as the deviation $\epsilon$ of $x$ from $\clw$ approaches zero. First, the Ritz value $\mu$ converges to $\lambda$. Second, if the residual $A\tilde x-\mu\tilde x$ approaches zero, then the Ritz vector $\tilde x$ converges to $x$. Third, we give a condition on the eigenvalues of the Rayleigh quotient from which the Ritz pair is computed that ensures convergence of the Ritz vector. Finally, we show that certain refined Ritz vectors converge unconditionally. (Also cross-referenced as UMIACS-TR-99-08) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Guaranteeing Safety in the Presence of Moving Obstacles. Robert Kohout. January 1999.
Path planning is a fundamental problem in robotics research. Whether the robot is a manipulator arm on a factory floor, an unmanned all-terrain vehicle, a flying drone, or a household assistant serving coffee, the motions of the robot must be planned and executed in such a way that the robot can accomplish its goals. Motion planning must take into account the robot's inherent abilities to move and maneuver, its speed, and all of the various constraints imposed upon these abilities by the environment in which the robot is situated. Many real-world application domains are dynamic, in the sense that the plan-relevant parameters in the environment evolve over time. In such cases, motion planning must also take into account the time that it takes to plan. A perfect plan is useless if it cannot be produced in time to execute it in a changing world. This technical report focuses upon the problem of avoiding moving obstacles in a 2-dimensional environment. Specifically, it addresses the problem of guaranteeing that a robot will never be hit by an obstacle in the environment. It establishes conditions for guaranteeing that a safety-preserving path will always exist in the most commonly studied problem in moving obstacle avoidance, known as the Asteroids Avoidance Problem. These results are then extended to less restricted, more realistic variants of the problem, including the important case where the locations and trajectories are only made known to the planning algorithm at runtime. Once these conditions are established, they are used to develop an incremental algorithm that can solve the restricted Asteroids problem in low-order polynomial time. This algorithm takes its own observed worst-case running time into account, completes in a fraction of a second, and has been used to control Dodger, a simulated robot that avoids moving obstacles in hard real time.
In over ten machine-weeks of testing, involving well over a million obstacles generated in a variety of ways, Dodger has not been hit by a single obstacle. (Also cross-referenced as UMIACS-TR 99-06) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Building Knowledge through Families of Software Studies: An Experience. Victor Basili. Forrest Shull. Filippo Lanubile. January 1999.
Experimentation in software engineering is difficult. One reason is that there are a large number of context variables, and so creating a cohesive understanding of experimental results requires a mechanism for motivating studies and integrating results. It requires a community of researchers that can replicate studies, vary context variables, and build abstract models that represent the common observations about the discipline. This paper discusses the experience of the authors, based upon a collection of experiments, in terms of a high level framework for organizing sets of related studies. With such a framework, experiments can be viewed as part of common families of studies, rather than being isolated events. Common families of studies can contribute to higher level hypotheses that no individual experiment could achieve. Then the replication of experiments within a family of studies can act as the cornerstone for building knowledge in an incremental manner. A mechanism is suggested that motivates, records, and integrates individual experiments within a family for analysis by the community at large. To support the framework, this paper discusses the experiences of the authors in carrying out empirical studies, with specific emphasis on persistent problems encountered in experimental design, threats to validity, criteria for evaluation, and execution of experiments in the domain of software engineering. (Also cross-referenced as UMIACS-TR-99-05) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
SHOP: Simple Hierarchical Ordered Planner. Dana Nau. Yue Cao. Amnon Lotem. Hector Munoz-Avila. January 1999.
SHOP (Simple Hierarchical Ordered Planner) is a domain-independent HTN Planning system with the following characteristics.
* SHOP plans for tasks in the same order that they will later be executed. This avoids some of the goal-interaction issues that arise in other HTN planners, thus making the planning algorithm relatively simple.
* The planning algorithm is sound and complete over a large class of problems.
* Since SHOP knows the complete world-state at each step of the planning process, it can use highly expressive domain representations. For example, it can do planning problems that require complex numeric computations.
* In our tests, SHOP solved problems several orders of magnitude faster than Blackbox and TLplan. This occurred even though SHOP is written in Lisp and the other planners are written in C.
(Also cross-referenced as UMIACS-TR 99-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Coding Discourse Structure in Dialogue (Version 1.0). Christine Nakatani. David Traum. March 1999.
This document is a manual for coding aspects of the discourse structure of dialogue. It was developed to serve both as a starting point for discussion and as a tool for coding exercises prior to the 3rd {\em Discourse Resource Initiative} (DRI) meeting, May 1998 in Chiba, Japan. The manual focuses on coding common ground units (CGUs) to get to a level of commonality between participants in dialogue, and then intentional and informational units (IUs) that represent the higher-level, hierarchical topic or purpose structure of dialogue. (Also cross-referenced as UMIACS-TR-99-03) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Estimating Available Capacity of an End-to-end Path in a Computer Network. Shikha Bahl. December 1998.
A common measure for characterizing the quality of a connection is in terms of average bandwidth, where averages reflect long-term behavior of the connection. It is well recognized, however, that the performance of a connection changes rapidly with time. In order to address the dynamic nature of the connection, a better measure is in terms of the capacity available to the user, treating the capacity as a time varying function. Techniques available for determining the capacity of a path require that a series of packets be sent at a rate that saturates the path for lengthy periods of time. In contrast, we present a non-stressful technique. Estimation of the available capacity requires the knowledge of the way a connection behaves. In order to reflect the actual operations of the network resources, deterministic models are presented, which make the estimation of the available capacity feasible. We adopt the approach of monitoring a connection while sending a controlled set of packets according to the probe packet train model and measuring the time it takes for the probe packets to go across a connection. Based on these measurements, and the deterministic models, we estimate the available capacity of a connection during the observation period. The applicability of the techniques developed for estimating the available capacity was tested through experimental studies using NetDyn for measurements, and selected Internet sites as end points of the connections. The results of the experimental study are also presented. (Also cross-referenced as UMIACS-TR-99-02) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A study of Cyclone technology. Sung Lee. January 1999.
Since their advent, computer networks have used event-based mechanisms for managing resources. While technological advances have resulted in computer networking becoming ubiquitous, the performance of the networks suffers from these approaches of resource management. Cyclone technology, on the other hand, manages resources in a time-based manner, resulting in a networking technology which can deliver loss free, contention free, and jitter free data in a very efficient manner. In Cyclone, scheduled traffic reserves the use of resources in time and space at the time of establishing the connection. As a consequence, there are no losses, jitters, or contentions for any resources. This technology also supports on-demand traffic, for which available resources are allocated on-demand without affecting the performance of scheduled traffic and leading to higher resource utilization. The scheduling approach used indicates that the links can sustain very high loading without having any adverse impact on performance of the scheduled traffic. Clearly the time coordination among resources is the key in achieving jitter free and loss free computer communication with minimum end-to-end delay. Cyclone technology exploits such coordination of resources in time and space and requires minimal processing at a node during data transfer. It eliminates the need for carrying header information allowing more efficient utilization of existing communication bandwidth. The problems of congestion and loss are removed through end-to-end time coordination among network components, thus leading to fewer control messages. For traffic with stringent timing requirements such as real-time audio and video, Cyclone technology offers well-suited network environments in which the end-to-end delay and jitter can be controlled and guaranteed. In this dissertation, we present end-to-end design aspects and the feasibility of Cyclone technology.
A design is presented for all aspects including components and scheduling, and the modes of operation in a Cyclone network have been considered. Our study on the behavior of the current scheduling technique shows that the connection acceptance probability is very high, link utilization can be close to 100%, and the worst case delays due to scheduling are rather low. (Also cross-referenced as UMIACS-TR-99-01) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
High Performance Computing Algorithms for Land Cover Dynamics Using. Satya Kalluri. Joseph Ja'Ja'. David A. Bader. Zengyan Zhang. John Townshend. Hassan Fallah-adl. December 1998.
Global and regional land cover studies require the ability to apply complex models on selected subsets of large amounts of multi-sensor and multi-temporal data sets that have been derived from raw instrument measurements using widely accepted pre-processing algorithms. The computational and storage requirements of most such studies far exceed what is possible on a single workstation environment. We have been pursuing a new approach that couples scalable and open distributed heterogeneous hardware with the development of high performance software for processing, indexing, and organizing remotely sensed data. Hierarchical data management tools are used to ingest raw data, create metadata, and organize the archived data so as to automatically achieve computational load balancing among the available nodes and minimize I/O overheads. We illustrate our approach with four specific examples. The first is the development of the first fast operational scheme for the atmospheric correction of Landsat TM scenes, while the second example focuses on image segmentation using a novel hierarchical connected components algorithm. Retrieval of global BRDF (Bidirectional Reflectance Distribution Function) in the red and near infrared wavelengths using four years (1983 to 1986) of Pathfinder AVHRR Land (PAL) data set is the focus of our third example. The fourth example is the development of a hierarchical data organization scheme that allows on-demand processing and retrieval of regional and global AVHRR data sets. Our results show that substantial improvements in computational times can be achieved by using the high performance computing technology. (Also cross-referenced as UMIACS-TR-98-18) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Evaluation of Client-Server Architectures. Michael D. Beynon. Renato Ferreira. Asmara Afework. Ganti Krishna Mohan. December 1998.
No abstract available. Also cross-referenced as UMIACS-TR-98-17 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hybrid Probabilistic Programs: Algorithms and Complexity. Michael Dekhtyar. Alex Dekhtyar. V.S. Subrahmanian. December 1998.
Hybrid Probabilistic Programs (HPPs) are logic programs that allow the programmer to explicitly encode his knowledge of the dependencies between events being described in the program. In this paper, we classify HPPs into three classes called HPP_1,HPP_2 and HPP_r for r >= 3. For these classes, we provide three types of results for HPPs. First, we develop algorithms to compute the set of all ground consequences of an HPP. Then we provide algorithms and complexity results for the problems of entailment (``Given an HPP P and a query Q as input, is Q a logical consequence of P?'') and consistency (``Given an HPP P as input, is P consistent?''). Our results provide a fine characterization of when polynomial algorithms exist for the above problems, and when these problems become intractable. (Also cross-referenced as UMIACS-TR-98-76) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Single Display Groupware: A Model for Co-present Collaboration. Jason Stewart. Benjamin B. Bederson. Allison Druin. December 1998.
We introduce a model for supporting collaborative work between people who are physically close to each other. We call this model Single Display Groupware (SDG). In this paper, we describe this model, comparing it to more traditional remote collaboration. We describe the requirements that SDG places on computer technology, and our understanding of the benefits and costs of SDG systems. Finally, we describe a prototype SDG system that we built and the results of a usability test we ran with 60 elementary school children. (Also cross-referenced as UMIACS-TR-98-75) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Does a Sketchy Appearance Influence Drawing Behavior?. Jon Meyer. Benjamin B. Bederson. December 1998.
In this paper we examine the role of visual aesthetics in how people interact with computers. Specifically, we are interested in whether simply adopting a sketch-like visual appearance in a drawing application encourages users to interact with the application more freely or rapidly than they would if they were using the standard, precise, rectilinear appearance that most drawing applications now supply. We carried out two user studies. In the first study, we asked members of the University of Maryland Art History department to draw a series of diagrams using two different line styles. In the second experiment, we used the World Wide Web to collect drawing diagrams from a much broader set of participants. Both studies reveal that subjects draw more quickly using the sketch-like ('wavy') line style than the straight line style. (Also cross-referenced as UMIACS-TR-98-74) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Does Animation Help Users Build Mental Maps of Spatial Information?. Benjamin B. Bederson. Angela Boltman. December 1998.
We examine how animating a viewpoint change in a spatial information system affects a user's ability to build a mental map of the information in the space. We found that animation improves users' ability to reconstruct the information space, with no penalty on task performance time. We believe that this study provides strong evidence for adding animated transitions in many applications with fixed spatial data where the user navigates around the data space. (Also cross-referenced as UMIACS-TR-98-73) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Survey of Current Paradigms in Machine Translation. Bonnie J. Dorr. Pamela W. Jordan. John W. Benoit. December 1998.
This paper is a survey of current machine translation research in the US, Europe, and Japan. A short history of machine translation is presented first, followed by an overview of the current research work. Representative examples of a wide range of different approaches adopted by machine translation researchers are presented. These are described in detail along with a discussion of the practicalities of scaling up these approaches for operational environments. In support of this discussion, issues in, and techniques for, evaluating machine translation systems are discussed. (Also cross-referenced as UMIACS-TR-98-72) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Caching and Scheduling for Broadcast Disk Systems. Vincenzo Liberatore. December 1998.
Unicast connections lead to performance and scalability problems when a large client population attempts to access the same data. Broadcast push and broadcast disk technology address the problem by broadcasting data items from a server to a large number of clients. Broadcast disk performance depends mainly on caching strategies at the client site and on how the broadcast is scheduled at the server site. An on-line broadcast disk paging strategy makes caching decisions without knowing access probabilities. In this paper, we subject on-line paging algorithms to extensive empirical investigation. The Gray algorithm [KL98] always outperformed other on-line strategies on both synthetic and Web traces. Moreover, caching limited the skewness needed from a broadcast schedule, and favored efficient caching algorithms over refined scheduling strategies when the cache was not small. Prior to this paper, no work had empirically investigated on-line paging algorithms and their relation to server scheduling. (Also cross-referenced as UMIACS-TR-98-71) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Evaluation of Architectural Alternatives for Rapidly Growing. Mustafa Uysal. Anurag Acharya. Joel Saltz. November 1998.
Growth and usage trends for several large datasets indicate that there is a need for architectures that scale the processing power as the dataset increases. In this paper, we evaluate three architectural alternatives for rapidly growing and frequently reprocessed datasets: active disks, clusters, and shared memory multiprocessors (SMPs). The focus of this evaluation is to identify potential bottlenecks in each of the alternative architectures and to determine the performance of these architectures for the applications of interest. We evaluate these architectural alternatives using a detailed simulator and a suite of nine applications. Our results indicate that for most of these applications Active Disk and cluster configurations were able to achieve significantly better performance than SMP configurations. Active Disk configurations were able to match (and in some cases improve upon) the performance of commodity cluster configurations. (Also cross-referenced as UMIACS-TR-98-68) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
MOCHA: A Self-Extensible Middleware Substrate for Distributed Data. Manuel Rodriguez-Martinez. Nick Roussopoulos. November 18, 1998.
This paper describes MOCHA, a self-extensible middleware substrate designed to interconnect data sources distributed over a computer network. MOCHA is designed to scale to large environments and is based on the idea that the functionality in the system should be deployed by the middleware itself. This is realized by shipping the code implementing either advanced data types or tailored query operators to remote data sources and having it executed remotely. Optimized query plans push the evaluation of powerful data-reducing operators to the data sites while executing data-inflating operators at the client's site. The Volume Reduction Factor is a new cost metric introduced to select the best site to execute query operators and is shown to be more accurate than the standard selectivity factor. MOCHA has been implemented in Java and runs on top of the Informix Universal Server. In this paper we present the architecture of MOCHA, the ideas behind it, and a performance study using data and queries from the Sequoia 2000 Benchmark. The results of this study demonstrate that MOCHA not only provides a flexible and scalable framework but also substantially improves query performance in contrast to traditional middleware solutions. (Also cross-referenced as UMIACS-TR-98-67) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Evaluation of Online Warehouse Update Algorithms. Alexandros Labrinidis. Nick Roussopoulos. November 1998.
Data warehouse maintenance algorithms usually work off-line, making the warehouse unavailable to users. However, since most organizations require continuous operation, we need to be able to perform the updates online, concurrently with user queries. To guarantee that user queries access a consistent view of the warehouse, online update algorithms introduce redundancy in order to store multiple versions of the data objects that are being changed. In this paper, we present an online warehouse update algorithm that stores multiple versions of data as separate rows (vertical redundancy). We compare our algorithm to another online algorithm that stores multiple versions within each tuple by extending the table schema (horizontal redundancy). We have implemented both algorithms on top of an Informix Dynamic Server and measured their performance under varying workloads, focusing on their impact on query response times. Our experiments show that, except for a limited number of cases, vertical redundancy is a better choice with respect to storage, implementation overhead, and query performance. (Also cross-referenced as UMIACS TR-98-66) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Study of Permutations Permissible by LIFO Service Disciplines. Simon Hawkin. Ashok Agrawala. November 1998.
We study permutations of the job order performed by various LIFO service disciplines. The sets of such permutations are shown to be equivalent to sets of string permutations with simple characteristics. In particular, it is easy to test whether a given permutation belongs to these sets. Several algorithms that efficiently perform such tests are presented. (Also cross-referenced as UMIACS-TR-98-65) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
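As a rough illustration of what such a membership test can look like, the sketch below checks whether a service order could be produced by a single LIFO stack when jobs arrive in the order 1, 2, ..., n. This is the classic stack-permutation test and is only an assumed stand-in; the report's own disciplines and algorithms may differ, and the function name is hypothetical.

```python
def is_lifo_permutation(order):
    """Return True if `order` is a possible service order for jobs that
    arrive as 1, 2, ..., n and wait in one LIFO stack (assumed model)."""
    n = len(order)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must be a permutation of 1..n")

    stack = []          # jobs that have arrived but are not yet served
    next_arrival = 1    # next job number that has not arrived yet
    for job in order:
        # Let jobs arrive until the job to be served is on the stack.
        while next_arrival <= job:
            stack.append(next_arrival)
            next_arrival += 1
        # Only the most recently arrived waiting job can be served next.
        if not stack or stack[-1] != job:
            return False
        stack.pop()
    return True
```

Under this assumed model the test runs in time linear in the number of jobs, since each job is pushed and popped at most once.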
Learning Response Time for WebSources using Query Feedback and. Jean-Robert Gruser. Louiqa Raschid. Vladimir Zadorozhny. November 1998.
The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current optimization technology for wrapper mediator architectures needs to be extended to estimate the response time (delays) to access WebSources and to use this delay in query optimization. In this paper, we present a Multi-Dimensional Table (MDT), a tool that is based on learning using query feedback from WebSources. We describe the MDT learning algorithms, and report on the MDT learning for WebSources. The MDT uses the dimensions Time of day, Day, and Quantity of data to learn response times from a particular WebSource, and to predict the expected response time (delay), and a confidence in this prediction, for some query. Experimental data was collected from several WebSources and analyzed to determine those dimensions that were significant in estimating the response time for particular WebSources. Our research shows that we can improve the quality of learning by tuning the MDT features, e.g., including significant dimensions in the MDT, or changing the ordering of dimensions. We then demonstrate how the MDT prediction of delay may be used by a scrambling-enabled optimizer. A scrambling algorithm identifies some critical points of delay, where it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource, versus the MDT prediction of this delay, with respect to critical points of delay in specific plans. We identify those cases where MDT overestimation or underestimation of the real delay results in a penalty in the scrambling-enabled optimizer, and those cases where there is no penalty. Using the experimental data and MDT learning, we test how well the MDT minimizes these penalties. 
Also cross-referenced as UMIACS TR #98-64 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Impact of Proxies in Data Intensive Client-Server Parallel. Michael D. Beynon. Alan Sussman. Joel Saltz. November 1998.
Large client-server data intensive applications can place high demands on system and network resources. This is especially true when the connection between the client and server spans a wide-area internet link. In this paper, we consider changing the typical client-server architecture of a class of data intensive applications. We show that given sufficient common interest among multiple clients, our enhancements reduce the response time per-client and reduce the amount of data sent across the wide-area link. In addition, we also see a reduction in server utilization which helps to improve server scalability as more clients are added to the system. (Also cross-referenced as UMIACS-TR-98-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Translating IDEF3 to PSL. Mihai Ciocoiu. November 1998.
This document describes the process of integrating IDEF3 and PSL. The EPIF-like frame representation developed for representing IDEF3 schematics is introduced, together with the compilation rules for the various IDEF3 elements. The appendix contains a full example of the use of the translator for the Camile scenario. Also cross-referenced as UMIACS-TR-98-63 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Three Results on Iterative Regularization. Misha Kilmer. G. W. Stewart. October 1998.
In this paper we present three theorems which give insight into the regularizing properties of MINRES. While our theory does not completely characterize the regularizing behavior of the algorithm, it provides a partial explanation of the observed behavior of the method. Unlike traditional attempts to explain the regularizing properties of Krylov subspace methods, our approach focuses on convergence properties of the residual rather than on convergence analysis of the harmonic Ritz values. The import of our analysis is illustrated by two examples. In particular, our theoretical and numerical results support the following important observation: in some circumstances the dimension of the optimal Krylov subspace can be much smaller than the number of the components of the truncated spectral solution that must be computed to attain comparable accuracy. Also cross-referenced as UMIACS-TR-98-62 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Study of Dynamic Replication Techniques in Continuous. ChengFu Chou. Leana Golubchik. John C.S. Lui. October 1998.
Multimedia applications are emerging in education, information dissemination, entertainment, as well as many other applications. The stringent requirements of such applications make design of cost-effective and scalable systems difficult, and therefore efficient adaptive and dynamic resource management techniques can be of great help in improving resource utilization and consequently improving performance and scalability of such systems. In this paper, we focus on threshold-based policies for dynamic resource management, specifically in the context of continuous media (CM) servers. Furthermore, we propose a mathematical model of user behavior and show, through a performance study, that not only does the use of this model in conjunction with dynamic resource management policies improve the system's performance but that it also facilitates significantly reduced sensitivity to changes in: (a) system architecture, (b) workload characteristics, (c) skewness of data access patterns, (d) frequency of changes in data access patterns, and (e) choice of threshold values. We believe that not only is this a desirable property for a CM server, in general, but that furthermore, it suggests the usefulness of these techniques across a wide range of continuous media applications. Also cross-referenced as UMIACS-TR-98-61 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Understanding Transportation Management Systems Performance with a. Catherine Plaisant. Phil Tarnoff. Aditya Saraf. Anne Rose. November 1998.
We have developed a simulation-based learning environment to provide system designers and operators with an appreciation of the impact of incidents on traffic delay. We used an application framework developed at the University of Maryland for constructing simulation-based learning environments called SimPLE (Simulated Processes in a Learning Environment). Environments developed with SimPLE use dynamic simulations and visualizations to represent realistic time-dependent behavior and are coupled with guidance material and other software aids that facilitate learning. The simulation allows learners to close freeway lanes and divert traffic to an arterial road. Users can see the effect of the detour on freeway and arterial delay. Users can then adjust signal timing interactively on a time space diagram and watch the effect of their adjustment on green band changes and on arterial delays and total delays. Department of Computer Science, University of Maryland,
Excentric Labeling: Dynamic Neighborhood Labeling for Data. Jean-Daniel Fekete. Catherine Plaisant. October 1998.
The widespread use of information visualization is hampered by the lack of effective labeling techniques. A taxonomy of labeling methods is proposed. We then describe "excentric labeling", a new dynamic technique to label a neighborhood of objects located around the cursor. This technique does not intrude into the existing interaction, it is not computationally intensive, and was easily applied to several visualization applications. A pilot study indicates a strong speed benefit for tasks that involve the rapid exploration of large numbers of objects. Also cross-referenced as UMIACS-TR-98-59 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Iterative Methods for Stabilized Discrete Convection--Diffusion. Yin-Tzer Shih. Howard C. Elman. October 1998.
In this paper, we study the computational cost of solving the convection--diffusion equation using various discretization strategies and iterative solution algorithms. The choice of discretization influences the properties of the discrete solution and also the choice of solution algorithm. The discretizations considered here are stabilized low order finite element schemes using streamline diffusion, crosswind diffusion and shock--capturing. The latter, shock--capturing discretizations lead to nonlinear algebraic systems and require nonlinear algorithms. We compare various preconditioned Krylov subspace methods including Newton--Krylov methods for nonlinear problems, as well as several preconditioners based on relaxation and incomplete factorization. We find that although enhanced stabilization based on shock--capturing requires fewer degrees of freedom than linear stabilizations to achieve comparable accuracy, the nonlinear algebraic systems are more costly to solve than those derived from a judicious combination of streamline diffusion and crosswind diffusion. Solution algorithms based on GMRES with incomplete block--matrix factorization preconditioning are robust and efficient. (Also cross-referenced as UMIACS-TR-98-58) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Automated Techniques for Designing Embedded Signal Processors on. Dong-In Kang. Richard Gerber. Leana Golubchik. October 1998.
In this paper, we present a performance-based technique to help synthesize high-bandwidth radar processors on commodity platforms. This problem is innately complex, for a number of reasons. Contemporary radars are very compute-intensive: they have high pulse rates, and they sample a large number of range readings at each pulse. Indeed, modern radar processors can require CPU loads in the high-gigaflop to teraflop range, performance which is only achieved by exploiting the radar's inherent data parallelism. Next-generation radars are slated to operate on scalable clusters of commodity systems. Throughput is only one problem. Since radars are usually embedded within larger real-time applications, they also must adhere to latency (or deadline) constraints. Building an embedded radar processor on a network of workstations (or a NOW) involves partitioning load in a balanced fashion, accounting for stochastic effects injected on all software-based systems, synthesizing runtime parameters for the on-line schedulers and drivers, and meeting the latency and throughput constraints. In this paper, we show how performance analysis can be used as an effective tool in the design loop; specifically, our method uses analytic approximation techniques to help synthesize efficient designs for radar processing systems. In our method, the signal-processor's topology is represented via a simple flow-graph abstraction, and the per-unit load requirements are modeled stochastically, to account for second-order effects like cache memory behavior, DMA interference, pipeline stalls, etc. Our design algorithm accepts the following inputs: (a) the system topology, including the thread-to-CPU mapping, where multi-threading is assumed to be used; (b) the per-task load models; and (c) the required pulse rate and latency constraints. 
As output, it produces the proportion of load to allocate to each task, set at manageable time resolutions for the local schedulers; an optimal service interval over which all load proportions should be guaranteed; an optimal sampling frequency; and some reconfiguration schemes to accommodate single-node failures. Internally, the design algorithms use analytic approximations to quickly estimate output rates and propagation delays for candidate solutions. When the system is synthesized, its results are checked via a simulation model, which removes many of the analytic approximations. We show how our system synthesizes a real-time synthetic aperture radar, under a variety of loading conditions. Also cross-referenced as UMIACS TR # 98-57 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
LifeLines: Using Visualization to Enhance Navigation and Analysis of. Catherine Plaisant. Richard Mushlin. Aaron Snyder. Jia Li. Dan Heller. Ben Shneiderman. October 1998.
LifeLines provide a general visualization environment for personal histories. We explore its use for clinical patient records. A Java user interface is described, which presents a one-screen overview of a computerized patient record using timelines. Problems, diagnoses, test results or medications can be represented as dots or horizontal lines. Zooming provides more details; line color and thickness illustrate relationships or significance. The visual display acts as a giant menu, giving direct access to the data. (Also cross-referenced as UMIACS-TR-98-56) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Decentralized Replication Mechanisms in Deno. Peter J. Keleher. October 1998.
We are currently finalizing the design of Deno, a new shared-object system intended for use with replicated mobile and wide-area data. The broad aim of our research is to develop a framework for highly-available, decentralized shared-object protocols. The key idea is that our protocols will support high availability through a distributed voting scheme. Specifically, we will investigate (a) peer-to-peer updates, which will allow incremental progress to be made in the absence of full connectivity between component servers, (b) voting rather than centralized schemes for committing updates, ensuring that no single point of failure can prevent updates from being committed, and (c) application-specific consistency control, allowing applications to relax coherency constraints in ways that do not break the application's notion of consistency. Distribution and multiple connectivity modes are becoming the norm rather than the exception in current computing environments. Thus, we expect the impact of our research to be felt in areas as disparate as mobile computing and collaborative data warehousing on the Internet. (Also cross-referenced as UMIACS-TR-98-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Minimalist Theory of Human Sentence Processing. Amy Weinberg. October 1998.
Research in the theory of human sentence processing can be characterized by 3 styles of explanation. Researchers taking the first track have tried to motivate principles of structural preference from extralinguistic considerations like storage capacity in working memory, or bounds on complexity of incremental analysis. Frazier and Rayner's (1982) Minimal Attachment and Right Association principles, and Gorrell's simplicity metric, are examples of this type of theory. The second track eschews "parsing strategies", replacing them with a fairly complex tuning by speaker/hearers to frequency in the hearer's linguistic environment. The difficulty of recovering an analysis of a construction in a particular case is a function of how often similar structures or thematic role arrays appear in the language as a whole. The work of Trueswell et al (1994), Jurafsky (1996) and MacDonald et al (1994) are examples of frequency or probability based constraint satisfaction theories. The third track takes a more representational view and ties processing principles to independently needed restrictions derived from competence and language learning. This approach claims that the natural language faculty is extremely well designed in the sense that the same set of principles that govern language learning also contribute to a theory of sentence processing. This track is represented by the work of Gibson (1981), Gorrell (1995), Pritchett (1992), Philips (1995, 1996) and Weinberg (1992), who argue that processing can be seen as the rapid incremental satisfaction of grammatical constraints such as the Theta Criterion, which are needed independently to explain language learning or language variation. A variant of this approach, represented by Crain and Steedman (1985) among others, constrains the grammatical source for parsing principles but locates these principles within a discourse or semantic, rather than a syntactic component. This paper proposes a model of the last type. 
We argue that a particular version of the Minimalist Program (Chomsky (1993), Uriagereka (1998)) provides the principles needed to explain initial human preferences for ambiguous structures and also provides a theory of reanalysis, explaining when initial preferences can be revised given subsequent disconfirming data, and when they lead to unrevisable garden paths. We compare our model to other linguistically motivated theories such as Philips (1995, 1996), arguing that Minimalist principles subsume the generalizations captured by Philips' theory in a more empirically adequate way. Finally, we argue that the data presented favor this theory over those motivated by extralinguistic principles. Also cross-referenced as UMIACS-TR-98-53 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Linguist and the Laundromat. Amy Weinberg. October 1998.
This paper resulted from a roundtable discussion at the 1998 CUNY Sentence Processing Conference held at Rutgers University. Jerry Fodor (Philosophy, Rutgers University) argued there that an adequate lexical semantics had to invoke a criterion of Reverse Compositionality. Fodor gives the following definition of 'Reverse Compositionality' (RC): "Nothing belongs to the lexical entry for a lexical item except what that item contributes to the grammatical representation of its hosts" where 'host' is defined as "any expression E ...of which E is a constituent." Moreover, Fodor claims that invoking this criterion has broad consequences for theories of language processing and acquisition, particularly with respect to theories that attribute processing behavior to "lexical effects." Fodor claims that "...most of what cognitive science blithely refers to as lexical effects in parsing and language learning aren't in fact mediated by information of the kind that lexical entries contain...." and "... that language acquisition delivers shallow lexical entries consonant with reverse compositionality, and that parsing delivers correspondingly shallow lexical entries consonant with assigning tokens to their types, and that everything else will turn out to be 'performance theory' ..." In this paper, I argue that frequency and other standard lexical processing effects can form a legitimate part of a theory of sentence processing even if it adopts the criterion of "reverse compositionality". Cases drawn from the literature are used to sketch what a theory adopting Fodor's criterion and using frequency and/or probabilistic information would look like. This commentary will appear in Proceedings of CUNY Conference on Sentence Processing, 1998, S. Stevenson and P. Merlo, eds, J. Benjamins. Also cross-referenced as UMIACS-TR-98-52 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Choosing Regularization Parameters in Iterative Methods for Ill-Posed. Misha E. Kilmer. Dianne P. O'Leary. October 1998.
Numerical solution of ill-posed problems is often accomplished by discretization (projection onto a finite dimensional subspace) followed by regularization. If the discrete problem has high dimension, though, typically we compute an approximate solution by projection onto an even smaller dimensional space, via iterative methods based on Krylov subspaces. In this work we present efficient algorithms that regularize after this second projection rather than before it. We prove some results on the approximate equivalence of this approach to other forms of regularization and we present numerical examples. (Also cross-referenced as UMIACS-TR-98-48) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Document Image Understanding - 1997. David Doermann. October 1998.
This report contains nearly 500 references which are directly related to the field of document image understanding and appeared in major journals and conferences during 1997. Each reference is classified by major topic. Areas covered include, but are not limited to, preprocessing, models and representations, on-line recognition, off-line recognition, graphics recognition and interpretation, page processing, post-processing and special applications. Department of Computer Science, University of Maryland,
Translating English and Mandarin Verbs with Argument Structure. Mari Broman Olsen. October 1998.
This paper applies and evaluates a semi-automatically acquired Mandarin Chinese lexicon (Olsen, Dorr, and Thomas 1998) with respect to translation of English and Chinese verbs in a UNESCO text (Otero 1997). I demonstrate how Lexical Conceptual Structure templates allow the same semantic structure to apply both to verbs with thematic roles incorporated in the verb itself, and those requiring external thematic complements. Using as examples the English verb _provide_, the Chinese counterpart ti2 gong2 (STC 2251 0180) and its English counterparts in the text, I show how potential translations are included or eliminated automatically based on their thematic role structure. The example illustrates (i) how an interlingual thematic representation based in large part on English argument structure may be adapted felicitously to a historically unrelated language, and (ii) how an interlingual (IL) resource developed for analysis may also be used in generation. (Also cross-referenced as UMIACS-TR-98-51) (Also cross-referenced as LAMP-TR-023) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Thematic Hierarchy for Efficient Generation from Lexical-Conceptual. Bonnie J. Dorr. Nizar Habash. David Traum. October 1998.
This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). There are two contributions of this work: (1) the development of a thematic hierarchy that provides ordering information for realization of arguments in their surface positions; (2) the provision of a diagnostic tool for detecting inconsistencies in an existing online LCS-based lexicon that allows us to enhance principles for thematic-role assignment. (Also cross-referenced as UMIACS-TR-98-50) (Also cross-referenced as LAMP-TR-022) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Lexical Selection for Cross-Language Applications: Combining LCS with. Bonnie J. Dorr. Maria Katsova. October 1998.
This paper describes experiments for testing the power of large-scale resources for lexical selection in machine translation (MT) and cross-language information retrieval (CLIR). We adopt the view that verbs with similar argument structure share certain meaning components, but that those meaning components are more relevant to argument realization than to idiosyncratic verb meaning. We verify this by demonstrating that verbs with similar argument structure as encoded in Lexical Conceptual Structure (LCS) are rarely synonymous in WordNet. We then use the results of this work to guide our implementation of an algorithm for cross-language selection of lexical items, exploiting the strengths of each resource: LCS for semantic structure and WordNet for semantic content. We use the Parka Knowledge-Based System to encode LCS representations and WordNet synonym sets and we implement our lexical-selection algorithm as Parka-based queries into a knowledge base containing both information types. (Also cross-referenced as UMIACS-TR-98-49) (Also cross-referenced as LAMP-TR-021) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Full Degree Spanning Tree Problem. Randeep Bhatia. Samir Khuller. Robert Pless. Yoram Sussmann. October 1998.
The full degree spanning tree problem is defined as follows: given a connected graph $G=(V,E)$ find a spanning tree $T$ so as to maximize the number of vertices whose degree in $T$ is the same as in $G$ (these are called vertices of ``full'' degree). We show that this problem is NP-hard. We also present almost {\em optimal} approximation algorithms for it assuming $coR \neq NP$. For the case of general graphs our approximation factor is $\Theta(\sqrt{n})$. Using H{\aa}stad's result on the hardness of approximating clique, we can show that if there is a polynomial time approximation algorithm for our problem with a factor of $O(n^{\frac{1}{2}-\epsilon})$ then $coR=NP$. For the case of planar graphs, we present a polynomial time approximation scheme. Additionally, we present some experimental results comparing our algorithm to the previous heuristic used for this problem. (Also cross-referenced as UMIACS 98-47) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
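As an illustration of the objective defined in the abstract above, the following minimal sketch counts the vertices of "full" degree for a given candidate tree. It is not the authors' approximation algorithm; the function name `full_degree_count`, the edge-list representation, and the example graph are assumptions made here for illustration, and the sketch does not check that the supplied edges actually form a spanning tree.

```python
from collections import defaultdict


def full_degree_count(graph_edges, tree_edges):
    """Count vertices whose degree in the tree equals their degree in the graph.

    Both arguments are lists of undirected edges (u, v). The caller is
    assumed to pass a valid spanning tree; this is not verified here.
    """
    deg_g = defaultdict(int)
    deg_t = defaultdict(int)
    for u, v in graph_edges:
        deg_g[u] += 1
        deg_g[v] += 1
    for u, v in tree_edges:
        deg_t[u] += 1
        deg_t[v] += 1
    # A vertex has "full" degree when the tree keeps all of its graph edges.
    return sum(1 for v in list(deg_g) if deg_t[v] == deg_g[v])


# Hypothetical example: the 4-cycle a-b-c-d-a plus the chord a-c.
graph = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
star_tree = [("a", "b"), ("a", "c"), ("a", "d")]  # star centred on a
print(full_degree_count(graph, star_tree))
```

In this example only vertex a keeps all three of its graph edges, so the star has one vertex of full degree.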
Deferred Data-Flow Analysis : Algorithms, Proofs and Applications. Shamik D. Sharma. Anurag Acharya. Joel Saltz. September 1998.
Loss of precision due to the conservative nature of compile-time dataflow analysis is a general problem and impacts a wide variety of optimizations. We propose a limited form of runtime dataflow analysis, called deferred dataflow analysis (DDFA), which attempts to sharpen dataflow results by using control-flow information that is available at runtime. The overheads of runtime analysis are minimized by performing the bulk of the analysis at compile-time and deferring only a summarized version of the dataflow problem to runtime. Caching and reusing of dataflow results reduces these overheads further. DDFA is an interprocedural framework and can handle arbitrary control structures including multi-way forks, recursion, separately compiled functions and higher-order functions. It is primarily targeted towards optimization of heavy-weight operations such as communication calls, where one can expect significant benefits from sharper dataflow analysis. We outline how DDFA can be used to optimize different kinds of heavy-weight operations such as bulk-prefetching on distributed systems and dynamic linking in mobile programs. We prove that DDFA is safe and that it yields better dataflow information than strictly compile-time dataflow analysis. (Also cross-referenced as UMIACS-TR-98-46) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Comparison of the Memory Management sub-systems in FreeBSD and Linux. Rohit Dube. September 1998.
In this article we seek to compare the memory management sub-systems of two popular and freely available operating systems - FreeBSD and Linux. First a framework is developed, spelling out the components of a generic and modern memory management system. The framework is then used in a design level comparison of memory management in the two operating systems. (Also cross-referenced as UMIACS-TR-98-45) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Designing Practical Efficient Algorithms for Symmetric Multiprocessors. David R. Helman. Joseph JaJa. October 1998.
Symmetric multiprocessors (SMPs) dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model to create efficient solutions to two widely different types of problems - linked list prefix computations and generalized sorting. Our novel algorithm for prefix computations builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Our algorithm for generalized sorting is a modification of our algorithm for sorting by regular sampling on distributed memory architectures. The algorithm is a stable sort which appears to be asymptotically faster than any of the published algorithms for SMPs. Both of our algorithms were implemented in C using POSIX threads and run on three symmetric multiprocessors - the DEC AlphaServer, the Silicon Graphics Power Challenge, and the HP-Convex Exemplar. We ran our code for each algorithm using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. In spite of the fact that the processors must compete for access to main memory, both algorithms still yielded scalable performance up to 16 processors, which was the largest platform available to us. For some problems, our prefix computation algorithm actually matched or exceeded the performance of the best sequential solution using only a single thread. Similarly, our generalized sorting algorithm always beat the performance of sequential merge sort by at least an order of magnitude, even with a single thread. 
(Also cross-referenced as UMIACS-TR-98-44) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Meta Agent Programs. Juergen Dix. V.S.Subrahmanian. George Pick. September 1998.
There are numerous applications where one agent a needs to reason about the beliefs of another agent, as well as about the actions that other agents may take. Eiter et al. introduced the concept of an agent program, and provided a language within which the operating principles of an agent could be declaratively encoded on top of imperative data structures. We first introduce certain belief data structures that an agent needs to maintain. Then we introduce the concept of a "Meta Agent Program" (MAP), which extends the Eiter et al. framework so as to allow agents to perform metareasoning. We build a formal semantics for MAPs, and show how this semantics supports not just beliefs agent a may have about agent b's state, but also beliefs about agent b's beliefs about agent c's actions, beliefs about b's beliefs about agent c's state, and so on. Finally, we provide a translation that takes any MAP as input and converts it into an agent program such that there is a one-one correspondence between the semantics of the MAP and the semantics of the resulting agent program. This correspondence allows an implementation of MAPs to be built on top of an implementation of agent programs. (Also cross-referenced as UMIACS-TR-98-43) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Analysis of a Packet-Pair Scheme for Estimating Bottleneck Bandwidth in a Network. Shikha Bahl. September 1998.
In order to assess the performance of a connection it is important to determine the bandwidth offered by the slowest node along the path, also known as the bottleneck bandwidth. In the past, many researchers have used a packet-pair technique to estimate the bottleneck bandwidth. In real networks, however, the measurements made using the packet-pair technique do not always reflect the correct estimate of the bottleneck service time due to the presence of cross traffic. While several reasons for the observed variability have been reported, the exact nature of the impact of cross traffic on the observations has not been studied. In this paper we present a model to explain how the measured difference of the reception time for a packet-pair can be related to the characteristic of the service time and cross traffic that the pair found along the path. (Also cross-referenced as UMIACS-TR-98-42) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
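For readers unfamiliar with the basic packet-pair idea referred to in the abstract above, the following minimal sketch shows the textbook estimate: two equal-size packets sent back to back are spread apart by the bottleneck, so the bandwidth is approximated by the packet size divided by the gap between their reception times. It does not reproduce the model developed in the report. The function names and the use of a median over several pairs to damp cross-traffic noise are assumptions made here for illustration.

```python
from statistics import median


def packet_pair_estimate(packet_size_bytes, arrival_gap_s):
    """Estimate bottleneck bandwidth in bits/sec from one packet pair."""
    if arrival_gap_s <= 0:
        raise ValueError("arrival gap must be positive")
    return packet_size_bytes * 8 / arrival_gap_s


def bottleneck_estimate(packet_size_bytes, arrival_gaps_s):
    """Combine several pair measurements with a median (an assumed heuristic).

    Cross traffic can stretch or compress individual gaps, so a single
    pair may be misleading; the median is one simple way to reduce that.
    """
    estimates = [packet_pair_estimate(packet_size_bytes, g) for g in arrival_gaps_s]
    return median(estimates)


# Hypothetical measurements: 1000-byte packets, gaps in seconds.
print(bottleneck_estimate(1000, [0.5, 1.0, 0.25]))
```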
Parallel Strands: A Preliminary Investigation into Mining the Web for. Philip Resnik. August 1998.
Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. (Also cross-referenced as UMIACS-TR-98-41) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Looking to Parallel Algorithms for ILP and Decentralization. Efraim Berkovich. Bruce L. Jacob. Joseph Nuzman. Uzi Vishkin. July 20, 1998.
We introduce explicit multi-threading (XMT), a decentralized architecture that exploits fine-grained SPMD-style programming; a SPMD program can translate directly to MIPS assembly language using three additional instruction primitives. The motivation for XMT is: (i) to define an inherently decentralizable architecture, taking into account that the performance of future integrated circuits will be dominated by wire costs, (ii) to increase available instruction-level parallelism (ILP) by leveraging expertise in the world of parallel algorithms, and (iii) to reduce hardware complexity by alleviating the need to detect ILP at run-time: if parallel algorithms can give us an overabundance of work to do in the form of thread-level parallelism, one can extract instruction-level parallelism with greatly simplified dependence-checking. We show that implementations of such an architecture tend towards decentralization and that, when global communication is necessary, overall performance is relatively insensitive to large on-chip delays. We compare the performance of the design to more traditional parallel architectures and to a high-performance superscalar implementation, but the intent is merely to illustrate the performance behavior of the organization and to stimulate debate on the viability of introducing SPMD to the single-chip processor domain. We cannot offer at this stage hard comparisons with well-researched models of execution. When programming for the SPMD model, the total number of operations that the processor has to perform is often slightly higher. To counter this, we have observed that the length of the critical path through the dynamic execution graph is smaller than in the serial domain, and the amount of ILP is correspondingly larger. 
Fine-grained SPMD programming connects with a broad knowledge base in parallel algorithms and scales down to provide good performance relative to high-performance superscalar designs even with small input sizes and small numbers of functional units. Keywords: Fine-grained SPMD, parallel algorithms, spawn-join, prefix-sum, instruction-level parallelism, decentralized architecture. (Also cross-referenced as UMIACS-TR-98-40) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Prediction Framework for Data Intensive Applications on. Mustafa Uysal. Tahsin M. Kurc. Alan Sussman. Joel Saltz. July 1998.
This paper presents a simulation-based performance prediction framework for large scale data-intensive applications on large scale machines. Our framework consists of two components: application emulators and a suite of simulators. Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing of critical application components (input data partitioning, data declustering, processing structure, etc.) easily and flexibly. Our suite of simulators model the I/O and communication subsystems with good accuracy and execute quickly on a high-performance workstation to allow performance prediction of large scale parallel machine configurations. The key to efficient simulation of very large scale configurations is a technique called loosely-coupled simulation where the processing structure of the application is embedded in the simulator, while preserving data dependencies and data distributions. We evaluate our performance prediction tool using a set of three data-intensive applications. (Also cross-referenced as UMIACS TR # 98-39) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Prefix Computations on Symmetric Multiprocessors. David R. Helman. Joseph JaJa. July 1998.
We introduce a new optimal prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Moreover, whereas Reid-Miller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor architecture (SMP). These symmetric multiprocessors dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Our prefix computation algorithm was implemented in C using POSIX threads and run on three symmetric multiprocessors - the DEC AlphaServer, the SGI Power Challenge, and the HP-Convex Exemplar. We ran our code using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. For some problems, our algorithm actually matched or exceeded the optimal sequential solution using only a single thread. Moreover, in spite of the fact that the processors must compete for access to main memory, our algorithm still resulted in scalable performance up to 16 processors, which was the largest platform available to us. (Also cross-referenced as UMIACS-98-38) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
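To make concrete what a prefix computation on a linked list produces, the following minimal sketch gives a sequential reference version of the kind a parallel algorithm would be compared against. It is not the sparse ruling set algorithm described above. The array-based list representation (`values`, `succ`, `head`, with -1 marking the end of the list) and the choice of addition as the operator are assumptions made here for illustration.

```python
def list_prefix_sums(values, succ, head):
    """Sequential prefix sums over a linked list stored in arrays.

    values[i] is the value held by node i, succ[i] is the index of the
    node that follows node i (or -1 at the tail), and head is the index
    of the first node. Returns prefix[i], the sum of all values from the
    head up to and including node i in list order.
    """
    prefix = [0] * len(values)
    running = 0
    node = head
    while node != -1:
        running += values[node]
        prefix[node] = running
        node = succ[node]
    return prefix


# Hypothetical list stored out of order in memory: 0 -> 2 -> 3 -> 1.
values = [5, 1, 3, 2]
succ = [2, -1, 3, 1]
print(list_prefix_sums(values, succ, head=0))
```

Because the nodes are scattered in memory, each step follows a pointer to an unpredictable location, which is why memory access patterns matter for this problem.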
A Characterization of the General Protocol Conformance Test Sequence. Raymond Miller. Junehwa Song. June 1998.
No abstract submitted. Department of Computer Science, University of Maryland,
Mobile Streams. M. Ranganathan. Anurag Acharya. Laurent Andrey. Virginie Schaal. Joel Saltz. June 1998.
A large class of distributed testing, control and collaborative applications are reactive or event driven in nature. Such applications can be structured as a set of handlers that react to events and that in turn can trigger other events. We have developed an application building toolkit that facilitates development of such applications. Our system is based on the concept of Mobile Streams. Applications developed in our system are dynamically extensible and re-configurable and our system provides the application designer a means to control how the system can be extended and reconfigured. We describe our system model and implementation and compare our design to the design of other systems. (Also cross-referenced as UMIACS-TR-98-36) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Enhancing Automatic Acquisition of Thematic Structure in a Large-Scale. Mari Broman Olsen. Bonnie Dorr. Scott Thomas. June 1998.
This paper describes a refinement to our procedure for porting lexical conceptual structure into new languages. Specifically we describe a two-step process for creating candidate thematic grids for Mandarin Chinese verbs, using the English verb heading the VP in the subdefinitions to separate senses, and roughly parsing the verb complement structure to match to our thematic structure templates. The procedure is part of a larger process of creating a usable lexicon for interlingual machine translation from a large on-line resource with both too much and too little information necessary for our system. (Also cross-referenced as UMIACS-TR-98-35) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Graphical Multiscale Web Histories: A Study of PadPrints. Ron R. Hightower. Laura T. Ring. Jonathan I. Helfman. Benjamin B. Bederson. James D. Hollan. May 1998.
We have implemented a browser companion called PadPrints that dynamically builds a graphical history-map of visited web pages. PadPrints relies on Pad++, a zooming user interface (ZUI) development substrate, to display the history-map using minimal screen space. PadPrints functions in conjunction with a traditional web browser but without requiring any browser modifications. We performed two usability studies of PadPrints. The first addressed general navigation effectiveness. The second focused on history-related aspects of navigation. In tasks requiring returns to prior pages, users of PadPrints completed tasks in 61.2% of the time required by users of the same browser without PadPrints. We also observed significant decreases in the number of pages accessed when using PadPrints. Users found browsing with PadPrints more satisfying than using Netscape alone. (Also cross-referenced as UMIACS-TR-98-33) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Application Framework for Creating Simulation-Based Learning. Anne Rose. David Eckard. Gary W. Rubloff. May 1998.
While there are numerous types of electronic learning environments including collaboratories, construction toolkits, systems with "scaffolding" and simulations, it is difficult to find authoring tools to build these systems. We have developed an application framework for constructing simulation-based learning environments called SimPLE (Simulated Processes in a Learning Environment). Environments developed with SimPLE use dynamic simulations and visualizations to represent realistic time-dependent behavior and are coupled with guidance material and other software aids that facilitate learning. The software architecture enables independent contributions from developers representing educational content (e.g., simulation models, guidance materials) and software development (e.g., user interface). We provide a user interface template and accompanying software aids to reduce the software development effort. (Also cross-referenced as UMIACS-TR-98-32) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn. Gayathri Chittiappa. A. Udaya Shankar. May 1998.
We develop a numerical evaluation method for adaptive hierarchical QoS routing, and demonstrate its viability by application to two networks. Our approach models aggregation and delayed feedback in a straightforward way, and is scalable to the large networks needed to evaluate hierarchical routing. Department of Computer Science, University of Maryland,
Evaluation of Tradeoffs in Resource Management Techniques for. Leana Golubchik. John C. S. Liu. Edmundo de Silva e Souza. H. Richard Gail. May 1998.
Many modern applications can benefit from sharing of resources such as network bandwidth, disk bandwidth, and so on. In addition, many information systems store (or would like to store) data that can be of use to many different classes of applications, e.g., digital libraries type systems. Part of the difficulty in efficient resource management of such systems can then occur when these applications have vastly different performance and quality-of-service (QoS) requirements as well as resource demand characteristics. In this work we present a performance study of a multimedia storage system which serves multiple types of workloads, specifically a mixture of real-time and non-real-time workloads, by allowing sharing of resources among these different workloads while satisfying their performance requirements and QoS constraints. The broad aim of this work is to examine the issues and tradeoffs associated with mixing multiple workloads on the same server to explore the possibility of maintaining reasonable performance and QoS requirements without having to partition the resources. The main contribution of this work is the exposition of the tradeoffs involved in resource management in such systems. Although many different resources can be considered, here we concentrate mostly on the I/O bandwidth resource. The performance metrics of interest are the mean and variance of the response time for the non-real-time applications and the probability of missing a deadline for the real-time applications. The increased use of buffer space resources is also considered as a tradeoff for improvements in the above stated performance metrics, i.e., response time and probability of missing deadlines. (Also cross-referenced as UMIACS-TR-98-30) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Fast Evaluation of Ensemble Transients of Large IP Networks. Catalin T. Popescu. A. Udaya Shankar. May 11, 1998.
We extend a numerical approximate solution method (the Z-iteration) to time-dependent open networks of M(t)/M(t)/1/$\infty$ and M(t)/M(t)/1/K queues, and apply the method to obtain transient performance metrics of large IP networks. The method generates a set of coupled differential equations, one for each queue in the network. The equations are numerically unstable under certain conditions (e.g., large bandwidths and buffers), and we present techniques to overcome this problem. The resulting numerical procedure is accurate and very fast. For example, a 20-second evolution for a 1000-node network with high-speed links ($\approx 10^4$ packets/sec) and large buffers ($\approx 10^4$ packets) was obtained in 18 minutes on an Ultra Sparc, whereas simulation would take days. Department of Computer Science, University of Maryland,
Data Object and Label Placement For Information Abundant Visualizations. Jia Li. Catherine Plaisant. Ben Shneiderman. August 1998.
Placing numerous data objects and their corresponding labels in limited screen space is a challenging problem in information visualization systems. Extending map-oriented techniques, this paper describes static placement algorithms and develops metrics (such as compactness and labeling rate) as a basis for comparison among these algorithms. A control panel facilitates user customization by showing the metrics for alternative algorithms. Dynamic placement techniques that go beyond map-oriented techniques demonstrate additional possibilities. User actions can lead to selective display of data objects and their labels. Department of Computer Science, University of Maryland,
A Comparative Study of Knowledge-Based Approaches for Cross-Language. Douglas W. Oard. Bonnie J. Dorr. Paul G. Hackett. Maria Katsova. April 1998.
Cross-language retrieval systems seek to use queries in one natural language to guide the retrieval of documents that might be written in another. Acquisition and representation of translation knowledge plays a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge, bilingual dictionaries and translation lexicons, for cross-language retrieval. We have implemented six query translation techniques that use bilingual dictionaries, one based on lexical-semantic analysis, and one based on direct use of the translation output from an existing machine translation system; these are compared with a document translation technique that uses output from the same existing translation system. Average precision measures on portions of the TREC collection suggest that arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation in the dictionary, that query translation using an existing machine translation system can achieve somewhat better effectiveness than simple dictionary-based techniques, and that performing document translation rather than query translation may result in further improvements in retrieval effectiveness under some conditions. (Also cross-referenced as UMIACS-TR-98-27) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Investigating Reading Techniques for Framework Learning. Forrest Shull. Filippo Lanubile. Victor R. Basili. April 1998.
The empirical study described in this paper addresses software reading for construction: how application developers obtain an understanding of a software artifact for use in new system development. This study focuses on the processes developers would engage in when learning and using object-oriented frameworks. We analyzed 15 student software development projects using both qualitative and quantitative methods to gain insight into what processes occurred during framework usage. The contribution of the study is not to test predefined hypotheses but to generate well-supported hypotheses for further investigation. The main hypotheses we produce are that example-based techniques are well suited to use by beginning learners while hierarchy-based techniques are not because of a larger learning curve. Other more specific hypotheses are proposed and discussed. (Also cross-referenced as UMIACS-TR-98-26) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Infrastructure for Building Parallel Database Systems for. Chialin Chang. Alan Sussman. Joel Saltz. April 1998.
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. We also present some preliminary performance results comparing the implementation of a remote-sensing image database using the T2 services with a custom-built integrated implementation. (Also cross-referenced as UMIACS-TR-98-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
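The basic processing step described in the abstract above (transform each input item, map it to the output grid, aggregate everything that lands on the same grid point) can be sketched in a few lines. This is only an illustration of that pattern, not the T2 interface; the function `aggregate_to_grid`, its parameters, and the example data are all hypothetical names chosen here.

```python
def aggregate_to_grid(items, transform, map_to_cell, combine, initial):
    """Transform items, map each to an output grid cell, and aggregate per cell.

    transform(item) produces the transformed item, map_to_cell(t) gives the
    output grid point it belongs to, and combine(acc, t) folds it into the
    running value for that point, starting from initial.
    """
    grid = {}
    for item in items:
        transformed = transform(item)
        cell = map_to_cell(transformed)
        grid[cell] = combine(grid.get(cell, initial), transformed)
    return grid


# Hypothetical input: (x, y, reading) samples aggregated onto a unit grid,
# keeping the maximum calibrated reading per cell.
samples = [(0.2, 0.7, 3.0), (0.9, 0.1, 5.0), (1.5, 0.5, 2.0)]
result = aggregate_to_grid(
    samples,
    transform=lambda s: (s[0], s[1], s[2] * 2),    # assumed calibration step
    map_to_cell=lambda t: (int(t[0]), int(t[1])),  # truncate to grid indices
    combine=lambda acc, t: max(acc, t[2]),
    initial=float("-inf"),
)
print(result)
```

Keeping the three steps as separate callables mirrors the idea of a customizable framework in which the application supplies the domain-specific pieces.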
Digital Dynamic Telepathology -- the Virtual Microscope. Asmara Afework. Michael D. Beynon. Fabian Bustamante. Angelo Demarzo, M.D. Renato Ferreira. Robert Miller, M.D. Mark Silberman, M.D. Joel Saltz, M.D., Ph.D. Alan Sussman, Ph.D. Hubert Tsang. March 1998.
The Virtual Microscope is being designed as an integrated computer hardware and software system that generates a highly realistic digital simulation of analog, mechanical light microscopy. We present our work over the past year in meeting the challenges in building such a system. The enhancements we made are discussed, as well as the planned future improvements. Performance results are provided that show that the system scales well, so that many clients can be adequately serviced by an appropriately configured data server. (Also cross-referenced as UMIACS-TR-98-23) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Eigenanalysis of Some Preconditioned Helmholtz Problems. Howard C. Elman. Dianne P. O'Leary. March 1998.
In this work we calculate the eigenvalues obtained by preconditioning the discrete Helmholtz operator with Sommerfeld-like boundary conditions on a rectilinear domain, by a related operator with boundary conditions that permit the use of fast solvers. The main innovation is that the eigenvalues for two- and three-dimensional domains can be calculated exactly by solving a set of one-dimensional eigenvalue problems. This permits analysis of quite large problems. For grids fine enough to resolve the solution for a given wave number, preconditioning using Neumann boundary conditions yields eigenvalues that are uniformly bounded, located in the first quadrant, and outside the unit circle. In contrast, Dirichlet boundary conditions yield eigenvalues that approach zero as the product of wave number with the mesh size is decreased. These eigenvalue properties yield the first insight into the behavior of iterative methods such as GMRES applied to these preconditioned problems. (Also cross-referenced as UMIACS-TR-98-22) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Emergent Patterns of Teaching/Learning in Electronic Classrooms. Ben Shneiderman. Ellen Yu Borkowski. Maryam Alavi. Kent Norman. July 1998.
Novel patterns of teaching/learning have emerged from faculty and students who use our three Teaching/Learning Theaters at the University of Maryland, College Park. These fully-equipped electronic classrooms have been used by 74 faculty in 264 semester-long courses since the Fall of 1991 with largely enthusiastic reception by both faculty and students. The designers of the Teaching/Learning Theaters sought to provide a technologically rich environment and a support staff so that faculty could concentrate on changing the traditional lecture from its unidirectional information flow to a more collaborative activity. As faculty evolved their personal styles in using the electronic classrooms, novel patterns of teaching/learning have emerged. In addition to enhanced lectures, we identified three common patterns: active individual learning, small-group collaborative learning, and entire-class collaborative learning. Department of Computer Science, University of Maryland,
Chapter 3: Children as Our Technology Design Partners. Allison Druin. Ben Bederson. Angela Boltman. Adrian Miura. Debby Knotts-Callahan. Mark Platt. March 1998.
"That's silly!" "I'm bored!" "I like that!" "Why do I have to do this?" "What is this for?" These are all important responses and questions that come from children. As our design partners in developing new technologies, children can offer bluntly honest views of their world. They have their own likes, dislikes, and needs that are not the same as adults' (Druin, Stewart, Proft, Bederson, & Hollan, 1997). As the development of new technologies for children becomes commonplace in industry and university research labs, children's input into the design and development process is critical. We need to establish new development methodologies that enable us to stop and listen, and learn to collaborate with children of all ages. In the chapter that follows, a discussion of new research methodologies will be presented. (Also cross-referenced as UMIACS-TR-98-20) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Building Self-Reconfiguring Distributed Virtual Environments. Donald J. Welch. March 1998.
A distributed virtual environment may be required to reconfigure itself to compensate for various conditions that can occur during execution. An example is the reentry of a virtual environment that was previously reconfigured out of the distributed virtual environment due to failure. If there is a human user of this virtual environment, care must be taken to ensure that he is brought back into the distributed virtual environment in a way that makes sense. He cannot regain control of a tank that is out of ammunition while a computer-based simulation controls actively participating tanks. The compensating reconfiguration function of a distributed virtual environment must detect conditions that dictate reconfiguration. It must determine the proper course of action and act on it, bringing the distributed virtual environment to a stable state as quickly as possible. Proper reconfiguration of a distributed virtual environment requires that the compensating reconfiguration software know the system configuration, the virtual state, and the mapping between them. Building compensating reconfiguration software using traditional means is laborious and error prone. A rule-based tool that uses abstract views of the distributed virtual environment is a better way to produce compensating reconfiguration software. To show the viability of this approach I have developed a rule-based tool called Bullpen. This research compares Bullpen against manual coding in a case study that ranges over a wide array of requirements changes. The results of this case study show that using Bullpen to build compensating reconfiguration components is superior to manually building the software in the kind of environments most commonly found in the military DVE domain. Using Bullpen takes less effort and is less complex than using manual programming techniques. The resulting component is less error prone and has acceptable reaction time. (Also cross-referenced as UMIACS-TR-98-18) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Parametric Design Synthesis of Distributed Embedded Systems. Dong-In Kang. Richard Gerber. Manas Saksena. March 12, 1998.
This paper presents a design synthesis method for distributed embedded systems. In such systems, computations can flow through long pipelines of interacting software components, hosted on a variety of resources, each of which is managed by a local scheduler. Our method automatically calibrates the local resource schedulers to achieve the system's global end-to-end performance requirements. A system is modeled as a set of distributed task chains (or pipelines), where each task represents an activity requiring nonzero load from some CPU or network resource. Task load requirements can vary stochastically, due to second-order effects like cache memory behavior, DMA interference, pipeline stalls, bus arbitration delays, transient head-of-line blocking, etc. We aggregate these effects -- along with a task's per-service load demand -- and model them via a single random variable, ranging over an arbitrary discrete probability distribution. Load models can be obtained via profiling tasks in isolation, or simply by using an engineer's hypothesis about the system's projected behavior. The end-to-end performance requirements are posited in terms of throughput and delay constraints. Specifically, a pipeline's delay constraint is an upper bound on the total latency a computation can accumulate, from input to output. The corresponding throughput constraint mandates the pipeline's minimum acceptable output rate -- counting only outputs which meet their delay constraints. Since per-component loads can be generally distributed, and since resources host stages from multiple pipelines, meeting all of the system's end-to-end constraints is a nontrivial problem. Our approach involves solving two sub-problems in tandem: (A)~finding an optimal proportion of load to allocate each task and channel; and (B)~deriving the best combination of service intervals over which all load proportions can be guaranteed.
The design algorithms use analytic approximations to quickly estimate output rates and propagation delays for candidate solutions. When all parameters are synthesized, the estimated end-to-end performance metrics are re-checked by simulation. The per-component load reservations can then be increased, with the synthesis algorithms re-run to improve performance. At that point the system can be configured according to the synthesized scheduling parameters -- and then re-validated via on-line profiling. In this paper we demonstrate our technique on an example system, and compare the estimated performance to its simulated on-line behavior. (Also cross-referenced as UMIACS-TR-98-18) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hybrid Probabilistic Programs. Alex Dekhtyar. V. S. Subrahmanian. March 1998.
The precise probability of a compound event (e.g. e1 v e2, e1 ^ e2) depends upon the known relationships (e.g. independence, mutual exclusion, ignorance of any relationship, etc.) between the primitive events that constitute the compound event. To date, most research on probabilistic logic programming [20, 19, 22, 23, 24] has assumed that we are ignorant of the relationship between primitive events. Likewise, most research in AI (e.g. Bayesian approaches) has assumed that primitive events are independent. In this paper, we propose a hybrid probabilistic logic programming language in which the user can explicitly associate, with any given probabilistic strategy, a conjunction and disjunction operator, and then write programs using these operators. We describe the syntax of hybrid probabilistic programs, and develop a model theory and fixpoint theory for such programs. Last, but not least, we develop three alternative procedures to answer queries, each of which is guaranteed to be sound and complete. (Also cross-referenced as UMIACS-TR-98-16) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Heterogeneous Active Agents. V.S. Subrahmanian. Thomas Eiter. George Pick. March 1998.
Over the years, many different agent programming languages have been proposed. In this paper, we propose a concept called Agent Programs, with which the creator of an agent can declaratively specify how that agent should act in various situations. Agent Programs may be built on top of arbitrary pieces of software code and may be used to specify what an agent is obliged to do, what an agent may do, and what an agent may not do. We define several successively more sophisticated and epistemically satisfying declarative semantics for agent programs, and study the computation price to be paid (in terms of complexity) for such epistemic desiderata. We further show that agent programs cleanly extend well understood semantics for logic programs, and thus are clearly linked to existing results on logic programming and nonmonotonic reasoning. Last, but not least, we have built a simulation of a Supply Chain application in terms of our theory, building on top of commercial software systems such as Microsoft Access and ESRI's Map Object. (Also cross-referenced as UMIACS-TR-98-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Facilitating Network Data Exploration with Query Previews: A Study of. Egemen Tanin. Amnon Lotem. Ihab Haddadin. Ben Shneiderman. Catherine Plaisant. Laura Slaughter. February 1998.
Current network data exploration systems which use command languages (e.g. SQL) or form fill-in interfaces fail to give users an indication of the distribution of data items. This leads many users to waste time posing queries which have zero-hit or mega-hit result sets. Query previewing is a novel visual approach for browsing huge networked information warehouses. Query previews supply data distribution information about the database that is being searched and give continuous feedback about the size of the result set for the query as it is being formed. Our within-subjects empirical comparison studied 12 subjects using a form fill-in interface with and without query previews. We found statistically significant differences showing that query previews sped up performance 1.6 to 2.1 times and led to higher subjective satisfaction. (Also cross-referenced as UMIACS-TR-98-14) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
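A minimal sketch of the feedback a query preview provides, assuming a small in-memory list of records: for a partially formed query it reports the result-set size and the distribution of the remaining attribute values, so a zero-hit query is visible before it is submitted. The `preview` function, the attribute names, and the sample records are hypothetical and are not taken from the system studied in the report.

```python
from collections import Counter

# Hypothetical metadata records; the attribute names are illustrative only.
records = [
    {"year": 1996, "topic": "ocean"},
    {"year": 1996, "topic": "land"},
    {"year": 1997, "topic": "ocean"},
    {"year": 1997, "topic": "ocean"},
]

def preview(records, selection, attributes=("year", "topic")):
    """Return (hit count, per-attribute value counts) for a partial query.

    `selection` maps an attribute name to the set of values the user
    has chosen so far; unselected attributes are unconstrained.
    """
    hits = [
        r for r in records
        if all(r[attr] in values for attr, values in selection.items())
    ]
    distribution = {a: Counter(r[a] for r in hits) for a in attributes}
    return len(hits), distribution

count, dist = preview(records, {"topic": {"ocean"}})
zero, _ = preview(records, {"year": {1996}, "topic": {"atmosphere"}})
print(count, dict(dist["year"]), zero)
```

A real preview over a networked warehouse would presumably work from precomputed aggregate counts rather than scanning records; the scan here just keeps the sketch short.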
Reduction of Materialized View Staleness Using Online Updates. Alexandros Labrinidis. Nick Roussopoulos. February 1998.
Updating the materialized views stored in data warehouses usually implies making the warehouse unavailable to users. We propose MAUVE, a new algorithm for online incremental view updates that uses timestamps and allows consistent read-only access to the warehouse while it is being updated. The algorithm propagates the updates to the views more often than the typical once a day in order to reduce view staleness. We have implemented MAUVE on top of the Informix Universal Server and used a synthetic workload generator to experiment with various update workloads and different view update frequencies. Our results show that all kinds of update streams benefit from more frequent view updates, instead of just once a day. However, there is a clear maximum for the view update frequency, for which view staleness is minimal. (Also cross-referenced as UMIACS-TR-98-13) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Two Algorithms for the Efficient Computation of Truncated Pivoted. G. W. Stewart. February 1998.
In this note we propose two algorithms to compute truncated pivoted QR approximations to a sparse matrix. One is based on the Gram--Schmidt algorithm, and the other on Householder triangularization. Both algorithms leave the original matrix unchanged, and the only additional storage requirements are arrays to contain the factorization itself. Thus, the algorithms are particularly suited to determining low-rank approximations to a sparse matrix. (Also cross-referenced as UMIACS-TR-98-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Analysis and Applications of Receptive Safety Properties in Concurrent. Gilberto Matos. February 1998.
Formal verification for complex concurrent systems is a computationally intensive and, in some cases, intractable process. The complexity is an inherent part of the verification process because the system complexity is an exponential function of the sizes of its components. However, some properties can be enforced by automatically synchronizing the components, thus eliminating the need for verification. Moreover, the complexity of the analysis required to enforce the properties grows incrementally with the addition of new components and properties that make the system complexity grow exponentially. The properties in question are the receptive safety properties, a subset of safety properties that can only be violated by component actions. The receptive safety properties represent the realizable subset of the general safety properties because a system that satisfies any non-receptive safety properties must satisfy related receptive safety properties. This implies that any system with realizable safety requirements can be described as a set of components and receptive safety properties that specify the component interaction that satisfies the requirements. We have developed a method that automatically synchronizes complex concurrent systems to enforce their receptive safety properties. Many non-safety properties, and automated synchronization can be used to enforce them. (Also cross-referenced as UMIACS-TR-98-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Cyclone Technology: An Overview. Sung Lee. February 1998.
The current network, which is based on managing resources on demand and accepting uncontrolled communication requests, often leads to problems such as congestion and other queueing bottlenecks. The extent of congestion and queues depends on the variability in customer arrival times, services needed, and the resource allocation mechanism used by system components. The queue sizes, which result in congestion, can be reduced only by controlling the variability in customer arrival times, and this is best done by making explicit use of time. Cyclone technology uses the information based on times of events explicitly, including the design of systems. Cyclone provides the coordination of resources through dynamic, time-based resource management leading to a network that is capable of providing end-to-end low latency communications free of losses, jitter, (Also cross-referenced as UMIACS-TR-98-10) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Interfaces and Tools for the Library of Congress National Digital. Gary Marchionini. Catherine Plaisant. Anita Komlodi. February 1998.
This paper describes a collaborative effort to explore user needs in a digital library, develop interface prototypes for a digital library, and suggest and prototype tools for digital librarians and users at the Library of Congress (LC). Interfaces were guided by an assessment of user needs and aimed to maximize interaction with primary resources and support both browsing and analytical search strategies. Tools to aid users and librarians in overviewing collections, previewing objects, and gathering results were created and serve as the beginnings of a digital librarian toolkit. The design process and results are described and suggestions for future work are offered. (Also cross-referenced as UMIACS-TR-98-09) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Composite Model Checking with Type Specific Symbolic Encodings. Tevfik Bultan. Richard Gerber. February 1998.
We present a new symbolic model checking technique, which analyzes temporal properties in multi-typed transition systems. Specifically, the method uses multiple type-specific data encodings to represent system states, and it carries out fixpoint computations via the corresponding type-specific symbolic operations. In essence, different symbolic encodings are unified into one composite model checker. Any type-specific language can be included in this framework -- provided that the language is closed under Boolean connectives, propositions can be checked for satisfiability, and relational images can be computed. Our technique relies on conjunctive partitioning of transition relations of atomic events based on variable types involved, which allows independent computation of one-step pre- and post-conditions for each variable type. In this paper we demonstrate the effectiveness of our method on a nontrivial data-transfer protocol, which contains a mixture of integer and Boolean-valued variables. The protocol operates over an unreliable channel that can lose, duplicate or reorder messages. Moreover, the protocol's send and receive window sizes are not specified in advance; rather, they are represented as symbolic constants. The resulting system was automatically verified using our composite model checking approach, in concert with a conservative approximation technique. (Also cross-referenced as UMIACS-TR-98-07) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Model Checking Concurrent Systems with Unbounded Integer Variables:. Tevfik Bultan. Richard Gerber. William Pugh. February 1998.
Model checking is a powerful technique for analyzing large, finite-state systems. In an infinite-state system, however, many basic properties are undecidable. In this paper, we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically encode a program's transition system, as well as its model-checking computations. All fixpoint calculations are executed symbolically, and their convergence is guaranteed by using approximation techniques. We demonstrate the promise of this technology on some well-known infinite-state concurrency problems. (Also cross-referenced as UMIACS-TR-98-07) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
T2: A Customizable Parallel Database For Multi-dimensional Data. Chialin Chang. Anurag Acharya. Alan Sussman. Joel Saltz. January 1998.
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point.
In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. (Also cross-referenced as UMIACS-TR-98-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On the Adjoint Matrix. G. W. Stewart. January 1998.
The adjoint $A\adj$ of a matrix $A$ is the transpose of the matrix of the cofactors of the elements of $A$. The computation of the adjoint from its definition involves the computation of $n^{2}$ determinants of order $(n-1)$\,---\,a prohibitively expensive $O(n^{4})$ process. On the other hand the computation from the formula $A\adj = \det(A)A\inv$ breaks down when $A$ is singular and is potentially unstable when $A$ is ill-conditioned. In this paper we first show that the adjoint can be perfectly conditioned, even when $A$ is ill-conditioned. We then show that if due care is taken the adjoint can be accurately computed from the inverse, even when the latter has been inaccurately computed. In an appendix to this paper we establish a folk result on the accuracy of computed inverses. (Also cross-referenced as UMIACS-TR-98-02) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
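To make the definition concrete, the sketch below computes the adjoint directly from cofactors, which is the expensive route the abstract warns against, and checks the identity $A \cdot A\adj = \det(A) I$ on small integer matrices. It is not the algorithm proposed in the report; the helper names (`minor`, `det`, `adjoint`, `matmul`) and the test matrices are assumptions made for illustration. Note that the definition still works when $A$ is singular, where the formula $A\adj = \det(A)A\inv$ cannot be applied.

```python
def minor(a, i, j):
    """Matrix a with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if not a:
        return 1  # determinant of the empty 0x0 matrix
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def adjoint(a):
    """Transpose of the cofactor matrix: entry (i, j) is the cofactor of a[j][i]."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

singular = [[1, 2], [2, 4]]                   # det = 0, so no inverse exists
regular = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]   # det = 6

print(adjoint(singular))
print(matmul(regular, adjoint(regular)))
```

Exact integer arithmetic is used so the identity can be checked without rounding error; the conditioning questions the report studies only arise in floating point.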
Codex, Memex, Genex: The pursuit of transformational technologies. Ben Shneiderman. December 1997.
Handwritten codexes or printed books transformed society by allowing users to preserve and transmit information. Today, leather-bound volumes and illuminated manuscripts are giving way to animated image maps and hot links. Vannevar Bush's memex has inspired the World Wide Web, which provides users with vast information resources and convenient communications. In looking to the future, we might again transform society by building genexes -- generators of excellence. Such inspirational environments would empower personal and collaborative creativity by enabling users to: collect information from an existing domain of knowledge, create innovations using advanced tools, consult with peers or mentors in the field, and then disseminate the results widely. This paper describes how a framework for an integrated set of software tools might support this four-phase model of creativity in science, medicine, the arts, and beyond. Current initiatives are positive and encouraging, but they do not work in an integrated fashion, often miss vital components, and are frequently poorly designed. A well-conceived and clearly-stated framework could guide design efforts, coordinate planning, and speed development. (Also cross-referenced as UMIACS-TR-97-89) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On-Demand Broadcast Scheduling. Demet Aksoy. Michael Franklin. December 1998.
Broadcast is becoming an increasingly attractive data dissemination method for large client populations. In order to effectively utilize a broadcast medium for such a service, it is necessary to have efficient, on-line scheduling algorithms that can balance individual and overall performance, and can scale in terms of data set sizes, client populations, and broadcast bandwidth. We propose an algorithm, called RxW, that provides good performance across all of these criteria and that can be tuned to trade off average and worst case waiting time. Unlike previous work on low overhead scheduling, the algorithm does not use estimates of the access probabilities of items, but rather, it makes scheduling decisions based on the current queue state, allowing it to easily adapt to changes in the intensity and distribution of the workload. We demonstrate the performance advantages of the algorithm under a range of scenarios using a simulation model and present analytical results that describe the intrinsic behavior of the algorithm. (Also cross-referenced as UMIACS-TR-98-88) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
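The abstract says only that RxW schedules from the current queue state and does not spell out the scoring rule. The sketch below assumes, going by the algorithm's name, that each item is scored by R, its number of outstanding requests, multiplied by W, the time its oldest outstanding request has waited, and that the item with the largest product is broadcast next. The function `pick_next`, the queue layout, and the sample values are hypothetical.

```python
def pick_next(queue, now):
    """Pick the item to broadcast next under an assumed R x W score.

    `queue` maps each item to the arrival times of its outstanding
    requests. R is the number of such requests and W is the wait of the
    oldest one; both definitions are assumptions of this sketch.
    Returns None when nothing is pending.
    """
    def score(item):
        arrivals = queue[item]
        return len(arrivals) * (now - min(arrivals))

    pending = (item for item, arrivals in queue.items() if arrivals)
    return max(pending, key=score, default=None)

# Hypothetical queue state at time 10.
queue = {"a": [0], "b": [6, 7, 8], "c": [9]}
choice = pick_next(queue, now=10)
print(choice)
```

With these values item "a" has waited longest and item "b" has the most requests; the product favors "b" here, which illustrates how such a score balances popularity against waiting time without any estimate of access probabilities.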
Toward Compact Monotonically Compositional Interlingua Using Lexical Aspect. Bonnie J. Dorr. Mari Broman Olsen. Scott C. Thomas. December 1997.
We describe a theoretical investigation into the semantic space described by our interlingua (IL), which currently has 191 main verb classes divided into 434 subclasses, represented by 237 distinct Lexical Conceptual Structures (LCSs). Using the model of aspect in Olsen (1994b, 1997a)---monotonic aspectual composition---we have identified 71 aspectually basic subclasses that are associated with one or more of 68 aspectually non-basic classes via some lexical ("type-shifting") rule (Bresnan 1982, Pinker 1984, Levin and Rappaport Hovav 1995). This allows us to refine the IL and address certain computational and theoretical issues at the same time. (1) From a linguistic viewpoint, the expected benefits include a refinement of the aspectual model in (Olsen 1994b, 1997a) (which provides necessary but not sufficient conditions for aspectual composition), and a refinement of the verb classifications in (Levin 1993); we also expect our approach to eventually produce a systematic definition (in terms of LCSs and compositional operations) of the precise meaning components responsible for Levin's classification. (2) Computationally, the lexicon is made more compact. (Also cross-referenced as UMIACS-TR-97-86) (Also cross-referenced as LAMP-TR-012) University of Maryland Institute for Advanced Computer Studies, University of Maryland Laboratory for Language and Media Processing, Department of Computer Science, University of Maryland,
Using WordNet to Posit Hierarchical Structure in Levin's Verb Classes. Mari Broman Olsen. Bonnie J. Dorr. David J. Clark. December 1997.
In this paper we report on experiments using WordNet synset tags to evaluate the semantic properties of the verb classes cataloged by Levin 1993. This paper represents ongoing research begun at the University of Pennsylvania (Rosenzweig et al. 1997, Palmer et al. 1997) and the University of Maryland (Dorr and Jones 1996b, 1996d, 1996e). Using WordNet sense tags to constrain the intersection of Levin classes, we avoid spurious class intersections introduced by homonymy and polysemy (_run a bath, run a mile_). By adding class intersections based on a single shared sense-tagged word, we minimize the impact of the non-exhaustiveness of Levin's database (Dorr and Olsen 1996, Dorr to appear). By examining the syntactic properties of the intersective classes, we provide a clearer picture of the relationship between WordNet/EuroWordNet and the LCS interlingua for machine translation and other NLP applications. (Also cross-referenced as UMIACS-TR-97-85) (Also cross-referenced as LAMP-TR-011) University of Maryland Institute for Advanced Computer Studies, University of Maryland Laboratory for Language and Media Processing, Department of Computer Science, University of Maryland,
The End of Zero-Hit Queries: Query Previews for NASA's Global Change. Stephan Greene. Egemen Tanin. Catherine Plaisant. Ben Shneiderman. Lola Olsen. Gene Major. Steve Johns. December 1997.
The Human-Computer Interaction Laboratory (HCIL) of the University of Maryland and NASA have collaborated over the last three years to refine and apply user interface research concepts developed at HCIL in order to improve the usability of NASA data services. The research focused on dynamic query user interfaces, visualization, and overview+preview designs. An operational prototype, using query previews, was implemented with NASA's Global Change Master Directory (GCMD), a directory service for earth science data sets. Users can see the histogram of the data distribution over several attributes and choose among attribute values. A result bar shows the cardinality of the result set, thereby preventing users from submitting queries that would have zero hits. Our experience confirmed the importance of metadata accuracy and completeness. The query preview interfaces make visible problems or holes in the metadata that are unnoticeable with classic form fill-in interfaces. This could be seen as a problem, but we think that it will have a long-term beneficial effect on the quality of the metadata as data providers will be compelled to produce more complete and accurate metadata. The adaptation of the research prototype to the NASA data required revised data structures and algorithms. (Also cross-referenced as UMIACS-TR-97-84) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Information Architecture to Support the Visualization of Personal. Catherine Plaisant. Ben Shneiderman. December 1997.
Also cross-referenced as UMIACS-TR-97-87 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Taxonomy of Multiple Window Coordinations. Chris North. Ben Shneiderman. December 1997.
Handwritten codexes or printed books transformed society by allowing users to preserve and transmit information. Today, leather-bound volumes and illuminated manuscripts are giving way to animated image maps and hot links. Vannevar Bush's memex has inspired the World Wide Web, which provides users with vast information resources and convenient communications. In looking to the future, we might again transform society by building genexes -- generators of excellence. Such inspirational environments would empower personal and collaborative creativity by enabling users to: collect information from an existing domain of knowledge, create innovations using advanced tools, consult with peers or mentors in the field, and then disseminate the results widely. This paper describes how a framework for an integrated set of software tools might support this four-phase model of creativity in science, medicine, the arts, and beyond. Current initiatives are positive and encouraging, but they do not work in an integrated fashion, often miss vital components, and are frequently poorly designed. A well-conceived and clearly-stated framework could guide design efforts, coordinate planning, and speed development. (Also cross-referenced as UMIACS-TR-97-83) University of Maryland Institute for Advanced Computer Studies, University of Maryland Institute for Systems Research, Department of Computer Science, University of Maryland,
An Approach to Improve Existing Measurement Frameworks in Software. Manoel Gomes Mendonca. December 1997.
Measurement is a key mechanism to characterize, evaluate, and improve software development, management, and maintenance processes. Nowadays, software organizations use metrics for very different purposes. Data is collected to describe, monitor, understand, assess, compare, validate, and appraise very diverse attributes related to software processes or products. Improving data collection and better using the existing data are important problems for software organizations. This dissertation proposes an approach for improving measurement and data use when a large number of diverse metrics are already being collected by a software organization. The approach combines two methods. One looks at an organization's measurement framework in a top-down fashion and the other looks at it in a bottom-up fashion. The top-down method, based on the Goal-Question-Metric (GQM) Paradigm, is used to identify the measurement goals of data users and map them to the metrics being used by the organization. This allows the measurement practitioners to: (1)~identify which metrics are and are not useful to the organization; and (2)~check if the goals of data user groups can be satisfied by the data that is being collected by the organization. The bottom-up method is based on a data mining technique called Attribute Focusing (AF). It is used to identify useful information in the existing data that the data users were not aware of. To validate the approach and to assess its usefulness, a case study was performed in a real industrial environment. The top-down and bottom-up methods were applied in the customer satisfaction measurement framework at the IBM Toronto Laboratory. The top-down method was applied to improve the customer satisfaction (CUSTSAT) measurement from the point of view of three data user groups. The bottom-up method was used to gain new insights into the existing CUSTSAT data. The top-down method identified several new metrics for the interviewed user groups. 
It also contributed to a better understanding of the data users' needs and led to the modification of some of the data analyses and presentations done for those groups. The bottom-up method produced important insights on both the customer satisfaction domain and the measurement framework itself. Unexpected associations between key variables prompted new insights on their importance for the organization. Some of these associations have also revealed problems with the metrics being used to collect the data. (Also cross-referenced as UMIACS-TR-97-82) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Symmetric Cauchy-like Preconditioners for the Regularized Solution of. Misha E. Kilmer. December 1997.
The discretization of integral equations can lead to systems involving symmetric Toeplitz matrices. We describe a preconditioning technique for the regularized solution of the related discrete ill-posed problem. We use discrete sine transforms to transform the system to one involving a Cauchy-like matrix. Based on the approach of Kilmer and O'Leary, the preconditioner is a symmetric, rank $m^{*}$ approximation to the Cauchy-like matrix augmented by the identity. We show that if the kernel of the integral equation is smooth then the preconditioned matrix has two desirable properties: the largest $m^{*}$ magnitude eigenvalues are clustered around and bounded below by one, and the small magnitude eigenvalues remain small. We also show that the initialization cost is less than the initialization cost for the preconditioner introduced by Kilmer and O'Leary. Further, we describe a method for applying the preconditioner in $O((n+1) \lg (n+1))$ operations when $n+1$ is a power of 2, and describe a variant of the MINRES algorithm to solve the symmetrically preconditioned problem. The preconditioned method is tested on two examples. Department of Computer Science, University of Maryland, Applied Mathematics Program, University of Maryland,
Dynamic Time-Based Scheduling for Hard Real-Time Systems. Seonho Choi. December 1997.
In traditional time-based scheduling schemes for real-time systems, the time line is explicitly managed to obtain a feasible schedule that satisfies all timing constraints. In the schedule, the task attributes, such as task start time, are statically decided off-line and used without modification throughout system operation time. However, for dynamic real-time systems, in which new tasks may arrive during the operation, or tasks may have relative constraints based on information only known at run-time, such static schemes may lack the ability to accommodate dynamic changes. Clearly a solution of dynamic real-time scheduling has to reflect the knowledge about tasks and their execution characteristics. In this dissertation we present a {\em dynamic time-based scheduling scheme} and show its applicability for three problem domains. In the dynamic time-based scheduling scheme, attributes of task instances in the schedule may be represented as functions parameterized with information available at task dispatching time. These functions are called {\em attribute functions} and may denote any attribute of a task instance, such as lower and upper bound of its start time, its execution mode, etc. Flexible resource management becomes possible in this scheme by utilizing the freedom provided by the scheme. First, we study the problem of dynamic dispatching of tasks, reflecting relative timing constraints among tasks. The relative constraints may be defined across the boundary of two consecutive scheduling windows as well as within one scheduling window. We present a solution approach with which we are able not only to test the schedulability of a task set, but also to obtain maximum slack time by postponing static task executions at run-time. Second, a new framework is formulated for designing real-time control systems in which the assumption of fixed sampling period is relaxed. 
That is, sampling time instants are found adaptively, based on the physical system state, such that a new cost function that incorporates computational costs is minimized. We show, for linear time-invariant control systems, that the computation requirement can be reduced while maintaining the quality of control. Third, acceptance tests are found for dynamically arriving aperiodic tasks, and for dynamically arriving sporadic tasks, respectively, under the assumption that an Earliest Deadline First scheduling policy is used for resolving resource contention between dynamic and static (dynamic) tasks. The dynamic time-based scheduling scheme can be applied as a solution approach to these problems, as shown in this dissertation, and its effectiveness is demonstrated. Also cross-referenced as UMIACS-TR-97-81 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
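As a rough illustration of what an acceptance test under Earliest Deadline First can look like, the sketch below uses the standard textbook feasibility check for jobs that are all ready at the same instant on one preemptive processor. It is not the dissertation's test, which also has to account for static tasks; the names `Job` and `edf_accepts` are hypothetical.

```python
from typing import List, Tuple

# (remaining execution time, absolute deadline); a hypothetical representation.
Job = Tuple[float, float]


def edf_accepts(now: float, accepted: List[Job], candidate: Job) -> bool:
    """Return True if every job, including the candidate, meets its deadline.

    Assumes one preemptive processor and that all jobs are ready at `now`,
    so EDF simply runs them in order of increasing deadline.
    """
    jobs = sorted(accepted + [candidate], key=lambda job: job[1])
    finish = now
    for remaining, deadline in jobs:
        finish += remaining  # completion time of this job under EDF
        if finish > deadline:
            return False  # admitting the candidate would cause a miss
    return True
```

A new job is admitted only when this check passes, so already accepted jobs never lose their guarantees.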
Improved Methods for Approximating Node Weighted Steiner Trees and. Sudipto Guha. Samir Khuller. December 1997.
A greedy approximation algorithm based on ``spider decompositions'' was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio of $2 \ln k$, where $k$ is the number of terminals. However, the best known lower bound on the approximation ratio is $\ln k$, assuming that $NP \not\subseteq DTIME[n^{O(\log \log n)}]$, by a reduction from set cover. We show that for the unweighted case we can obtain an approximation factor of $\ln k$. For the weighted case we develop a new decomposition theorem, and generalize the notion of ``spiders'' to ``branch-spiders'', that are used to design a new algorithm with a worst case approximation factor of $1.5 \ln k$. This algorithm, although polynomial, is not very practical due to its high running time; since we need to repeatedly find many minimum weight matchings in each iteration. We are able to generalize the method to yield an approximation factor approaching $1.35 \ln k$. We also develop a simple greedy algorithm that is practical and has a worst case approximation factor of $1.6103 \ln k$. The techniques developed for the second algorithm imply a method of approximating node weighted network design problems defined by 0-1 proper functions. These new ideas also lead to improved approximation guarantees for the problem of finding a minimum node weighted connected dominating set. The previous best approximation guarantee for this problem was $3 \ln n$. By a direct application of the methods developed in this paper we are able to develop an algorithm with an approximation factor approaching $1.35 \ln n$. (Also cross-referenced as UMIACS-TR-97-80) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Applying Traversal-Pattern-Sensitive Pointer Analysis to Dependence Analysis. Yuan-Shin Hwang. Joel Saltz. November 1997.
This paper presents a technique for dependence analysis on programs with pointers or dynamic recursive data structures. It differs from previously proposed approaches in analyzing structure access conflicts between traversal patterns before gathering alias and connection information. Conflict analysis is conducted under the assumption that each unique path leads to a distinct storage location, and hence traversal patterns can be analytically compared to identify possible conflicts. The rationale of this assumption is that if statements are deemed to be dependent by this approach, they are inherently sequential regardless of the shapes of the data structures they traverse. Consequently, there is no need to perform alias/connection analysis on the statements that construct such data structures. Furthermore, the information on traversal patterns gathered in the conflict analysis phase can direct the alias/connection analysis algorithm to focus on statements that are crucial to optimizations or parallelization. Such a {\em traversal-pattern-sensitive} pointer analysis algorithm is also presented. Department of Computer Science, University of Maryland,
Algorithms for Capacitated Vehicle Routing. Moses Charikar. Samir Khuller. Balaji Raghavachari. November 1997.
Given $n$ identical objects (pegs), placed at arbitrary initial locations, we consider the problem of transporting them efficiently to $n$ target locations (slots) with a vehicle that can carry at most $k$ pegs at a time. This problem is referred to as $k$-delivery TSP, and it is a generalization of the Traveling Salesman Problem. We give a 5-approximation algorithm for the problem of minimizing the total distance traveled by the vehicle. There are two kinds of transportations possible --- one that could drop pegs at intermediate locations and pick them up later in the route for delivery (preemptive) and one that transports pegs to their targets directly (non-preemptive). In the former case, by exploiting the freedom to drop, one may be able to find a shorter delivery route. We construct a non-preemptive tour that is within a factor 5 of the optimal preemptive tour. In addition we show that the ratio of the distances traveled by an optimal non-preemptive tour versus a preemptive tour is bounded by 4. (Also cross-referenced as UMIACS-TR-97-79) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Neural Learning of Chaotic Dynamics: The Error Propagation Algorithm. Rembrandt Bakker. Jaap C. Schouten. Cor M. van den Bleek. C. Lee Giles. October 1997.
An algorithm is introduced that trains a neural network to identify chaotic dynamics from a single measured time-series. The algorithm has four special features: 1. The state of the system is extracted from the time-series using delays, followed by weighted Principal Component Analysis (PCA) data reduction. 2. The prediction model consists of both a linear model and a Multi-Layer Perceptron (MLP). 3. The effective prediction horizon during training is user-adjustable due to error propagation: prediction errors are partially propagated to the next time step. 4. A criterion is monitored during training to select the model whose chaotic attractor is most similar to the real system attractor. The algorithm is applied to laser data from the Santa Fe time-series competition (set A). The resulting model is not only useful for short-term predictions but also generates time-series with chaotic characteristics similar to those of the measured data. (Also cross-referenced as UMIACS-TR-97-77) University of Maryland Institute for Advanced Computer Studies, Delft University of Technology, Department of Chemical Process, NEC Research Institute,
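The first feature, state extraction using delays, can be sketched as a plain delay embedding. This is a minimal illustration only: the weighted PCA reduction that follows it in the abstract is not shown, and the function name `delay_embed` and its parameters are hypothetical.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], dim: int, lag: int = 1) -> List[List[float]]:
    """Build delay vectors [x[t], x[t-lag], ..., x[t-(dim-1)*lag]] for each valid t."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive")
    start = (dim - 1) * lag  # first index with a full history available
    return [
        [series[t - j * lag] for j in range(dim)]
        for t in range(start, len(series))
    ]
```

Each returned row is one reconstructed state vector; a series of length n yields n - (dim - 1) * lag of them.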
A Generalized Framework for Indexing OLAP Aggregates. Yannis Kotidis. October 1997.
Decision support applications often require fast response time to a wide variety of aggregate queries extracted from huge amounts of data. In this paper we propose the use of well organized packed R-trees for storing and maintaining multidimensional aggregates. Moreover, we present a general framework for mapping OLAP data to a collection of R-trees that achieve a high degree of data clustering with very low space overhead. We then propose four different allocation strategies designed to optimize different application needs. In the second part of the paper we present experimental results on high dimensionality OLAP data (up to 10 dimensions) of realistic size. Finally we characterize the performance of the proposed allocation strategies with respect to both incremental updates and response time for a variety of different queries. (Also cross-referenced as UMIACS-TR-97-76) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On an Inexpensive Triangular Approximation to the Singular Value. G. W. Stewart. October 1997.
In this paper we introduce a new decomposition called the pivoted QLP~decomposition. It is computed by applying pivoted orthogonal triangularization to the columns of the matrix $X$ in question to get an upper triangular factor $R$ and then applying the same procedure to the rows of $R$ to get a lower triangular matrix $L$. The diagonal elements of $R$ are called the R-values of $X$; those of $L$ are called the L-values. Numerical examples show that the L-values track the singular values of $X$ with considerable fidelity\,---\,far better than the R-values. At a gap in the L-values the decomposition provides orthonormal bases of analogues of row, column, and null spaces of $X$. The decomposition requires no more than twice the work required for a pivoted QR~decomposition. The computation of $R$ and $L$ can be interleaved, so that the computation can be terminated at any suitable point, which makes the decomposition especially suitable for low-rank determination problems. The interleaved algorithm also suggests a new, efficient 2-norm estimator. (Also cross-referenced as UMIACS-TR-97-75) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On the Convergence of a New Rayleigh Quotient Method with Applications. D. P. O'Leary. G. W. Stewart. October 1997.
In this paper we propose a variant of the Rayleigh quotient method to compute an eigenvalue and corresponding eigenvectors of a matrix. It is based on the observation that eigenvectors of a matrix with eigenvalue zero are also singular vectors corresponding to zero singular values. Instead of computing eigenvector approximations by the inverse power method, we take them to be the singular vectors corresponding to the smallest singular value of the shifted matrix. If these singular vectors are computed exactly, the method is quadratically convergent. However, exact singular vectors are not required for convergence, and the resulting method combined with Golub--Kahan--Krylov bidiagonalization looks promising for enhancement/refinement methods for large eigenvalue problems. (Also cross-referenced as UMIACS-TR-97-74) Institute for Advanced Computer Studies, University of Maryland, Department of Computer Science, University of Maryland,
Previews and Overviews in Digital Libraries: Designing Surrogates to. Stephan Greene. Gary Marchionini. Catherine Plaisant. Ben Shneiderman. September 1997.
To aid designers of digital library interfaces and web sites in creating comprehensible, predictable and controllable environments for their users, we define and discuss the benefits of previews and overviews as visual information representations. Previews and overviews are graphic or textual representations of information abstracted from primary information objects. They serve as surrogates for those objects. When utilized properly, previews and overviews allow users to rapidly discriminate objects of interest from those not of interest, and to more fully understand the scope and nature of large collections of information resources. We provide a more complete definition of previews and overviews, and discuss system parameters and aspects of primary information objects relevant to designing effective previews and overviews. Finally, we present examples that illustrate the use of previews and overviews and offer suggestions for designers. (Also cross-referenced as UMIACS-TR-97-73) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Modified Streamline Diffusion Schemes for Convection-Diffusion. H. C. Elman. Y.-T. Shih. October 1997.
We consider the design of robust and accurate finite element approximation methods for solving convection--diffusion problems. We develop some two--parameter streamline diffusion schemes with piecewise bilinear (or linear) trial functions and show that these schemes satisfy the necessary conditions for $L^{2}$-uniform convergence of order greater than $1/2$ introduced by Stynes and Tobiska. For smooth problems, the schemes satisfy error bounds of the form $O(h)|u|_{2}$ in an energy norm. In addition, extensive numerical experiments show that they effectively reproduce boundary layers and internal layers caused by discontinuities on relatively coarse grids, without any requirements on alignment of flow and grid. (Also cross-referenced as UMIACS-TR-97-71) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Facility Location with Dynamic Distance Functions. Randeep Bhatia. Sudipto Guha. Samir Khuller. Yoram J. Sussmann. October 1997.
Facility location problems have always been studied with the assumption that the edge lengths in the network are {\em static} and do not change over time. The underlying network could be used to model a city street network for emergency facility location/hospitals, or an electronic network for locating information centers. In any case, it is clear that due to traffic congestion the traversal time on links {\em changes} with time. Very often, we have some estimates as to how the edge lengths change over time, and our objective is to choose a set of locations (vertices) as centers, such that at {\em every} time instant each vertex has a center close to it (clearly, the center close to a vertex may change over time). We also provide approximation algorithms as well as hardness results for the $K$-center problem under this model. This is the first comprehensive study regarding approximation algorithms for facility location for good time-invariant solutions. (Also cross-referenced as UMIACS-TR-97-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Sorting on Clusters of SMPs. David R. Helman. Joseph Ja'Ja'. November 1997.
Clusters of symmetric multiprocessors (SMPs) have emerged as the primary candidates for large scale multiprocessor systems. In this paper, we introduce an efficient sorting algorithm for clusters of SMPs. This algorithm relies on a novel scheme for stably sorting on a single SMP coupled with balanced regular communication on the cluster. Our SMP algorithm seems to be asymptotically faster than any of the published algorithms we are aware of. The algorithms were implemented in C using Posix Threads and the SIMPLE library of communication primitives and run on a cluster of DEC AlphaServer 2100A systems. Our experimental results verify the scalability and efficiency of our proposed solution and illustrate the importance of considering both memory hierarchy and the overhead of shifting to multiple nodes. (Also cross-referenced as UMIACS-TR-97-69) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Building an Electronic Learning Community: From Design to. Anne Rose. Wei Ding. Gary Marchionini. Josephus Beale, Jr.. Victor Nolet. September 1997.
The University of Maryland at College Park in cooperation with Baltimore City Public Schools and several partners is working to build an electronic learning community that provides teachers with multimedia resources that are linked to outcome-oriented curriculum guidelines. The initial resource library contains over 1000 videos, texts, images, web sites, and instructional modules. Using the current system, teachers can explore and search the resource library, create and present instructional modules in their classrooms, and communicate with other teachers in the community. This paper discusses the iterative design process and the results of informal usability testing. Lessons learned are also presented for developers. (Also cross-referenced as UMIACS-TR-97-67 and as CLIS-TR-97-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Applying DEF/USE Information of Pointer Statements to Traversal-Pattern-Aware Pointer Analysis. Yuan-Shin Hwang. Joel Saltz. July 1997.
Pointer analysis is essential for optimizing and parallelizing compilers. It examines pointer assignment statements and estimates pointer-induced aliases among pointer variables or possible shapes of dynamic recursive data structures. However, previously proposed techniques are not able to gather useful information or have to give up further optimizations when overall recursive data structures appear to be cyclic even though patterns of traversal are linear. The reason is that these proposed techniques perform pointer analysis without the knowledge of traversal patterns of dynamic recursive data structures to be constructed. This paper proposes an approach, {\em traversal-pattern-aware pointer analysis}, that has the ability to first identify the structures specified by traversal patterns of programs from cyclic data structures and then perform analysis on the specified structures. This paper presents an algorithm to perform shape analysis on the structures specified by traversal patterns. The advantage of this approach is that if the specified structures are recognized to be acyclic, parallelization or optimizations can be applied even when overall data structures might be cyclic. The DEF/USE information of pointer statements is used to relate the identified traversal patterns to the pointer statements which build recursive data structures. (Also cross-referenced as UMIACS-TR-97-66) Institute for Advanced Computing, University of Maryland, Department of Computer Science, University of Maryland,
Tikhonov Regularization and Total Least Squares. Gene H. Golub. Per Christian Hansen. Dianne P. O'Leary. August 1997.
Discretizations of inverse problems lead to systems of linear equations with a highly ill-conditioned coefficient matrix, and in order to compute stable solutions to these systems it is necessary to apply regularization methods. We show how Tikhonov's regularization method, which in its original formulation involves a least squares problem, can be recast in a total least squares formulation, suited for problems in which both the coefficient matrix and the right-hand side are known only approximately. We analyze the regularizing properties of this method and demonstrate by a numerical example that in certain cases with large perturbations, the new method is superior to standard regularization methods. (Also cross-referenced as UMIACS-TR-97-65) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
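For orientation, the original least squares formulation of Tikhonov regularization mentioned in the abstract minimizes $\|Ax-b\|_2^2 + \lambda^2\|x\|_2^2$, which leads to the normal equations $(A^TA + \lambda^2 I)x = A^Tb$. The sketch below solves these for a 2-by-2 matrix in pure Python. It illustrates only this standard formulation, not the total least squares recasting the report develops, and the function name `tikhonov_2x2` is hypothetical.

```python
from typing import List, Sequence


def tikhonov_2x2(A: Sequence[Sequence[float]], b: Sequence[float], lam: float) -> List[float]:
    """Minimize ||A x - b||^2 + lam^2 ||x||^2 for a 2x2 matrix A.

    Solves (A^T A + lam^2 I) x = A^T b with the explicit 2x2 inverse.
    Assumes the regularized matrix is nonsingular.
    """
    # Entries of the symmetric matrix M = A^T A + lam^2 I.
    m00 = A[0][0] ** 2 + A[1][0] ** 2 + lam ** 2
    m01 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
    m11 = A[0][1] ** 2 + A[1][1] ** 2 + lam ** 2
    # Right-hand side r = A^T b.
    r0 = A[0][0] * b[0] + A[1][0] * b[1]
    r1 = A[0][1] * b[0] + A[1][1] * b[1]
    det = m00 * m11 - m01 * m01
    return [(m11 * r0 - m01 * r1) / det, (m00 * r1 - m01 * r0) / det]
```

With `lam = 0` this reduces to the ordinary least squares solution; increasing `lam` shrinks the solution toward zero, trading fidelity for stability.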
Efficient Iterative Solution of the Three-Dimensional Helmholtz. Howard C. Elman. Dianne P. O'Leary. August 1997.
We examine preconditioners for the discrete indefinite Helmholtz equation on a three-dimensional box-shaped domain with Sommerfeld-like boundary conditions. The preconditioners are of two types. The first is derived by discretization of a related continuous operator that differs from the original only in its boundary conditions. The second is derived by a block Toeplitz approximation to the discretized problem. The resulting preconditioning matrices allow the use of fast transform methods and differ from the discrete Helmholtz operator by an operator of low rank. We present experimental results demonstrating that when these methods are combined with Krylov subspace iteration, convergence rates depend only mildly on both the wave number and discretization mesh size. In addition, the methods display high efficiencies in an implementation on an IBM SP-2 parallel computer. (Also cross-referenced as UMIACS-TR-97-63) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Verifying Systems with Integer Constraints and Boolean Predicates: A. Tevfik Bultan. Richard Gerber. Christopher League. August 1997.
Symbolic model checking has proved highly successful for large finite-state systems, in which states can be compactly encoded using binary decision diagrams (BDDs) or their variants. The inherent limitation of this approach is that it cannot be applied to systems with an infinite number of states -- even those with a single unbounded integer. Alternatively, we recently proposed a model checker for integer-based systems that uses Presburger constraints as the underlying state representation. While this approach easily verified some subtle, infinite-state concurrency problems, it proved inefficient in its treatment of Boolean and (unordered) enumerated types -- which possess no natural mapping to the Euclidean coordinate space. In this paper we describe a model checker which combines the strengths of both approaches. We use a composite model, in which a formula's valuations are encoded in a mixed BDD-Presburger form, depending on the variables used. We demonstrate our technique's effectiveness on a nontrivial requirements specification, which includes a mixture of Booleans, integers and enumerated types. (Also cross-referenced as UMIACS-TR-97-62) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Temporal accuracy and modern high performance processors: A case study. Krishnan K. Kailas. Bao Trinh. Ashok K. Agrawala. August 1997.
Real-time systems must be able to ensure temporally determinate execution of real-time tasks at run-time. By temporal accuracy, we refer to the timing accuracy with which the execution of a task can be started at a predetermined time. Temporally determinate execution of tasks on modern high performance processors is becoming more and more difficult because of the techniques used by these processors to boost their average performance. This report describes the experiments we have conducted to measure the temporal accuracy that can be achieved with the Pentium Pro processor. We present the results of these experiments and analyze these results to highlight the limitations of temporally determinate execution of programs on modern high performance processor architectures. (Also cross-referenced as UMIACS-TR-97-60) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiler Optimizations for Eliminating Cache Conflict Misses. Gabriel Rivera. Chau-Wen Tseng. July 1997.
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor cache performance in scientific programs, particularly within loop nests. We present two compiler transformations to eliminate conflict misses: 1) modifying variable base addresses, 2) padding inner array dimensions. Unlike compiler transformations that restructure the computation performed by the program, these two techniques modify its data layout. Using cache simulations of a selection of kernels and benchmark programs, we show these compiler transformations can eliminate conflict misses for applications with regular memory access patterns. Cache miss rates for a 16K, direct-mapped cache are reduced by 35% on average for each program. For some programs, execution times on a DEC Alpha can be improved up to 60%. (Also cross-referenced as UMIACS-TR-97-59) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
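The effect of modifying a variable's base address can be seen in a toy model of a direct-mapped cache. The sketch below uses the 16K cache size from the abstract; the 32-byte line size, the 8-byte element size, and the helper names are assumptions made for illustration, and the one-line shift in `pad_base` is a simplification rather than the compiler heuristics the report describes.

```python
CACHE_SIZE = 16 * 1024  # bytes, direct-mapped (size taken from the abstract)
LINE_SIZE = 32          # bytes per cache line (assumed for illustration)


def cache_line(address: int) -> int:
    """Index of the cache line a byte address maps to in a direct-mapped cache."""
    return (address // LINE_SIZE) % (CACHE_SIZE // LINE_SIZE)


def conflicts(base_a: int, base_b: int, n: int, elem_size: int = 8) -> int:
    """Count indices i for which a[i] and b[i] map to the same cache line."""
    return sum(
        cache_line(base_a + i * elem_size) == cache_line(base_b + i * elem_size)
        for i in range(n)
    )


def pad_base(base_a: int, base_b: int) -> int:
    """Shift b's base address by one cache line if it collides with a's."""
    if cache_line(base_a) == cache_line(base_b):
        return base_b + LINE_SIZE
    return base_b
```

In this model, two 1024-element arrays whose bases differ by exactly the cache size collide on every index, so a loop touching `a[i]` and `b[i]` evicts one with the other each iteration; shifting one base by a single line removes every collision.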
Towards a Theory of Interestingness. Wiktor Marek. V.S. Subrahmanian. August 1997.
There are a wide variety of applications that either require or assume the existence of some underlying definition of ``interestingness.'' However, interests vary from user to user, from situation to situation, and from one time to another. This diversity of interests cannot be captured through a single definition. In this paper, we propose a framework called {\em Full Interestingness Programs} (FIPs) that form a subclass of the Hybrid Knowledge Base Paradigm of Lu, Nerode and Subrahmanian. FIPs may be built ``on top'' of any query language whatsoever. Using FIPs, interests may be easily expressed and captured, and used on an application-specific basis using an application-independent FIP-evaluator. In this paper, we provide a formal semantics for FIPs, as well as techniques for processing requests (queries) to FIPs. (Also cross-referenced as UMIACS-TR-97-57) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Motor Control Model Based on Self-organizing Feature Maps. Yinong Chen. August 1997.
Self-organizing feature maps have become important neural modeling methods over the last several years. These methods have not only shown great potential in application fields such as motor control, pattern recognition, and optimization, but have also provided insights into how mammalian brains are organized. Most past work developing self-organizing feature maps has focused on systems with a single map that is solely sensory in nature. This research develops and studies a model which has multiple self-organizing feature maps in a closed-loop control system, and that involves motor output as well as proprioceptive and/or visual sensory input. The model is driven by a simulated arm that moves in 3D space. By applying initial activations at randomly selected motor cortex regions, the neural network model spontaneously self-organizes, and demonstrates the appearance of multiple, reasonably stable motor and proprioceptive sensory maps and their interrelationships. These cortical feature maps capture the mechanical constraints imposed by the model arm. They are aligned in a way consistent with a {\em temporal correlation hypothesis}: temporally correlated features usually cause their corresponding cortical map representations to be spatially correlated. Simulations of variations of the motor control model with visual inputs indicate the formation of visual input maps. These maps are also partially aligned with motor output maps, reflecting the degree of temporal correlations during training. The simultaneous presence of proprioceptive input causes the visual input maps to distinguish pairs of antagonist muscles and to be correlated with only one muscle in each pair. Moreover, some theoretical analysis with a simplified model gives insights into the nature of cortical feature maps and sheds light on the driving force behind map correlations. 
All of these results provide more understanding of the organization of cortical feature maps, and of how these maps might be used to achieve consistent motor commands based on sensory feedback. (Also cross-referenced as UMIACS-TR-97-56) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
High Performance Algorithms for Global BRDF Retrieval. Zengyan Zhang. Satya Kalluri. Joseph Ja'Ja'. Shunlin Liang. Townshend. July 1997.
Most land cover types are ``anisotropic'', that is, the solar radiation reflected by the surface is not uniform in all directions. Characterizing the Bidirectional Reflectance Distribution Function (BRDF) of the earth's surface is critical in understanding surface anisotropy. Though there are several methods to retrieve the BRDF of various land cover types, most of them have been applied over small data sets collected either on the ground or from aircraft at limited spatial and temporal scales. In this paper, we use the multi-angular, multi-temporal and multi-band Pathfinder AVHRR Land (PAL) data set to retrieve the global BRDF in the red and near infrared wavelengths. The PAL data set used in our study has a spatial resolution of 8 km and consists of 10-day composite data for four years (1983 to 1986). In particular, we develop high performance algorithms to retrieve global BRDF using three widely different models. Given the volume of data involved (about 27 GBytes), we attempt to optimize the I/O time as well as minimize the overall computational complexity. Our algorithms access the global data once, followed by a redistribution of land pixel data to balance the computational loads among the different nodes of a multiprocessor system. This strategy results in an optimized I/O access time with efficiently balanced computations across the nodes. Experimental data on a 16-node IBM SP2 are used to support these claims and to illustrate the scalability of our algorithms. (Also cross-referenced as UMIACS-TR-97-55) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Dynamic Query Operator Scheduling for Wide-Area Remote Access. Laurent Amsaleg. Michael J. Franklin. Anthony Tomasic. October 1997.
Distributed databases operating over wide-area networks such as the Internet must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based query execution model performs poorly. We have developed a class of methods, called query scrambling, for dealing explicitly with the problem of unpredictable response times. Query scrambling modifies query execution plans on the fly in reaction to unexpected delays in data access. In this paper we focus on the dynamic scheduling of query operators in the context of query scrambling. We explore various choices for dynamic scheduling and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and delayed or possibly bursty arrivals of data. Our performance results show that scrambling rescheduling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-97-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Fast Iterative Image Restoration with a Spatially-Varying PSF. James G. Nagy. Dianne P. O'Leary. June 1997.
We describe how to efficiently apply a spatially-variant blurring operator using linear interpolation of measured point spread functions. Numerical experiments illustrate that substantially better resolution can be obtained at very little additional cost compared to piecewise constant interpolation. (Also cross-referenced as UMIACS-TR-97-53) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
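The contrast between piecewise constant and linear interpolation of point spread functions can be sketched in one dimension. This is an illustrative sketch under stated assumptions, not the algorithm of the report: the two 3-tap PSFs, the zero boundary condition, the position-linear weights, and all function names are hypothetical.

```python
def convolve_same(signal, kernel):
    """Centered 1-D convolution with zero boundary; kernel length must be odd."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def blur_piecewise_constant(signal, psf_left, psf_right):
    """Each half of the output uses only the PSF measured in its own region."""
    left = convolve_same(signal, psf_left)
    right = convolve_same(signal, psf_right)
    mid = len(signal) // 2
    return left[:mid] + right[mid:]


def blur_linear_interp(signal, psf_left, psf_right):
    """Blend the two invariant blurs with weights that vary linearly in position."""
    left = convolve_same(signal, psf_left)
    right = convolve_same(signal, psf_right)
    n = len(signal)
    return [(1 - i / (n - 1)) * left[i] + (i / (n - 1)) * right[i]
            for i in range(n)]


# Hypothetical measured PSFs: no blur at the left edge, mild blur at the right.
psf_left = [0.0, 1.0, 0.0]
psf_right = [0.25, 0.5, 0.25]
signal = [0.0, 0.0, 4.0, 0.0, 0.0]

print(blur_piecewise_constant(signal, psf_left, psf_right))
print(blur_linear_interp(signal, psf_left, psf_right))
```

Both variants need the same two space-invariant convolutions; the interpolated version adds only a pointwise weighted sum, which is one way to read the "very little additional cost" remark.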
Limited-Memory Matrix Methods with Applications. Tamara G. Kolda. April 1997.
The focus of this dissertation is on matrix decompositions that use a limited amount of computer memory, thereby allowing problems with a very large number of variables to be solved. Specifically, we will focus on two application areas: optimization and information retrieval. We introduce a general algebraic form for the matrix update in limited-memory quasi-Newton methods. Many well-known methods such as limited-memory Broyden Family methods satisfy the general form. We are able to prove several results about methods which satisfy the general form. In particular, we show that the only limited-memory Broyden Family method (using exact line searches) that is guaranteed to terminate within n iterations on an n-dimensional strictly convex quadratic is the limited-memory BFGS method. Furthermore, we are able to introduce several new variations on the limited-memory BFGS method that retain the quadratic termination property. We also have a new result that shows that full-memory Broyden Family methods (using exact line searches) that skip p updates to the quasi-Newton matrix will terminate in no more than n+p steps on an n-dimensional strictly convex quadratic. We propose several new variations on the limited-memory BFGS method and test these on standard test problems. We also introduce and test a new method for a process known as Latent Semantic Indexing (LSI) for information retrieval. The new method replaces the singular value matrix decomposition (SVD) at the heart of LSI with a semi-discrete matrix decomposition (SDD). We show several convergence results for the SDD and compare some strategies for computing it on general matrices. We also compare the SVD-based LSI to the SDD-based LSI and show that the SDD-based method has a faster query computation time and requires significantly less storage. We also propose and test several SDD-updating strategies for adding new documents to the collection. Dept. of Computer Science, Univ. of Maryland,
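The quadratic termination property of limited-memory BFGS with exact line searches can be observed numerically on a small problem. The sketch below uses the standard two-loop recursion; it is not code from the dissertation, and the 3-by-3 test matrix, the right-hand side, the memory size of 1, and the function names are assumptions picked for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def lbfgs_quadratic(A, b, x0, memory, max_iter):
    """Minimize 0.5*x'Ax - b'x with L-BFGS and exact line searches.

    Returns the final iterate and the number of iterations performed."""
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    pairs = []  # most recent (s, y) pairs, oldest first
    for it in range(max_iter):
        if dot(g, g) ** 0.5 < 1e-10:
            return x, it
        # Two-loop recursion: q becomes H * g for the implicit inverse Hessian H.
        q = list(g)
        alphas = []
        for s, y in reversed(pairs):
            a = dot(s, q) / dot(y, s)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if pairs:
            s, y = pairs[-1]
            gamma = dot(s, y) / dot(y, y)  # usual initial scaling
            q = [gamma * qi for qi in q]
        for (s, y), a in zip(pairs, reversed(alphas)):
            beta = dot(y, q) / dot(y, s)
            q = [qi + (a - beta) * si for qi, si in zip(q, s)]
        d = [-qi for qi in q]
        # Exact line search: the minimizer along d is available in closed form.
        Ad = matvec(A, d)
        step = -dot(g, d) / dot(d, Ad)
        s = [step * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [gi + step * adi for gi, adi in zip(g, Ad)]
        y = [gn - gi for gn, gi in zip(g_new, g)]
        pairs = (pairs + [(s, y)])[-memory:]  # keep only the newest pairs
        g = g_new
    return x, max_iter


# Hypothetical strictly convex quadratic with n = 3.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = lbfgs_quadratic(A, b, [0.0, 0.0, 0.0], memory=1, max_iter=10)
print(iters, x)
```

With exact line searches and a memory of one pair, the search directions coincide (up to scaling) with those of conjugate gradients, which is why the run stops within n = 3 iterations here.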
Designing Dynamic Temporal Controls for Critical Systems. Seonho Choi. Ashok K. Agrawala. Leyuan Shi. May 1997.
Traditional control systems have been designed to exercise control at regularly spaced time instants. When a discrete version of the system dynamics is used, a constant sampling interval is assumed and a new control value is calculated and exercised at each time instant. In this paper, we propose a new control scheme, {\it dynamic temporal control}, in which we not only calculate the control value but also dynamically decide the time instants at which new control values are computed. Taking as an example a discrete, linear, time-invariant system and a cost function that reflects the cost of computing the control values, we show the feasibility of using this scheme. We implement the dynamic temporal control scheme in a rigid body satellite control example and demonstrate a significant reduction in cost. The scheme proposed here can be implemented using a real-time operating system, such as {\em Maruti}, which schedules activities along the time axis. The reduced computations for control permit the use of the same processor for higher level functions, resulting in a significant improvement in the performance of the overall system. (Also cross-referenced as UMIACS-TR-97-51) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Requirements of I/O Systems for Parallel Machines: An. Mustafa Uysal. Anurag Acharya. Joel Saltz. May 1997.
I/O-intensive parallel programs have emerged as one of the leading consumers of cycles on parallel machines. This change has been driven by two trends. First, parallel scientific applications are being used to process larger datasets that do not fit in memory. Second, a large number of parallel machines are being used for non-scientific applications. Efficient execution of these applications requires high-performance I/O systems which have been designed to meet their I/O requirements. In this paper, we examine the I/O requirements for data-intensive parallel applications and the implications of these requirements for the design of I/O systems for parallel machines. We attempt to answer the following questions. First, what are the steady-state and peak I/O rates required? Second, what spatial patterns, if any, occur in the sequence of I/O requests for individual applications? Third, what is the degree of intra-processor and inter-processor locality in I/O accesses? Fourth, does the application structure allow programmers to disclose future I/O requests to the I/O system? Fifth, what patterns, if any, exist in the sequence of inter-arrival times of I/O requests? To address these questions, we have analyzed I/O request traces for a diverse set of I/O-intensive parallel applications. This set includes seven scientific applications and four non-scientific applications. (Also cross-referenced as UMIACS-TR-97-49) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
SIMPLE: A Methodology for Programming High Performance Algorithms on. David A. Bader. Joseph Ja'Ja'. May 1997.
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch. (Also cross-referenced as UMIACS-TR-97-48) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
"Handling Updates and Crashes in VoD Systems". Eenjun Hwang. Kemal Kilic. V.S. Subrahmanian. May 1997.
Though there have been several recent efforts to develop disk based video servers, these approaches have all ignored the topic of updates and disk server crashes. In this paper, we present a priority based model for building video servers that handle two classes of events: user events that could include enter, play, pause, rewind, fast-forward, exit, as well as system events such as insert, delete, server-down, server-up that correspond to uploading new movie blocks onto the disk(s), eliminating existing blocks from the disk(s), and/or experiencing a disk server crash. We will present algorithms to handle such events. Our algorithms are provably correct, and computable in polynomial time. Furthermore, we guarantee that under certain reasonable conditions, continuing clients experience jitter free presentations. We further justify the efficiency of our techniques with a prototype implementation and experimental results. (Also cross-referenced as UMIACS-TR-97-47) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Design and Evaluation of Incremental Data Structures and Algorithms for. Egemen Tanin. Richard Beigel. Ben Shneiderman. May 1996.
Dynamic query interfaces (DQIs) are a recently developed database access mechanism that provides continuous real-time feedback to the user during query formulation. Previous work shows that DQIs are an elegant and powerful interface to small databases. Unfortunately, when applied to large databases, previous DQI algorithms slow to a crawl. We present a new incremental approach to DQI algorithms and display updates that works well with large databases, both in theory and in practice. (Also cross-referenced as UMIACS-TR-97-46) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Viewing personal history records: A comparison of Tabular format and. Diane Lindwarm Alonso. Anne Rose. Catherine Plaisant. Kent L. Norman. May 1997.
Thirty-six participants used a static version of either LifeLines, a graphical interface, or a Tabular representation to answer questions about a database of temporal personal history information. Results suggest that overall the LifeLines representation led to much faster response times, primarily for questions which involved interval comparisons and making intercategorical connections. In addition, on a follow-up questionnaire, nine out of eleven questions rated LifeLines preferable in terms of user satisfaction. A "first impression" test showed that LifeLines can reduce some of the biases of the tabular record summary. A post-experimental memory test led to significantly (p<.004) higher recall for LifeLines. Finally, simple interaction techniques are proposed to augment LifeLines' ability to deal with precise dates, attribute coding and overlaps. Department of Computer Science, University of Maryland,
Scheduling Aperiodic and Sporadic Tasks in Hard Real-Time Systems. Seonho Choi. Ashok K. Agrawala. May 1997.
Stringent timing constraints and functional correctness are essential requirements of hard real-time systems. In such systems, scheduling plays a very important role in satisfying these constraints. Priority based scheduling schemes have been used commonly due to the simplicity of the scheduling algorithm. However, in the presence of task interdependencies and complex timing constraints, such scheduling schemes may not be appropriate due to the lack of an efficient mechanism to schedule them and to carry out the schedulability analysis. In contrast, a time based scheduling scheme may be used to schedule a set of tasks with a greater degree of schedulability, achieved at the cost of higher complexity of off-line scheduling. One of the drawbacks of currently available scheduling schemes, however, is known to be their inflexibility in dynamic environments where dynamic processes exist, such as aperiodic and sporadic processes. We develop and analyze scheduling schemes which efficiently provide the flexibility required in real-time systems for scheduling processes arriving dynamically. This enables static hard periodic processes and dynamic processes (aperiodic or sporadic) to be jointly scheduled. (Also cross-referenced as UMIACS-TR-97-44) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Apparency of Contingencies in Pull Down Menus. D. L. Alonso. K. L. Norman. May 1997.
In many computer interfaces the underlying structures and contingencies are often hidden from the user's view. Users high in Spatial Visualization Ability (SVA) are able to quickly determine and manage the contingencies of these relationships and are not severely affected by this problem. Low SVA users, however, have difficulty visualizing these contingencies and often get lost. We examined the performance of 160 undergraduate students to determine whether revealing hidden contingencies through visual cues would help low SVA users approach the level of performance of high SVA users on a computerized path-finding task. It was found that using color and displaying paths improved performance; however, there is no indication that it is more beneficial to low than to high SVA users. (Also cross-referenced as UMIACS-TR-97-43) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Interface and Data Architecture for Query Preview in Networked. Khoa Doan. Catherine Plaisant. Ben Shneiderman. Tom Bruns. October 1997.
There are numerous problems associated with formulating queries on networked information systems. These include data diversity, data complexity, network growth, varied user base, and slow network access. This paper proposes a new approach to a network query user interface which consists of two phases: query preview and query refinement. This new approach is based on the concepts of dynamic queries and query previews, which guide users in rapidly and dynamically eliminating undesired datasets, reducing the data volume to manageable size, and refining queries locally before submission over a network. Examples of two applications are given: a Restaurant Finder and a prototype with NASA's Earth Observing Systems--Data Information Systems (EOSDIS). Data architecture is discussed and user feedback is presented. Dynamic queries and query previews provide solutions to many existing problems in querying networked information systems. Department of Computer Science, University of Maryland,
Visualizing websites using a hierarchical table of contents browser:. David A. Nation. Catherine Plaisant. Gary Marchionini. Anita Komlodi. May 1997.
A method is described for visualizing the contents of a Web site with a hierarchical table of contents using a Java program and applet called WebTOC. The automatically generated expand/contract table of contents provides graphical information indicating the number of elements in branches of the hierarchy as well as individual and cumulative sizes. Color can be used to represent another attribute such as file type and provide a rich overview of the site for users and managers of the site. Early results from user studies suggest that WebTOC is easily learned and can assist users in navigating websites. (Also cross-referenced as UMIACS-TR-97-41) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Study on Video Browsing Strategies. Wei Ding. Gary Marchionini. August 1997.
Due to the unique characteristics of video, traditional surrogates and control/browsing mechanisms that facilitate text-based information retrieval may not work sufficiently for video. In this paper, a video browsing interface prototype with key frames and fast play-back mechanisms was built and tested. Subjects performed two kinds of browsing-related tasks: object identification and video comprehension under different display speeds (1 fps, 4 fps, 8 fps, 12 fps and 16 fps). It was found that browsing the key frames between 8 and 12 fps could potentially define a functional limit in object identification accuracy. There was no significant performance difference found across display speeds tested. The results also showed that lower speeds were required for object identification than for video comprehension. How user performance was affected by individual characteristics such as age, gender, academic background and TV- or movie-watching habits was investigated, but no significant difference was found due to the limit of sample size and other constraints. (Also cross-referenced as UMIACS-TR-97-40) (Also cross-referenced as CLIS-TR-97-06) University of Maryland Institute for Advanced Computer Studies, Univ. of Maryland Human-Computer Interaction Laboratory, Univ. of Maryland College of Library and Information Services,
Elastic Windows: A Hierarchical Multi-Window World-Wide Web Browser. Eser Kandogan. Ben Shneiderman. May 1997.
The World-Wide Web (WWW) is becoming an invaluable source for the information needs of many users. However, current browsers are still primitive, in that they do not support many of the navigation needs of users, as indicated by user studies. They do not provide an overview and a sense of location in the information structure being browsed. Also, they do not facilitate the organization and filtering of information, nor do they aid users in accessing already visited pages without much cognitive demand. In this paper, a new browsing interface is proposed with multiple hierarchical windows and efficient multiple window operations. It provides a flexible organization where users can quickly organize, filter, and restructure the information on the screen as they reformulate their goals. Overviews can give the user a sense of location in the browsing history as well as provide fast access to a hierarchy of pages. Department of Computer Science, University of Maryland,
Content + Connectivity = Community: Digital Resources for a Learning. Gary Marchionini. Victor Nolet. Hunter Williams. Wei Ding. Josephus Beale Jr.. Anne Rose. Allison Gordon. Ernestine Enomoto. Lynn Harbinson. January 1997.
Digital libraries offer new opportunities to provide access to diverse resources beyond those held in school buildings and to allow teachers and learners to reach beyond classroom walls to other people to build distributed learning communities. Creating learning communities requires that teachers change their behaviors and the Baltimore Learning Community Project described here is based on the premise that access to resources should be tied to the assessment outcomes that increasingly drive curricula and classroom activity. Based on examination of curriculum guides and discussions with project teachers, an interface for the BLC digital library was prototyped. Three components (explore, construct, and present) of this user interface that allows teachers to find text, video, images, web sites, and instructional modules and create their own modules are described. Although the technological challenges of building learning communities are significant, the greater challenges are mainly social and political. Department of Computer Science, University of Maryland,
User Interfaces for a Complex Robotic Task: A Comparison of Tiled vs.. J.Corde Lane. Steven P. Kuester. Ben Shneiderman. January 1997.
High complexity tasks, such as remote teleoperation of robotic vehicles, often require multiple windows. For these complex tasks, the windows necessary for task completion may occupy more area than is available on a single visual display unit (VDU). Since the focus of the robotic task constantly changes, modular control panels that can be opened, closed, and moved on the screen are invaluable to the operator. This study describes a specific robotic task and the need for a multi-window interface that can be easily manipulated. This paper examines two multi-window management strategies: tiled (fixed size) and arbitrary overlap. Multi-window searches were performed using the two management styles and they were compared on the basis of search completion time and error rates. Results with 35 novice users showed faster completion times for the tiled management strategy than for the arbitrary overlap strategy. Other factors such as the number of windows available, the number of displayed windows, workload of opening or closing windows, and effect of learning are discussed. Department of Computer Science, University of Maryland,
Evaluating Multilingual Gisting of Web Pages. Philip Resnik. March 1997.
We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form. (Also cross-referenced as UMIACS-TR-97-39) University of Maryland Institute for Advanced Computer Studies, Dept. of Linguistics, University of Maryland,
Development of an Object Oriented Parser/Generator, Ontologies, and. Bonnie J. Dorr. February 1996.
This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. Our primary objective was to develop an interlingual representation based on lexical conceptual structure (LCS) and to examine the relation between this representation and a set of linguistically motivated semantic classes. We have focused on several areas in support of our objectives: (1) updating a Korean message-passing parser to handle more Korean linguistic phenomena and porting this to Windows on the PC so that it runs with LCS composition; (2) scaling up the Korean lexicon to include thousands of new words converted by the Yale-romanization program, to be integrated with the Korean message-passing parser; (3) investigation of the syntax-semantics relation and use of this relation in automatic classification of verbs; (4) investigation of the aspectual dimensions as they impact lexical semantics and the lexical choice process in multilingual generation; and (5) automatic construction of LCS's from lexical-semantic templates and thematic grids. (Also cross-referenced as UMIACS-TR-97-37) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Speech-Based Information Retrieval for Digital Libraries. Douglas W. Oard. March 1997.
Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries. Presently, access to most of this material is provided using a combination of manually annotated metadata and linear search. Recent advances in speech processing technology have produced a number of techniques for extracting features from recorded speech that could provide a useful basis for the retrieval of speech or multimedia objects in large digital library collections. Among these features are the semantic content of the speech, the identity of the speaker, and the language in which the speech was spoken. We propose to develop a graphical and auditory user interface for speech-based information retrieval that exploits these features to facilitate selection of recorded speech and multimedia information objects that include recorded speech. We plan to use that interface to evaluate the effectiveness and usability of alternative ways of exploiting those features and as a testbed for the evaluation of advanced retrieval techniques such as cross-language speech retrieval. (Also cross-referenced as UMIACS-TR-97-36) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Virtual Microscope. Renato Ferreira. Bongki Moon. Jim Humphries. Alan Sussman. Joel Saltz. Robert Miller. Angelo Demarzo. April 1997.
We present the design of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high power light microscope. We discuss several technical challenges related to providing the performance necessary to achieve rapid response time, mainly in dealing with the enormous amounts of data (tens to hundreds of gigabytes per slide) that must be retrieved from secondary storage and processed. To effectively implement the data server, the system design relies on the computational power and high I/O throughput available from an appropriately configured parallel computer. (Also cross-referenced as UMIACS-TR-97-35) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
CAUCHY-LIKE PRECONDITIONERS FOR 2-DIMENSIONAL ILL-POSED PROBLEMS. Misha E. Kilmer. March 1997.
Ill-conditioned matrices with block Toeplitz, Toeplitz block (BTTB) structure arise from the discretization of certain ill-posed problems in signal and image processing. We use a preconditioned conjugate gradient algorithm to compute a regularized solution to this linear system given noisy data. Our preconditioner is a Cauchy-like block diagonal approximation to an orthogonal transformation of the BTTB matrix. We show the preconditioner has desirable properties when the kernel of the ill-posed problem is smooth: the largest singular values of the preconditioned matrix are clustered around one, the smallest singular values remain small, and the subspaces corresponding to the largest and smallest singular values, respectively, remain unmixed. For a system involving $np$ variables, the preconditioned algorithm costs only $O(np (\lg n + \lg p))$ operations per iteration. We demonstrate the effectiveness of the preconditioner on three examples. University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
On Hyperbolic Triangularization. M. Stewart. G.W. Stewart. May 1997.
This paper treats the problem of triangularizing a matrix by hyperbolic Householder transformations. The stability of this method, which finds application in block updating and fast algorithms for Toeplitz-like matrices, has been analyzed only in special cases. Here we give a general analysis which shows that two distinct implementations of the individual transformations are relationally stable. The analysis also shows that pivoting is required for the entire triangularization algorithm to be stable. (Also cross-referenced as UMIACS-TR-97-34) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Domain-Driven Reconfiguration in Collaborative Virtual Environments. Donald J. Welch. James M. Purtilo. March 1997.
When virtual environments (VE) collaborate to create a shared virtual world, events occur that can have catastrophic effects on that virtual world. These events can be system events, such as the loss of a host or a network link to that host. They can also be events that happen only in the virtual world, for example, a virtual activity that migrates, bringing increased activity to a different VE. To maintain acceptable or realistic behavior can require the restructuring of the collaborative virtual environment (CVE) during execution. The restructuring must take place in accordance with a set of rules mandated by the domain and specific application. The reconfiguration must occur quickly, to maintain realism for the users. Automatic restructuring brings the added benefit of fewer support staff. We call the automatic restructuring of a distributed application with respect to these rules Domain-Driven Reconfiguration and we have developed a software engineering environment to support its inclusion in CVEs. (Also cross-referenced as UMIACS-TR-97-32) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Dynamic Dispatching of Cyclic Real-Time Tasks with Relative Constraints. Seonho Choi. Ashok K. Agrawala. March 1997.
In some hard real-time systems, relative timing constraints may be imposed on task executions, in addition to the release time and deadline constraints. A periodic task may have jitter constraints between the start or finish times of any two consecutive executions. Relative constraints such as separation or relative deadline constraints may be given between start or finish times of tasks (4). One approach is to find a total order on a set of N jobs in a scheduling window, and cyclically use this order at run time to execute the jobs. However, in the presence of the relative constraints, if the job execution times are nondeterministic with defined lower and upper bounds, it is not always possible to statically assign start times at pre-runtime without sacrificing the schedulability (4). We develop a technique called dynamic cyclic dispatching to enforce relative constraints along with release time and deadline constraints. An ordered set of N jobs is assumed to be given within a scheduling window and this schedule (ordering) is cyclically repeated at runtime. An off-line algorithm is presented to check the schedulability of the job set and to obtain parametric lower and upper bounds on the start times of jobs, if the job set is schedulable. Then, these parametric bounds are evaluated at runtime to obtain valid time intervals during which jobs can be started. The complexity of this off-line component is shown to be O(n^2 N^3), where n is the number of jobs in a scheduling window that have relative constraints with jobs in the next scheduling window. An online algorithm can evaluate these bounds in O(N^3 + n^5) computation time.
Unlike static approaches, which assign fixed start times to jobs in the scheduling window, our approach not only allows us to flexibly manage the slack times without affecting the schedulability of a task set, but also yields a guaranteed schedulability in the sense that, if any other dispatching mechanism can schedule the job sequences satisfying all given constraints, then our mechanism can also schedule them. (Also cross-referenced as UMIACS-TR-97-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
An Accurate Time-Management Unit for Real-Time Processors. Krishnan K. Kailas. Ashok K. Agrawala. March 1997.
Time management is an important aspect of real-time computation. Traditional high performance processors provide little or no support for management of time. In this report, we propose a time-management unit which can greatly help improve the performance of a real-time system. The proposed unit can be added to any processor architecture without affecting its performance. We also explain how the unit helps to solve the clock synchronization problems in a real-time network. (Also cross-referenced as UMIACS-TR-97-28) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Estimating End-to-End Cell Delay Variation in ATM Networks. Ibrahim Korpeoglu. Satish K. Tripathi. Xiaoqiang Chen. March 1997.
Cell delay variation (CDV) is one of the quality of service parameters that can be negotiated between applications and an ATM network. The network should check during connection setup, as part of call admission control, whether it can satisfy the requested CDV of an application. For this comparison, the network should estimate the end-to-end CDV that it can support, by using local information about cell delays and delay variations in switches. An accurate estimation of the end-to-end CDV is important for decreasing call-blocking probability and increasing network utilization. In this article, we will first describe, evaluate, and identify the short-comings of three proposed methods for end-to-end CDV estimation. Then we will present a new method based on Chernoff bound and compare it to the other methods. The Chernoff method is promising since it has good accuracy and applicability under current signalling support for ATM networks. (Also cross-referenced as UMIACS-TR-97-27) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
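The Chernoff-bound idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the per-switch cell delays are independent discrete random variables with known distributions. The function names, the delay distribution, and the grid search over the Chernoff parameter are hypothetical and are not taken from the report.

```python
import math

def chernoff_tail_bound(delay_pmfs, t, s_grid=None):
    """Upper-bound P(D_1 + ... + D_k >= t) for independent delays.

    Uses P(S >= t) <= min over s > 0 of exp(-s*t) * prod_i E[exp(s*D_i)].
    Each pmf is a dict {delay_value: probability}.
    """
    if s_grid is None:
        s_grid = [0.01 * k for k in range(1, 501)]  # crude grid search over s
    best = 1.0  # a probability never exceeds 1
    for s in s_grid:
        log_mgf = sum(
            math.log(sum(p * math.exp(s * d) for d, p in pmf.items()))
            for pmf in delay_pmfs
        )
        best = min(best, math.exp(log_mgf - s * t))
    return best

def exact_tail(delay_pmfs, t):
    """Exact P(sum >= t) by convolving the pmfs (feasible only for small cases)."""
    dist = {0: 1.0}
    for pmf in delay_pmfs:
        new = {}
        for a, pa in dist.items():
            for d, p in pmf.items():
                new[a + d] = new.get(a + d, 0.0) + pa * p
        dist = new
    return sum(p for v, p in dist.items() if v >= t)

def delay_quantile_bound(delay_pmfs, eps):
    """Smallest integer t whose Chernoff bound on P(S > t) is at most eps."""
    max_total = sum(max(pmf) for pmf in delay_pmfs)
    for t in range(max_total + 1):
        if chernoff_tail_bound(delay_pmfs, t + 1) <= eps:
            return t
    return max_total  # the sum can never exceed this value

# Hypothetical per-switch queueing delay (in cell times) on a 5-switch path.
switch = {0: 0.7, 1: 0.2, 2: 0.1}
path = [switch] * 5
bound = chernoff_tail_bound(path, 6)
exact = exact_tail(path, 6)
quantile = delay_quantile_bound(path, 0.01)
```

The bound is conservative by construction, so a delay budget chosen from it is never smaller than the one the exact distribution would give; the price is some lost utilization when the bound is loose.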
Designing Access Methods for Bitemporal Databases. Anil Kumar. Vassilis J. Tsotras. Christos Faloutsos. March 1997.
By supporting the valid and transaction time dimensions, bitemporal databases represent reality more accurately than conventional databases. In this paper we examine the issues involved in designing efficient access methods for bitemporal databases and propose the partial-persistence and the double-tree methodologies. The partial-persistence methodology reduces bitemporal queries to partial persistence problems for which an efficient access method is then designed. The double-tree methodology "sees" each bitemporal data object as consisting of two intervals (a valid-time and a transaction-time interval), and divides objects into two categories according to whether the right endpoint of the transaction time interval is already known. A common characteristic of both methodologies is that they take into account the properties of each time dimension. Their performance is compared with a straightforward approach that "sees" the intervals associated with a bitemporal object as composing one rectangle which is stored in a single multidimensional access method. Given that some limited additional space is available, our experimental results show that the partial-persistence methodology provides the best overall performance, especially for transaction timeslice queries. For those applications that require ready, off-the-shelf, access methods the double-tree methodology is a good alternative. (Also cross-referenced as UMIACS-TR-97-24) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Aspectual Modifications to a LCS Database for NLP Applications. Bonnie J. Dorr. Mari Broman Olsen. May 1997.
Verbal and compositional lexical aspect provide the underlying temporal structure of events. Knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (Dowty, 1986; Moens and Steedman, 1988; Passonneau, 1988), interfacing to temporal databases (Androutsopoulos, 1996), processing temporal modifiers (Antonisse, 1994), describing allowable alternations and their semantic effects (Resnik, 1996; Tenny, 1994), and selecting tense and lexical items for natural language generation ((Dorr and Olsen, 1996; Klavans and Chodorow, 1992), cf. (Slobin and Bocaz, 1988)). We show that it is possible to represent lexical aspect---both verbal and compositional---on a large scale, using Lexical Conceptual Structure (LCS) representations of verbs in the classes cataloged by Levin (1993). We show how proper consideration of these universal pieces of verb meaning may be used to refine lexical representations and derive a range of meanings from combinations of LCS representations. A single algorithm may therefore be used to determine lexical aspect classes and features at both verbal and sentence levels. Finally, we illustrate how knowledge of lexical aspect facilitates the interpretation of events in NLP applications. (Also cross-referenced as UMIACS-TR-97-21) (Also cross-referenced as LAMP-TR-007) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adam A. Porter. Harvey Siy. Audris Mockus. Lawrence G. Votta. Understanding the Sources of Variation in Software Inspections. January 1997.
In a previous experiment, we determined how various changes in three structural elements of the software inspection process (team size, and number and sequencing of sessions) altered effectiveness and interval. Our results showed that such changes did not significantly influence the defect detection rate, but that certain combinations of changes dramatically increased the inspection interval. We also observed a large amount of unexplained variance in the data, indicating that other factors must be affecting inspection performance. The nature and extent of these other factors now have to be determined to ensure that they had not biased our earlier results. Also, identifying these other factors might suggest additional ways to improve the efficiency of inspection. Acting on the hypothesis that the "inputs" into the inspection process (reviewers, authors, and code units) were significant sources of variation, we modeled their effects on inspection performance. We found that they were responsible for much more variation in defect detection than was process structure. This leads us to conclude that better defect detection techniques, not better process structures, are the key to improving inspection effectiveness. The combined effects of process inputs and process structure on the inspection interval accounted for only a small percentage of the variance in inspection interval. Therefore, there still remain other factors which need to be identified. (Also cross-referenced as UMIACS-TR-97-22) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Bell Laboratories, Naperville, IL,
Adam A. Porter. Fundamental Laws and Assumptions of Software Maintenance. March 1997.
Researchers must pay far more attention to discovering and validating the principles that underlie software maintenance and evolution. This was one of the major conclusions reached during the International Workshop on Empirical Studies of Software Maintenance. This workshop, held in November 1996 in Monterey, California, brought together an international group of researchers to discuss the successes, challenges and open issues in software maintenance and evolution. This article documents the discussion of the subgroup on fundamental laws and assumptions of software maintenance. The participants of this group included researchers in software engineering, the behavioral sciences, information systems and statistics. Their main conclusion was that insufficient effort has been devoted to synthesizing research conjectures into validated theories and that this problem has slowed progress in software maintenance. To help remedy this situation they made the following recommendations: (1) when we use empirical methods, an explicit goal should be to develop theories, (2) we should look to other disciplines for help where it is appropriate, and (3) our studies should use a wider range of empirical methods. (Also cross-referenced as UMIACS-TR-97-21) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adam A. Porter. C. A. Toman. Harvey Siy. Lawrence G. Votta. An Experiment to Assess the Cost-Benefits of Code Inspections in Large. March 1997.
We conducted a long-term experiment to compare the costs and benefits of several different software inspection methods. These methods were applied by professional developers to a commercial software product they were creating. Because the laboratory for this experiment was a live development effort, we took special care to minimize cost and risk to the project, while maximizing our ability to gather useful data. This article has several goals: (1) to describe the experiment's design and show how we used simulation techniques to optimize it, (2) to present our results and discuss their implications for both software practitioners and researchers, and (3) to discuss several new questions raised by our findings. For each inspection we randomly assigned 3 independent variables: (1) the number of reviewers on each inspection team (1, 2, or 4), (2) the number of teams inspecting the code unit (1 or 2), and (3) the requirement that defects be repaired between the first and second team's inspections. The reviewers for each inspection were randomly selected without replacement from a pool of 11 experienced software developers. The dependent variables for each inspection included inspection interval (elapsed time), total effort, and the defect detection rate. Our results are based on the observation of 88 inspections and challenge certain long-held beliefs about the most cost-effective ways to conduct inspections and raise some questions about the benefits of recently proposed methods. (Also cross-referenced as UMIACS-TR-97-20) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, AT&T Bell Laboratories, Naperville IL,
Understanding the Effects of Developer Activities on Inspection. Adam A. Porter. Harvey Siy. Lawrence G. Votta. March 1997.
We have conducted an industrial experiment to assess the cost-benefit tradeoffs of several software inspection processes. Our results to date explain the variation in observed effectiveness very well, but are unable to satisfactorily explain variation in inspection interval. In this article we examine the effect of a new factor - process environment - on inspection interval (calendar time needed to complete the inspection). Our analysis suggests that process environment does indeed influence inspection interval. In particular, we found that non-uniform work priorities, time-varying workloads, and deadlines have significant effects. Moreover, these experiences suggest that regression models are inherently inadequate for interval modeling, and that queueing models may be more effective. (Also cross-referenced as UMIACS-TR-97-19) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Specification-based Testing of Reactive Software: Tools and Experiments. Lalita Jategaonkar Jangadeesan. Adam A. Porter. Carlos Puchol. J. Christopher Ramming. Lawrence G. Votta. March 1997.
Testing commercial software is expensive and time consuming. Automated testing methods promise to save a great deal of time and money throughout the software industry. One approach that is well-suited for the reactive systems found in telephone switching systems is specification-based testing. We have built a set of tools to automatically test software applications for violations of safety properties expressed in temporal logic. Our testing system automatically constructs finite state machine oracles corresponding to safety properties, builds test harnesses, and integrates them with the application. The test harness then generates inputs automatically to test the application. We describe a study examining the feasibility of this approach for testing industrial applications. To conduct this study we formally modeled an Automatic Protection Switching system (APS), which is an application common to many telephony systems. We then asked a number of computer science graduate students to develop several versions of the APS and use our tools to test them. We found that the tools are very effective, save significant amounts of human effort (at the expense of machine resources), and are easy to use. We also discuss improvements that are needed before we can use the tools with professional developers building commercial products. (Also cross-referenced as UMIACS-TR-97-18) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Bell Laboratories, Naperville IL, Dept. of Computer Sciences, Univ. of Texas at Austin, AT&T Laboratories,
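As a rough illustration of what a finite state machine oracle for a safety property looks like, the sketch below checks event traces against a hand-written automaton. The property ("no traffic is sent on a line between a failure and its repair"), the event names, and the function `check_trace` are hypothetical stand-ins chosen for illustration; the report's tools derive such oracles automatically from temporal logic, which this sketch does not attempt.

```python
# Hypothetical oracle for the safety property:
# "after 'fail', no 'send' may occur until 'repair'".
# Missing (state, event) pairs leave the state unchanged.
TRANSITIONS = {
    ("ok", "fail"): "failed",
    ("failed", "repair"): "ok",
    ("failed", "send"): "violation",
}

def check_trace(transitions, trace, start="ok", bad="violation"):
    """Run the oracle over a trace of events.

    Returns the index of the first event that drives the oracle into the
    bad state, or None if the trace satisfies the safety property.
    """
    state = start
    for index, event in enumerate(trace):
        state = transitions.get((state, event), state)
        if state == bad:
            return index
    return None

# A trace that respects the property and one that violates it.
safe_result = check_trace(TRANSITIONS, ["send", "fail", "repair", "send"])
unsafe_result = check_trace(TRANSITIONS, ["send", "fail", "send"])
```

Because a safety property is violated by a finite prefix, the oracle can report the exact event at which the application under test went wrong, which is what makes it usable inside an automated test harness.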
Anywhere, Anytime Code Inspections: Using the Web to Remove Inspection. James Perpich. Dewayne E. Perry. Adam A. Porter. Lawrence G. Votta. Michael W. Wade. March 1997.
The dissemination of critical information and the synchronization of coordinated activities are critical problems in geographically separated, large-scale, software development. While these problems are not insurmountable, their solutions have varying trade-offs in terms of time, cost and effectiveness. Our previous studies have shown that the inspection interval is typically lengthened because of schedule conflicts among inspectors which delay the (usually) required inspection collection meeting. We present and justify a solution using an intranet web that is both timely in its dissemination of information and effective in its coordination of distributed inspectors. First, exploiting a naturally occurring experiment (reported here), we conclude that the asynchronous collection of inspection results is at least as effective as the synchronous collection of those results. Second, exploiting the information dissemination qualities and the on-demand nature of information retrieval of the web, and the platform independence of browsers, we build an inexpensive tool that integrates seamlessly into the current development process. By seamless we mean an identical paper flow that results in an almost identical inspection process. The acceptance of the inspection tool has been excellent. The cost savings just from the reduction in paper work and the time savings from the reduction in distribution interval of the inspection package (sometimes involving international mailings) have been substantial. These savings together with the seamless integration into the existing environment are the major factors for this acceptance. From our viewpoint as experimentalists, the acceptance came too readily. Therefore we lost our opportunity to explore this tool using a series of controlled experiments to isolate the underlying factors of its effectiveness.
Nevertheless, by using historical data we can show that the new process is less expensive in terms of cost and at least as effective in terms of quality (defect detection effectiveness). (Also cross-referenced as UMIACS-TR-97-17) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Lucent Technologies Inc, Naperville IL and Murray Hill NJ, Bell Laboratories, Murray Hill NJ,
Specification-based Testing of Reactive Software: A Case Study in. Lalita Jangadeesan. Adam A. Porter. Carlos Puchol. J. Christopher Ramming. Lawrence G. Votta. February 1997.
We describe a case study in which we tried to transfer a specification-based testing system from research to practice. We did the case study in two steps: First we conducted a feasibility study in a laboratory setting to estimate the potential costs and benefits of using the system. Next we conducted a usability study, in an industrial setting, to determine whether it would be effective in practice. The case study illustrates that technology transfer efforts can benefit from a greater focus on practitioners' needs, and that this focus helps identify some of the open problems that limit formal methods technology transfer. We also found that there is often a tension between the scope of the problem to be solved and the specificity of the solution. The greater the scope of the problem, the more general the formal method solution and, thus, the more customization that must be done to use it in a particular environment. We suggest that researchers limit the scope of the problems they try to solve to minimize the risk of technology transfer failure. (Also cross-referenced as UMIACS-TR-97-16) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Assessing Software Review Meetings: Results of a Comparative Analysis. Adam A. Porter. Philip M. Johnson. February 1997.
Software review is a fundamental tool for software quality assurance. Nevertheless, there are significant controversies as to the most efficient and effective review method. One of the most important questions currently being debated is the utility of meetings. Although almost all industrial review methods are centered around the inspection meeting, recent findings call their value into question. To gain insight into these issues, the two authors of this paper separately and independently conducted controlled experimental studies. This paper discusses a joint effort to understand the broader implications of these two studies. To do this, we designed and carried out a process of "reconciliation" in which we established a common framework for the comparison of the two experimental studies, re-analyzed the experimental data with respect to this common framework, and compared the results. Through this process we found many striking similarities between the results of the two studies, strengthening their individual conclusions. It also revealed interesting differences between the two experiments, suggesting important avenues for future research. (Also cross-referenced as UMIACS-TR-97-15) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Quantifiable Data Mining Using Principal Component Analysis. Flip Korn. Alexandros Labrinidis. Yannis Kotidis. Christos Faloutsos. Alex Kaplunovich. Dejan Perkovic. February 1997.
Association Rule Mining algorithms operate on a data matrix (e.g., customers x products) to derive rules. We propose a single-pass algorithm for mining linear rules in such a matrix based on Principal Component Analysis. PCA detects correlated columns of the matrix, which correspond to, e.g., products that sell together. The first contribution of this work is that we propose to quantify the ``goodness'' of a set of discovered rules. We define the ``guessing error'': the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. The second contribution is a novel method to guess missing/hidden values from the linear rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can ``guess'' the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, `what-if' scenarios, outlier detection, and visualization. Moreover, we show that we can compute the principal components with a single pass over the dataset. Experiments on real datasets (e.g., NBA statistics) demonstrate that the proposed method consistently achieves a ``guessing error'' of up to 5 times lower than the straightforward competitor. (Also cross-referenced as UMIACS-TR-97-13) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
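The idea of guessing a hidden value from a linear rule can be sketched in a few lines. The following is a simplified illustration, not the report's algorithm: it keeps only the first principal component, finds it by power iteration, and makes two passes over the data rather than one. The data values and all function names are hypothetical.

```python
import math

def column_means(rows):
    """Mean of each column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def first_principal_component(rows, iters=200):
    """Return (column means, unit vector of the leading principal component)."""
    mean = column_means(rows)
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    d = len(mean)
    cov = [[sum(r[i] * r[j] for r in centered) / len(rows) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # power iteration on the covariance matrix
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def guess_hidden(row, hidden, mean, v):
    """Guess row[hidden] from the other cells using the rank-one linear rule.

    The known cells fix the coordinate along the principal direction by
    least squares; the hidden cell is then read off that direction.
    """
    known = [j for j in range(len(row)) if j != hidden]
    num = sum((row[j] - mean[j]) * v[j] for j in known)
    den = sum(v[j] * v[j] for j in known)
    return mean[hidden] + (num / den) * v[hidden]

# Hypothetical spending matrix: columns are milk, bread, butter (dollars).
purchases = [
    [10.0, 3.0, 2.0],
    [20.0, 6.0, 4.0],
    [5.0, 1.5, 1.0],
    [40.0, 12.0, 8.0],
]
mean, pc = first_principal_component(purchases)
# A customer spent $30 on milk and $9 on bread; the butter cell is hidden.
butter_guess = guess_hidden([30.0, 9.0, 0.0], 2, mean, pc)
```

In this toy matrix the columns are exactly proportional, so the single rule reproduces the hidden cell exactly; on real data the residual of such guesses, averaged over hidden cells, is the kind of quantity the "guessing error" measures.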
Temporally Determinate Disk Access: An Experimental Approach. Mohamed Aboutabl. Ashok K. Agrawala. Jean-Dominique Decotignie. February 1997.
Disk drives are the most commonly used secondary storage devices in computer systems. The way operating systems access these devices leads to a wide range of variability in access time. In this report we study the detailed temporal characteristics of disk drives. We describe a comprehensive set of experiments designed to build a model for the disk drive. Simulation is used to validate the model. This disk model will help design a device driver which can achieve a high degree of temporal determinacy. (Also cross-referenced as UMIACS-TR-97-14) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Département d'Informatique, Ecole Polytechnique Federale De Lausanne,
Representing and Integrating Multiple Calendars. Sarit Kraus. Yehoshua Sagiv. V. S. Subrahmanian. February 1997.
Whenever humans refer to time, they do so with respect to a specific underlying calendar. So do most software applications. However, most theoretical models of time refer to time with respect to the integers (or reals). Thus, there is a mismatch between the theory and the application of temporal reasoning. To lessen this gap, we propose a formal, theoretical definition of a calendar and show how one may specify dates, time points, time intervals, as well as sets of time points, in terms of constraints with respect to a given calendar. Furthermore, when multiple applications using different calendars wish to work together, there is a need to integrate those calendars together into a single, unified calendar. We show how this can be done. (Also cross-referenced as UMIACS-TR-97-12) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Mathematics and Computer Science, Bar-Ilan University, Israel, Dept. of Computer Science, Hebrew University, Israel,
ForMAT and Parka: A technology integration experiment and beyond. James Hendler. K. Stoffel. D. Rager. A. Mulvehill. February 1997.
This report describes a Technology Integration Experiment (TIE) between the University of Maryland and MITRE Corp. which was undertaken as part of the (D)Arpa/Rome Laboratory Planning Initiative (ARPI). This work led to an integration of the UM Parka-DB tool into the MITRE ForMAT transportation planning tool. This work also forms one of the cornerstones of the "Case-based Planning" cluster of the current phase of the ARPI. Dept. of Computer Science, Univ. of Maryland,
Multi-platform Simulation of Video Playout Performance. Ladan Gharai. Richard Gerber. February 1997.
We describe a video playout and simulation package, including (1) a multi-threaded player, which maximizes performance via asynchronous streaming and selective IO-prefetching; (2) a compositional simulator, which predicts playout performance for multiple platforms via eleven key deterministic and stochastic time-generating functions; and (3) a set of profiling tools, which allows one to extend the range of target platforms by benchmarking new components, and converting the results into distribution functions that the simulator can access. Using this system, a developer can quickly estimate a video's performance on a wide spectrum of target platforms - without ever having to actually assemble them. (Also cross-referenced as UMIACS-TR-97-11) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Automated Computation of Decomposable Synchronization Conditions. Gilberto Matos. James M. Purtilo. Elizabeth White. February 1997.
The most important aspect of concurrent and distributed computation is the interaction between system components. Integration of components into a system requires some synchronization that prevents the components from interacting in ways that may endanger the system users, its correctness or performance. The undesirable interactions are usually described using temporal logic, or safety and liveness assertions. Automated synthesis of synchronization conditions is a portable alternative to the manual design of system synchronization, and it is already widespread in the hardware CAD domain. The automated synchronization for concurrent software systems is hindered by their excessive complexity, because their state spaces can rarely be exhaustively analyzed to compute the synchronization conditions. The analysis of global state spaces is required for liveness and real-time properties, but simple safety rules depend only on the referenced components and not on the rest of the system or its environment. Synchronization conditions for delayable safety critical systems can be computed without the state space analysis, and decomposed into single component synchronization conditions. Automated synthesis of decomposable synchronization conditions provides a solid groundwork for the independent design of system components, and supports reuse and maintenance in concurrent software systems. This approach to integration of concurrent systems is embodied by GenEx, an analysis and synchronization tool that integrates system components to satisfy a given set of safety rules, and produces executable systems. (Also cross-referenced as UMIACS-TR-97-10) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Computer Science, George Mason University,
LEXICALL: Lexicon Construction for Foreign Language Tutoring. Bonnie J. Dorr. February 1997.
We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program---LEXICALL---takes, as input, the result of previous work on verb classification and thematic grid tagging, and outputs LCS representations for different languages. These representations have been ported into English, Arabic and Spanish lexicons, each containing approximately 9000 verbs. We are currently using these lexicons in an operational foreign language tutoring and machine translation system. (Also cross-referenced as UMIACS-TR-97-09) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Resource Lock Commit Protocol (RLCP) for Multimedia Object Retrieval. K. Selcuk Candan. Eenjun Hwang. B. Prabhakaran. V.S. Subrahmanian. February 1997.
Many multimedia presentation applications involve retrieval of objects from more than one collaborating server. Presentations of objects from different collaborating servers might be inter-dependent. For instance, we can consider distributed video servers where blocks of movies are distributed over a set of servers. Here, blocks of a movie from different video servers have to be retrieved and presented continuously without any gaps in the presentation. Such applications first need an estimate of the available network resources to each of the collaborating servers in order to identify a schedule for retrieving the objects composing the presentation. A collaborating server can suggest modifications of the retrieval schedule depending on its load. These modifications can potentially affect the retrieval schedule for other collaborating applications. Hence, a sequence of negotiations has to be carried out with the collaborating servers in order to commit to a retrieval schedule of the objects composing the multimedia presentation. In this paper, we propose an application sub-layer protocol, Resource Lock Commit Protocol (RLCP), for handling the negotiation and commitment of the resources required for a collaborative multimedia presentation application. (Also cross-referenced as UMIACS-TR-97-08) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Flexible Meta-Wrapper Interface for Autonomous Distributed. Louiqa Raschid. Maria Esther Vidal. Jean-Robert Gruser. March 1997.
We support flexible query processing with autonomous networked information sources. Flexibility allows a query to be accepted in a dynamic environment with unavailable sources. Flexibility provides the ability to identify equivalent sources, based on their contents; these equivalences are used to eliminate redundancy and provide alternate query plans, when some source is unavailable. We determine the best plan, i.e., the least-cost non-redundant plan, based on a cost-model for autonomous sources. These features are supported by a meta-wrapper component within the mediator. The meta-wrapper interface is defined by a structure and supported operations. WHOQL is a query language for queries and plans; it can represent sequential execution to obtain safe plans, and plans with redundancy (alternatives). A language WHODL defines the mapping from the meta-wrapper interface to each source. WHODL also describes the contents of a source. This content definition is used to determine equivalences of autonomous sources. We obtain a least-cost non-redundant plan in a dynamic environment. A meta-wrapper cost model uses three underlying sources of information: a selectivity model; a cost model for operators in the meta-wrapper; and a cost estimator for the query response time. The estimator uses a parameterized feedback technique to learn from query feedback, and to determine the relevance of various factors that affect response time. The cost model also provides feedback to the plan generator on low-cost plans. (Also cross-referenced as UMIACS-TR-97-07) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compile-Time Analysis on Programs with Dynamic Pointer-Linked Data. Yuan-Shin Hwang. Joel Saltz. November 1996.
This paper studies static analysis on programs that create and traverse dynamic pointer-linked data structures. It introduces a new type of auxiliary structure, called {\em link graphs}, to depict the alias information of pointers and connection relationships of dynamic pointer-linked data structures. The link graphs can be used by compilers to detect side effects, to identify the patterns of traversal, and to gather the DEF-USE information of dynamic pointer-linked data structures. The results of the above compile-time analysis are essential for parallelization and for optimization of communication and synchronization overheads. Algorithms that perform compile-time analysis on side effects and DEF-USE information using link graphs are proposed. Dept. of Computer Science, Univ. of Maryland,
Real-time Communication. Ardas Cilingiroglu. Sung Lee. Ashok K. Agrawala. January 1997.
Recent advances in networking technology have enabled new multimedia and process control applications. These applications require real-time communication services with stringent performance guarantees expressed in terms of delay, delay jitter, throughput and loss rate. Current network architectures and protocols are designed to support best-effort services and they are inefficient in supporting real-time services. In this paper, we survey real-time communication architectures and protocols both in packet-switching networks and in multiple-access networks. For each network a service model is presented as a general framework. Specifically, the service model for a packet-switching network is composed of a specification for traffic characterization and performance requirements, a routing protocol, a resource reservation protocol and a packet service discipline at switching nodes. The model for a multiple-access network, on the other hand, includes a basic traffic characterization and a MAC-layer real-time scheduling algorithm. This paper surveys the recent developments in each component of the service models with comparisons of alternative techniques. (Also cross-referenced as UMIACS-TR-97-04) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Alternative Discrete-Time Operators and Their Application to Nonlinear. Andrew D. Back. Ah Chung Tsoi. Bill G. Horne. C. Lee Giles. January 1997.
The shift operator, defined as $q x(t) = x(t+1)$, is the basis for almost all discrete-time models. It has been shown, however, that linear models based on the shift operator suffer problems when used to model lightly-damped-low-frequency (LDLF) systems, with poles near $(1,0)$ on the unit circle in the complex plane. This problem occurs under fast sampling conditions. As the sampling rate increases, coefficient sensitivity and round-off noise become a problem as the difference between successive sampled inputs becomes smaller and smaller. The resulting coefficients of the model approach the coefficients obtained in a binomial expansion, regardless of the underlying continuous-time system. This implies that for a given finite wordlength, severe inaccuracies may result. Wordlengths for the coefficients may also need to be made longer to accommodate models which have low frequency characteristics, corresponding to poles in the neighbourhood of $(1,0)$. These problems also arise in neural network models which consist of linear parts and nonlinear neural activation functions. Various alternative discrete-time operators can be introduced which offer numerical computational advantages over the conventional shift operator. The alternative discrete-time operators have been proposed independently of each other in the fields of digital filtering, adaptive control and neural networks. These include the delta, rho, gamma and bilinear operators. In this paper we first review these operators and examine some of their properties. An analysis of the TDNN and FIR MLP network structures is given which shows their susceptibility to parameter sensitivity problems. Subsequently, it is shown that models may be formulated using alternative discrete-time operators which have low sensitivity properties. Consideration is given to the problem of finding parameters for stable alternative discrete-time operators.
A learning algorithm which adapts the parameters of the alternative discrete-time operators on-line is presented for MLP neural network models based on alternative discrete-time operators. It is shown that neural network models which use these alternative discrete-time operators perform better than those using the shift operator alone. (Also cross-referenced as UMIACS-TR-97-03) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Laboratory for Artificial Brain Systems, Institute of Physical and, Faculty of Informatics, University of Wollongong, Australia, AADM Consulting, Califon, NJ, NEC Research Institute, Princeton, NJ,
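As a worked illustration of the binomial-coefficient remark in the abstract above, the following sketch uses assumed notation that does not appear in the report itself: $\Delta$ for the sampling period, $s_i$ for the continuous-time poles, and $n$ for the model order. The delta operator shown is the definition commonly used in the literature, not one quoted from the report.

```latex
% Assumed notation: sampling period \Delta, continuous-time poles s_i, order n.
% Under sampling, each pole maps to p_i = e^{s_i \Delta}, so p_i \to 1 as \Delta \to 0.
\[
  \prod_{i=1}^{n} \bigl(1 - p_i\, q^{-1}\bigr)
  \;\longrightarrow\;
  \bigl(1 - q^{-1}\bigr)^{n}
  = \sum_{k=0}^{n} (-1)^{k} \binom{n}{k}\, q^{-k}
  \qquad (\Delta \to 0).
\]
% Commonly used delta operator (an assumption here, not quoted from the report):
\[
  \delta = \frac{q - 1}{\Delta},
  \qquad
  \delta\, x(t) = \frac{x(t+1) - x(t)}{\Delta}.
\]
```

In the limit the denominator coefficients no longer depend on the poles $s_i$, which is one way to read the statement that the coefficients approach those of a binomial expansion regardless of the underlying continuous-time system.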
Iteration Space Slicing and Its Application to Communication. William Pugh. Evan Rosser. April 1997.
Program slicing is an analysis that answers questions such as ``Which statements might affect the computation of variable $v$ at statement $s$?'' or ``Which statements depend on the value of $v$ computed in statement $s$?''. The answers computed by program slicing are generally a set of statements. We introduce the idea of {\em iteration space slicing}: we refine program slicing to ask questions such as ``Which iterations of which statements might affect the computation in iterations $I$ of statement $s$?'' or ``Which iterations of which statements depend on the value computed by iterations $I$ of statement $s$?''. One application of this general-purpose technique is optimization of interprocessor communication in data-parallel compilers. For example, we can separate a code fragment into 1) those iterations that must be done before a send, 2) those iterations that don't need to be done before a send and don't depend on non-local data, and 3) those iterations that depend on non-local data. We examine applications of iteration space slicing to communication optimizations in parallel executions of programs such as stencil computations and block-cyclic Gaussian elimination with partial pivoting. (Also cross-referenced as UMIACS-TR-97-02) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Study of Internet Round-Trip Delay. Anurag Acharya. Joel Saltz. December 1996.
We present the results of a study of Internet round-trip delay. The links chosen include links to frequently accessed commercial hosts as well as well-known academic and foreign hosts. Each link was studied for a 48-hour period. We attempt to answer the following questions: (1) how rapidly and in what manner does the delay change -- in this study, we focus on medium-grain (seconds/minutes) and coarse-grain time-scales (tens of minutes/hours); (2) what does the frequency distribution of delay look like and how rapidly does it change; (3) what is a good metric to characterize the delay for the purpose of adaptation. Our conclusions are: (a) there is large temporal and spatial variation in round-trip time (RTT); (b) RTT distribution is usually unimodal and asymmetric and has a long tail on the right hand side; (c) RTT observations in most time periods are tightly clustered around the mode; (d) the mode is a good characteristic value for RTT distributions; (e) RTT distributions change slowly; (f) persistent changes in RTT occur slowly, sharp changes are undone very shortly; (g) jitter in RTT observations is small and (h) inherent RTT occurs frequently. (Also cross-referenced as UMIACS-TR-96-97) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
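Conclusion (d) above, that the mode is a good characteristic value for RTT distributions, can be illustrated with a minimal sketch. The function name `rtt_mode`, the 10 ms bin width, and the sample values are all hypothetical choices made for illustration; the report's own measurement method is not described in the abstract.

```python
from collections import Counter

def rtt_mode(samples_ms, bin_width_ms=10):
    """Return the centre of the most populated RTT bin (the histogram mode).

    Hypothetical helper: the bin width is an assumed parameter.
    """
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    # Assign each sample to a fixed-width bin and count the bin populations.
    bins = Counter(int(s // bin_width_ms) for s in samples_ms)
    # Break ties in favour of the smaller RTT bin so the result is deterministic.
    best_bin = min(bins, key=lambda b: (-bins[b], b))
    return (best_bin + 0.5) * bin_width_ms

# Made-up samples: tightly clustered values plus a long right-hand tail.
samples = [82, 85, 81, 88, 84, 86, 83, 140, 87, 310, 89, 95]
mean = sum(samples) / len(samples)
print(rtt_mode(samples), mean)
```

With a long right-hand tail, the mean is pulled upward by a few large samples while the binned mode stays with the cluster, which is consistent with conclusions (b) through (d).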
Reducing Router-Crossings in a Mobile Intranet. Rohit Dube. Ibrahim Korpeoglu. Satish K. Tripathi. January 1997.
Current general purpose mobility solutions like Mobile-IP involve multiple router-crossings even when the mobile host moves within an intranet from one subnet of a router to another. An environment consisting of a large number of mobile hosts would congest the router, causing hosts to experience high latency and jitter. This paper presents a mechanism to eliminate multiple router-crossings in a mobile intranet, which reduces the load on the routers and the hand-off and data latency at the mobile hosts. (Also cross-referenced as UMIACS-TR-97-01) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Incremental Data Structures and Algorithms for Dynamic Query. Egemen Tanin. Richard Beigel. Ben Shneiderman. January 1997.
Dynamic query interfaces (DQIs) are a recently developed form of database access that provides continuous realtime feedback to the user during the query formulation process. Previous work shows that DQIs are an elegant and powerful interface to small databases. Unfortunately, when applied to large databases, previous DQI algorithms slow to a crawl. We present a new approach to DQI algorithms that works well with large databases. HCIL, Dept. of Computer Science, Univ. of Maryland,
A Prototype for a Distributed Space Physics Data System. Charles Falkenberg. Chuck Goodrich. James Gallagher. Peter Cornillon. Glenn Flierl. December 1996.
The collaborative analysis of data within the Space Physics community is hindered, in part, by the wide variety of data formats and the wide distribution of data archives. In an attempt to address these two problems we have implemented a prototype which retrieves datasets stored in different data formats at several remote locations. Our prototype uses the Key Parameter Visualization Tools (KPVT) and the Distributed Oceanographic Data System (DODS) to view data from the ISEE1, ISEE2, and ISTP programs. Our goal is to demonstrate the ability to access and use several types of remote data and existing analysis tools. The work described demonstrates the power of an expressive data model, like the one in DODS, for converting and transmitting space physics data. Furthermore, since the DODS system architecture (and associated data model) was developed to meet oceanographic needs, the fact that it works well for use within the space physics community suggests that the DODS approach will also work well as a data distribution mechanism for the other earth science sub-disciplines. Given the growing interest in interdisciplinary work in the earth sciences, the existence of a data model/system capable of spanning the various sub-disciplines is significant. Dept. of Computer Science, Univ. of Maryland, Advanced Visualization Laboratory, Univ. of Maryland, Graduate School of Oceanography, Univ. of Rhode Island, Massachusetts Institute of Technology,
Toward Optimizing Distributed Programs Directed by Configurations. Tae-Hyung Kim. December 1996.
Networks of workstations are now viable environments for running distributed and parallel applications. Recent advances in software interconnection technology enable programmers to prepare applications to run in dynamically changing environments, because module interconnection is regarded as an essentially distinct intellectual activity, isolated from that of implementing individual modules. But there remains the question of how to optimize the performance of those applications for a given execution environment: how can developers realize performance gains without paying a high programming cost to specialize their application for the target environment? Interconnection technology has allowed programmers to tailor and tune their applications on distributed environments, but the traditional approach to this process has ignored the performance issue in favor of graceful, seamless integration of various software components. Networks of workstations can be virtual parallel machines. For a distributed and parallel application on such environments, the ability to write performance-literate programs is as important as the ability to seamlessly integrate distributed modules. Our dissertation research is an effort to extend plain interconnection technology with a variety of performance attributes. The RPC (remote procedure call) paradigm is used at the module programming level because it adopts a widely used and understood procedure call abstraction as the sole mechanism of remote operations and thus helps to shape reusable components. Most performance-related decisions pertain to the interconnections among software components. Our effort toward performance tuning consists of two main thrusts. One is an automatic adaptation from a performance configuration, which is analogous to the process of software interconnection for traditional structure-oriented configurations.
We present how a performance configuration can be represented as an extension to traditional module interconnections. The other is an optimal transformation for RPC statements in an individual module using various program analysis techniques. The conventional approach to implementing the RPC paradigm, based on stub generation, cannot improve performance because of its synchronous nature. With the two systematic approaches toward optimizing distributed programs working in concert, programmers can have high performance and conceptual simplicity in writing distributed programs. Dept. of Computer Science, Univ. of Maryland,
Iterative Solution of the Helmholtz Equation By a Second-Order Method. Kurt Otto. Elisabeth Larsson. December 1996.
The numerical solution of the Helmholtz equation subject to nonlocal radiation boundary conditions is studied. The specific problem is discretized with a second-order accurate finite-difference method, resulting in a linear system of equations. To solve the system of equations, a preconditioned Krylov subspace method is employed. The preconditioner is based on fast transforms, and yields a direct fast Helmholtz solver for rectangular domains. Numerical experiments for curved ducts demonstrate that the rate of convergence is high. Compared with band Gaussian elimination the preconditioned iterative method shows a significant gain in both storage requirement and arithmetic complexity. (Also cross-referenced as UMIACS-TR-96-95) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Scientific Computing, Uppsala Univ., Uppsala Sweden,
Organizational Issues in Software Development: An Empirical Study of. Carolyn B. Seaman. December 1996.
The subject of this dissertation is an empirical study whose goal is to characterize certain aspects of communication among members of a software development organization. The independent variables in this study are various attributes of organizational structure. The dependent variable is the effort spent on sharing information which is required by the code inspection process in use. The research questions upon which the study is based ask whether or not these attributes of organizational structure have an effect on the amount of communication effort expended. In addition, several other variables have been included, such as code size and complexity, which represent factors other than organizational structure which may have an effect on communication effort. The study uses both quantitative and qualitative methods for data collection and analysis. These methods include participant observation, structured interviews, graphical data presentation, and interpretation of statistical results with qualitative anecdotes. In addition, a pilot study was conducted to test this combination of methods. The findings, which are presented as a set of hypotheses, show that all of the organizational structure characteristics studied do have an effect on communication effort, at least in some circumstances. The work described in this dissertation helps to enable a whole new area of research, by illustrating one effective way of conducting such investigations, and by providing some hypotheses with which to begin. (Also cross-referenced as UMIACS-TR-96-94) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Optimization within a Unified Transformation Framework. Wayne Kelly. August 1996.
Programmers typically want to write scientific programs in a high level language with semantics based on a sequential execution model. To execute efficiently on a parallel machine, however, a program typically needs to contain explicit parallelism and possibly explicit communication and synchronization. So, we need compilers to convert programs from the first of these forms to the second. There are two basic choices to be made when parallelizing a program. First, the computations of the program need to be distributed amongst the set of available processors. Second, the computations on each processor need to be ordered. My contribution has been the development of simple mathematical abstractions for representing these choices and the development of new algorithms for making these choices. I have developed a new framework that achieves good performance by minimizing communication between processors, minimizing the time processors spend waiting for messages from other processors, and ordering data accesses so as to exploit the memory hierarchy. This framework can be used by optimizing compilers, as well as by interactive transformation tools. The state of the art for vectorizing compilers is already quite good, but much work remains to bring parallelizing compilers up to the same standard. The main contribution of my work can be summarized as improving this situation by replacing existing ad hoc parallelization techniques with a sound underlying foundation on which future work can be built. (Also cross-referenced as UMIACS-TR-96-93) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in. Tamara G. Kolda. Dianne P. O'Leary. December 1996.
The vast amount of textual information available today is useless unless it can be effectively and efficiently searched. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is usually accomplished using a low-rank singular value decomposition (SVD) approximation. In this paper, we use an alternate decomposition, the semi-discrete decomposition (SDD). In our tests, for equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage. Additionally, we show how to update the SDD for a dynamically changing document collection. (Also cross-referenced as UMIACS-TR-96-92) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Presentation Planning for Distributed Video Systems. Eenjun Hwang. B. Prabhakaran. V.S. Subrahmanian. December 1996.
A distributed video-on-demand system is one where a collection of video data is located at dispersed sites across a computer network. In a single-site environment, a local video server retrieves video data from its local storage device (or devices). However, in the setting of a distributed VoD system, when a customer requests a movie from his/her local server, the server may need to interact with other servers located across the network. In this paper, we present three types of presentation plans that a local server must construct in order to satisfy the customer's request. Informally speaking, a presentation plan is a detailed (temporally synchronized) sequence of steps that the host server must perform at given points in time. This involves obtaining commitments from other video servers, obtaining commitments from the network service provider, as well as making commitments of local resources, within the limitations of available bandwidth, available buffer, and customer/client data consumption rates. The three types of plans described in this paper all work at different "levels of abstraction" in this planning process. Furthermore, we introduce two measures of how good a plan is: minimizing wait time for the customer, and minimizing a quantity called access bandwidth (which, informally speaking, specifies how much network/disk bandwidth is used). We develop algorithms to compute optimal (w.r.t. the above measures) plans for all three types, and show experimentally that in all three cases, one of the three types of plans (called a hybrid presentation plan) systematically outperforms the other two. In addition to these new concepts, our framework has the advantage that many results that had previously been verified experimentally in the literature can now be conclusively proved mathematically. (Also cross-referenced as UMIACS-TR-96-91) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Software Engineering of Virtual Environments:. Donald J. Welch. James M. Purtilo. July 1996.
Virtual Environments (VEs) are proving to be valuable resources in many fields, and they are even more useful when they involve multiple users in distributed environments. Many useful VEs were designed to be stand-alone applications, without consideration for integrating them into a distributed VE. Our approach to connecting VEs is to define an abstract model for the interconnection, use integration tools to do as much of the work automatically as possible, and use a run-time environment to support the interconnection. With our experiences to date, we are learning that certain classes of techniques are common to all solutions using this approach. We have summarized these in a set of requirements and are building a system that features these techniques as first class objects. In the future you will be able to solve these interconnection problems cheaply, and engineers of future VEs will have some guidance on how they should organize their implementations so that interconnection with other VEs will be easier. In this paper we coin the phrase "software engineering of virtual environments" (SEVE) to describe the above activities. (Also cross-referenced as UMIACS-TR-96-89) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Interconnecting Distributed Legacy Systems: Virtual Environment. Donald J. Welch. James M. Purtilo. October 1996.
As the power and utility of virtual reality environments increase, so do the potential benefits found from combining several such environments. But doing so presents the developer with a host of difficult distributed systems issues. This paper explores what some of these issues are within the VE domain, relates our successes to date in overcoming these problems by means of various automated tools, and suggests ways to apply our results to other target domains. (Also cross-referenced as UMIACS-TR-96-88) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Zubin: A Software Engineering Environment for Interconnecting Legacy. Donald J. Welch. James M. Purtilo. November 1996.
As the power and utility of virtual reality environments increase, so do the potential benefits found from combining several such environments. But doing so presents the developer with a host of difficult software engineering issues. This paper explores what some of these issues are within the VE domain, relates our successes to date in overcoming these problems by means of various automated tools, and suggests ways to apply our results to other target domains. (Also cross-referenced as UMIACS-TR-96-87) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Previews and Overviews in Digital Libraries: Designing Surrogates to. Hanan Samet. October 1997.
To aid designers of digital library interfaces and web sites in creating comprehensible, predictable and controllable environments for their users, we define and discuss the benefits of previews and overviews as visual information representations. Previews and overviews are graphic or textual representations of information abstracted from primary information objects. They serve as surrogates for those objects. When utilized properly, previews and overviews allow users to rapidly discriminate objects of interest from those not of interest, and to more fully understand the scope and nature of large collections of information resources. We provide a more complete definition of previews and overviews, and discuss system parameters and aspects of primary information objects relevant to designing effective previews and overviews. Finally, we present examples that illustrate the use of previews and overviews and offer suggestions for designers. Department of Computer Science, University of Maryland,
Self-Replicating Structures in a Cellular Automata Space. Hui-Hsien Chou. July 1996.
Biological experience and intuition suggest that self-replication is an inherently complex phenomenon, and early cellular automata self-replication models developed by computer scientists and mathematicians supported that view. However, since von~Neumann's original work in the 1950's, the study of cellular automata models of self-replicating systems has progressively led to smaller and simpler systems. This thesis demonstrates for the first time that it is possible to create automatically self-replicating structures in cellular automata models rather than, as has been done in the past, to design them manually. These emergent self-replicating structures employ a General Purpose Self-Replicating cellular automata rule set which can support the replication of structures of different sizes and their growth from smaller to larger ones. This thesis also demonstrates that, by letting self-replicating structures carry additional information besides replication instructions, they can be used to solve computationally hard problems such as the Satisfiability (SAT) problem. It is shown that self-replicating structures can be made to carry characteristic codes and selection forces can be implemented in cellular automata space. This study opens the door to further studies that could lead to general, solution-evolvable structures and truly self-programming systems. (Also cross-referenced as UMIACS-TR-96-85) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Query Scrambling for Bursty Data Arrival. Laurent Amsaleg. Michael J. Franklin. A. Tomasic. November 1996.
Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources may vary widely due to network congestion, link failure, and other problems. In this paper we examine a new class of methods, called query scrambling, for dealing with unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. We explore various choices in the implementation of these methods and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and it focuses on bursty arrivals of data. We identify and study a number of the basic trade-offs that arise when designing scrambling policies for the bursty environment. Our performance results show that query scrambling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-96-84) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Large Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition. Tamara G. Kolda. Dianne P. O'Leary. November 1996.
With the electronic storage of documents comes the possibility of building search engines that can automatically choose documents relevant to a given set of topics. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. There are a number of information retrieval systems based on inexact matches. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is usually accomplished using a low-rank singular value decomposition (SVD) approximation. In this paper, we use an alternate decomposition, the semi-discrete decomposition (SDD). For equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage for the MEDLINE test set. (Also cross-referenced as UMIACS-TR-96-83) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Preconditioning for the Steady-State Navier-Stokes Equations with Low. Howard C. Elman. November 1996.
We introduce a preconditioner for the linearized Navier-Stokes equations that is effective when either the discretization mesh size or the viscosity approaches zero. For constant coefficient problems with periodic boundary conditions, we show that the preconditioning yields a system with a single eigenvalue equal to one, so that performance is independent of both viscosity and mesh size. For other boundary conditions, we demonstrate empirically that convergence depends only mildly on these parameters and we give a partial analysis of this phenomenon. We also show that some expensive subsidiary computations required by the new method can be replaced by inexpensive approximate versions of these tasks based on iteration, with virtually no degradation of performance. (Also cross-referenced as UMIACS-TR-96-82) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiler-directed Dynamic Linking for Mobile Programs. Anurag Acharya. Joel Saltz. November 1996.
In this paper, we present a compiler-directed technique for safe dynamic linking for mobile programs. Our technique guarantees that linking failures can occur only when a program arrives at a new execution site and that this failure can be delivered to the program as an error code or an exception. We use interprocedural analysis to identify the set of names that must be linked at the different sites the program executes on. We use a combination of runtime and compile-time techniques to identify the calling context and to link only the names needed in that context. Our technique is able to handle recursive programs as well as separately compiled code that may itself be able to move. We discuss language constructs for controlling the behavior of dynamic linking and the implication of some of these constructs for application structure. (Also cross-referenced as UMIACS-TR-96-81) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Utility of Exploiting Idle Workstations for Parallel Computation. Anurag Acharya. Guy Edjlali. Joel Saltz. November 1996.
In this paper, we examine the utility of exploiting idle workstations for parallel computation. We attempt to answer the following questions. First, given a workstation pool, for what fraction of time can we expect to find a cluster of $k$ workstations available? This provides an estimate of the opportunity for parallel computation. Second, how stable is a cluster of free machines and how does the stability vary with the size of the cluster? This indicates how frequently a parallel computation might have to stop for adapting to changes in processor availability. Third, what is the distribution of workstation idle-times? This information is useful for selecting workstations to place computation on. Fourth, how much benefit can a user expect? To state this in concrete terms, if I have a pool of size $S$, how big a parallel machine should I expect to get for free by harvesting idle machines? Finally, how much benefit can be achieved on a real machine and how hard does a parallel programmer have to work to make this happen? To answer the workstation-availability questions, we have analyzed 14-day traces from three workstation pools. To determine the equivalent parallel machine, we have simulated the execution of a group of well-known parallel programs on these workstation pools. To gain an understanding of the practical problems, we have developed the system support required for adaptive parallel programs as well as an adaptive parallel CFD application. (Also cross-referenced as UMIACS-TR-96-80) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
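The first question in the abstract above, the fraction of time for which a cluster of $k$ workstations is available, can be sketched as follows. This is only an illustration under stated assumptions: the trace format (one list of idle/busy flags per workstation, sampled at equal time slots), the function name `fraction_with_k_idle`, and the data are hypothetical and are not taken from the report.

```python
def fraction_with_k_idle(traces, k):
    """Fraction of time slots in which at least k workstations are idle.

    traces: one list of booleans per workstation (True = idle in that slot).
    The trace format is an assumption made for this sketch.
    """
    if not traces:
        raise ValueError("need at least one workstation trace")
    n_slots = len(traces[0])
    if n_slots == 0 or any(len(t) != n_slots for t in traces):
        raise ValueError("traces must be non-empty and equally long")
    # zip(*traces) yields one tuple per time slot; summing booleans counts idle machines.
    hits = sum(1 for slot in zip(*traces) if sum(slot) >= k)
    return hits / n_slots

# Made-up pool of three workstations observed over four time slots.
traces = [
    [True, True, False, True],
    [True, False, False, True],
    [False, True, False, True],
]
for k in (1, 2, 3):
    print(k, fraction_with_k_idle(traces, k))
```

The same per-slot counts could also feed the stability and idle-time questions, but that would need run-length information that this sketch does not compute.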
On the Weighting Method for Least Squares Problems with Linear. G. W. Stewart. November 1996.
The weighting method for solving a least squares problem with linear equality constraints multiplies the constraints by a large number and appends them to the top of the least squares problem, which is then solved by standard techniques. In this paper we give a new analysis of the method, based on the QR~decomposition, that exhibits many features of the algorithm. In particular it suggests a natural criterion for choosing the weighting factor. (Also cross-referenced as UMIACS-TR-96-79) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
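As a sketch of the construction the abstract describes, the constrained problem and its weighted surrogate can be written as follows. The symbols $A$, $b$, $C$, $d$, and the weight $\tau$ are notation assumed here for illustration; the report may use different names.

```latex
% Constrained least squares problem (assumed notation):
\min_x \; \| A x - b \|_2 \quad \text{subject to} \quad C x = d .

% Weighting method: scale the constraints by a large \tau and stack them
% on top of the least squares problem, then solve the unconstrained problem
\min_x \; \left\| \begin{pmatrix} \tau C \\ A \end{pmatrix} x
      - \begin{pmatrix} \tau d \\ b \end{pmatrix} \right\|_2 ,
\qquad \tau \gg 1 .
```

Informally, as $\tau$ grows the constraint residual $Cx - d$ is penalized more heavily, so the solution of the stacked problem approaches that of the constrained one; how large $\tau$ should be is the question the abstract's criterion addresses.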
LARGE-SCALE OPTIMIZATION OF NEURON ARBORS. Christopher Cherniak. Mark Changizi. Du Won Kang. November 1996.
At the global as well as local scale, some of the geometry of types of neuron arbors--both dendrites and axons--appears to be self-organizing: Their morphogenesis behaves like flowing water, that is, fluid-dynamically; waterflow in branching networks in turn acts like a tree composed of cords under tension, that is, vector-mechanically. The result is that such neuron trees globally minimize their total volume--rather than, for example, surface area or branch-length--to within about 5% of optimum for interconnecting their terminals. These kinds of arbors similarly perform well at generating the cheapest topology connecting their terminals: their large-scale layouts are among the top few of all such possible connecting-patterns. (Also cross-referenced as UMIACS-TR-96-78) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Committee on History and Philosophy of Science, Department of,
A Delay Damage Model Selection Algorithm for NARX Neural Networks. Tsungnan Lin. C. Lee Giles. Bill G. Horne. Sun-Yang Kung. December 1996.
Recurrent neural networks have become popular models for system identification and time series prediction. NARX (Nonlinear AutoRegressive models with eXogenous inputs) neural network models are a popular subclass of recurrent networks and have been used in many applications. Though embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that using intelligent memory order selection through pruning and good initial heuristics significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction. (Also cross-referenced as UMIACS-TR-96-77) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, NEC Research Institute, Dept. of Electrical Engineering, Princeton University,
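To make "memory order" concrete, a common textbook form of a NARX model is sketched below. The symbols $u$, $y$, $f$, $n_u$, and $n_y$ are assumed notation and may differ from the report's.

```latex
% NARX model: the output depends on delayed inputs u and delayed outputs y.
% n_u and n_y are the input and output memory orders (numbers of delay taps);
% f is the nonlinear map realized by the neural network.
y(t) = f\bigl( u(t-1), \ldots, u(t-n_u),\; y(t-1), \ldots, y(t-n_y) \bigr)
```

Under this reading, selecting the memory order amounts to choosing $n_u$ and $n_y$, and pruning removes individual delay taps from the argument list of $f$.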
Grindstone: A Test Suite for Parallel Performance Tools. Jeffrey K. Hollingsworth. Michael Steele. October 1996.
We describe Grindstone, a suite of programs for testing and calibrating parallel performance measurement tools. The suite consists of nine simple SPMD style PVM programs that demonstrate common communication and computational bottlenecks that occur in parallel programs. In addition, we provide a short case study that demonstrates the use of the test suite on three performance tools for PVM. The results of the case study showed that we were able to uncover bugs or other anomalies in all three tools. The paper also describes how to acquire, compile, and use the test suite. (Also cross-referenced as UMIACS-TR-96-73) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Intensional Query Optimization. Parke Godfrey. Jarek Gryz. September 1996.
We have introduced a new query optimization framework called intensional query optimization (IQO), which enables existing optimization techniques to be applied to queries that use views. In particular, we consider that view definitions may employ unions. Advanced database technologies and applications--such as federation and mediation over heterogeneous database sources--lead to such complex view definitions, and to the need to handle complex, expensive queries. Query rewriting techniques have been proposed which exploit semantic query caches, materialized views, and semantic knowledge about the database domain to optimize query evaluation. These can augment syntactic optimization to reduce evaluation costs further. Such techniques include semantic query caching, query folding, and semantic query optimization. However, most proposed rewrite techniques ignore views in queries; that is, the views are considered as other tables. The IQO framework enables rewrites to be applied to various expansions of the query, even when no such rewrite is applicable directly to the query itself. With IQO, we optimize the query tree, not just the query. The IQO framework introduces the notion of a discounted query, which is a query with some of its expansions "separated out", so the query can be recast into pieces that can be optimized. For this approach to be effective, the sum of the costs of evaluating each piece must be less than the cost of evaluating the query itself. This includes the discounted query. We develop an evaluation plan for discounted queries that is generally more efficient than the evaluation of the queries themselves. (Also cross-referenced as UMIACS-TR-96-72) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Stabbing Orthogonal Objects in 3-Space. David M. Mount. Fan-Tao Pu. October 1996.
We consider a problem that arises in the design of data structures for answering {\em visibility range queries}, that is, given a $3$-dimensional scene defined by a set of polygonal patches, we wish to preprocess the scene to answer queries involving the set of patches of the scene that are visible from a given range of points over a given range of viewing directions. These data structures recursively subdivide space into cells until some criterion is satisfied. One of the important problems that arise in the construction of such data structures is that of determining whether a cell represents a nonempty region of space, and more generally computing the size of a cell. In this paper we introduce a measure of the {\em size} of the subset of lines in 3-space that stab a given set of $n$ polygonal patches, based on the maximum angle and distance between any two lines in the set. Although the best known algorithm for computing this size measure runs in $O(n^2)$ time, we show that if the polygonal patches are orthogonal rectangles, then this measure can be approximated to within a constant factor in $O(n)$ time. (Also cross-referenced as UMIACS-TR-96-71) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Enhancing Software DSM for Compiler-Parallelized Applications. Pete Keleher. Chau-Wen Tseng. September 1996.
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed-shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: 1) combine synchronization and parallelism information communication on parallel task invocation, 2) employ customized routines for evaluating reduction operations, and 3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good speedups on eight processors on an IBM SP-2 and DEC Alpha cluster, usually matching or exceeding the speedup of equivalent HPF and message-passing versions of each program. Based on our experimental results, we point out areas where additional compiler analysis and software DSM improvements can be used to achieve good performance on a broader range of applications. (Also cross-referenced as UMIACS-TR-96-70) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
NetDyn Revisited: A Replicated Study of Network Dynamics. Julie Pointek. Forrest Shull. Roseanne Tesoriero. Ashok K. Agrawala. October 1996.
In 1992 and 1993, a series of experiments using the NetDyn tool was run at the University of Maryland to characterize network behavior. These studies identified multiple design and implementation faults in the Internet. Since that time, there has been a wide array of changes to the Internet. During the Spring of 1996, we conducted a replication of the NetDyn experiments in order to characterize end-to-end behavior in the current environment. In this paper, we present and discuss the latest results obtained during this study. Although the network seems to be stabilizing with respect to transit times, our current results are similar to the results from past experiments. That is, networks often exhibit unexpected behavior. The data suggest that while there has been improvement, there are still problem areas that need to be addressed. (Also cross-referenced as UMIACS-TR-96-69) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Elastic Windows: Evaluation of Multi-Window Operations. Eser Kandogan. Ben Shneiderman. October 1996.
Most windowing systems follow the independent overlapping windows approach, which emerged as an answer to the needs of the 1980s' technology. Due to advances in computers and display technology, and increased information needs, modern users demand more functionality from window management systems. We proposed Elastic Windows with improved spatial layout and rapid multi-window operations as an alternative to current window management strategies for efficient personal role management [kandogan]. In this approach, multi-window operations are achieved by issuing operations on window groups hierarchically organized in a space-filling tiled layout. This paper describes the Elastic Windows interface briefly and then presents a study comparing user performance with Elastic Windows and traditional window management techniques for 2, 6, and 12 window situations. Elastic Windows users had statistically significantly faster performance for all 6 and 12 window situations, for task environment setup, task environment switching, and task execution. These results suggest promising possibilities for multiple window operations and hierarchical nesting, which can be applied to the next generation of tiled as well as overlapped window managers. Human-Computer Interaction Laboratory, Univ. of Maryland, Institute for Systems Research, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland,
Bringing Treasures to the Surface - Iterative Design for the Library of. Catherine Plaisant. Gary Marchionini. Tom Bruns. Anita Komlodi. Laura Campbell. October 1996.
The Human-Computer Interaction Lab worked with a team of the Library of Congress (LC) to develop and test interface designs for LC's National Digital Library Program. Three iterations are described and illustrate the progression of the design toward a compact design that minimizes scrolling and jumping and anchors users in a screen space that tightly couples search and results. Issues and resolutions are discussed for each iteration and reflect the challenges of incomplete metadata, data visualization, and the rapidly changing web environment. Human-Computer Interaction Laboratory, Univ. of Maryland, Digital Library Research Group, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland, National Digital Library Program, Library of Congress, Washington DC,
Synthesizing Protocol Specifications from Service Specifications. Jun-Cheol Park. Raymond E. Miller. September 1996.
We propose a specification model and present a method to algorithmically derive a protocol specification from a service specification based on the model. Unlike the previous models based on finite state machines, the proposed model can explicitly express concurrency, synchronization, and timing requirements such as delays and timeouts. We assume that there exists a reliable communication channel between any two protocol entities and the maximum delay for each channel is bounded by a positive constant. Because of the variable nature of the communication delays along with the time constraints associated with events, no protocol specification can fully simulate the service specification. The proposed method derives a protocol specification that is optimal in the sense that it provides the largest possible subset of the service specification under the communication delay constraints. We also give a method to derive a sub specification from a service specification and a maximum communication delay of each channel such that the sub specification, but no superset of it, can be simulated by the derived protocol specification. Dept. of Computer Science, Univ. of Maryland,
Putting Visualization to Work -- ProgramFinder for Youth Placement. Jason Ellis. Anne Rose. Catherine Plaisant. September 1996.
The Human-Computer Interaction Laboratory (HCIL) and the Maryland Department of Juvenile Justice (DJJ) have been working together to develop the ProgramFinder, a tool for choosing programs for a troubled youth, ranging from drug rehabilitation centers to secure residential facilities. The seemingly straightforward journey of the ProgramFinder from an existing user interface technique to a product design required the development of five different prototypes which involved user interface design, prototype implementation, and selecting search criteria. While HCIL's effort focused primarily on design and implementation, DJJ's attribute selection process was the most time consuming and difficult task. We also found that a direct link to DJJ's workflow was needed in the prototypes to generate the necessary "buy-in". This paper analyzes the interaction between the efforts of HCIL and DJJ and the amount of "buy-in" by DJJ staff and management. Lessons learned are presented for developers. Human-Computer Interaction Laboratory, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Customizable Simulator for Workstation Networks. Mustafa Uysal. Anurag Acharya. Robert Bennett. Joel Saltz. September 1996.
We present a customizable simulator called netsim for high-performance point-to-point workstation networks that is accurate enough to be used for application-level performance analysis yet is easy enough to customize for multiple architectures and software configurations. Customization is accomplished without using any proprietary information, using only publicly available hardware specifications and information that can be readily determined using a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with a 2-6% error on the SP-2 and a 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8 fold slowdown with respect to the Alpha Farm. In addition, we show that the cross-traffic congestion for today's high-speed point-to-point networks has little, if any, effect on application-level performance and that modeling end-point congestion is sufficient for a reasonably accurate simulation. (Also cross-referenced as UMIACS-TR-96-68) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Titan: A High-Performance Remote-Sensing Database. Chialin Chang. Bongki Moon. Anurag Acharya. Carter Shock. Alan Sussman. Joel Saltz. August 1996.
There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a multi-dimensional dataset onto a large disk farm. Second, the order of magnitude reduction in data-size due to post-processing makes it imperative, from a performance perspective, that the postprocessing be done on the machine that holds the data. This requires careful coordination of computation and data retrieval. This paper describes the design, implementation and evaluation of {\em Titan}, a parallel shared-nothing database designed for handling remote-sensing data. The computational platform for Titan is a 16-processor IBM SP-2 with four fast disks attached to each processor. Titan is currently operational and contains about 24~GB of data from the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7 satellite. The experimental results show that Titan provides good performance for global queries, and interactive response times for local queries. (Also cross-referenced as UMIACS-TR-96-67) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Symbolic Model Checking of Infinite State Programs Using Presburger. Tevfik Bultan. Richard Gerber. William Pugh. September 1996.
Model checking is a powerful technique for analyzing large, finite-state systems. In an infinite transition system, however, many basic properties are undecidable. In this paper we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically encode a program's transition system, as well as its model-checking computations. All fixpoint calculations are executed symbolically, and their convergence is guaranteed by using approximation techniques. We demonstrate the promise of this technology on some well-known infinite-state concurrency problems. (Also cross-referenced as UMIACS-TR-96-66) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Regularization Algorithms Based on Total Least Squares. Per Christian Hansen. Dianne P. O'Leary. September 1996.
Discretizations of inverse problems lead to systems of linear equations with a highly ill-conditioned coefficient matrix, and in order to compute stable solutions to these systems it is necessary to apply regularization methods. Classical regularization methods, such as Tikhonov's method or truncated {\em SVD}, are not designed for problems in which both the coefficient matrix and the right-hand side are known only approximately. For this reason, we develop {\em TLS}\/-based regularization methods that take this situation into account. Here, we survey two different approaches to incorporation of regularization, or stabilization, into the {\em TLS} setting. The two methods are similar in spirit to Tikhonov regularization and truncated {\em SVD}, respectively. We analyze the regularizing properties of the methods and demonstrate by numerical examples that in certain cases with large perturbations, these new methods are able to yield more accurate regularized solutions than those produced by the standard methods. (Also cross-referenced as UMIACS-TR-96-65) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Department of Mathematical Modelling, Technical Univ. of Denmark,
Pivoted Cauchy-like Preconditioners for Regularized Solution of. Misha E. Kilmer. Dianne P. O'Leary. September 1996.
Many ill-posed problems are solved using a discretization that results in a least squares problem or a linear system involving a Toeplitz matrix. The exact solution to such problems is often hopelessly contaminated by noise, since the discretized problem is quite ill-conditioned, and noise components in the approximate null-space dominate the solution vector. Therefore we seek an approximate solution that does not have large components in these directions. We use a preconditioned conjugate gradient algorithm to compute such a regularized solution. An orthogonal change of coordinates transforms the Toeplitz matrix to a Cauchy-like matrix, and we choose our preconditioner to be a low rank Cauchy-like matrix determined in the course of Gu's fast modified complete pivoting algorithm. We show that if the kernel of the ill-posed problem is smooth, then this preconditioner has desirable properties: the largest singular values of the preconditioned matrix are clustered around one, the smallest singular values, corresponding to the noise subspace, remain small, and the signal and noise spaces are relatively unmixed. The preconditioned algorithm costs only $O(n \lg n)$ operations per iteration for a problem with $n$ variables. The effectiveness of the preconditioner for filtering noise is demonstrated on three examples. (Also cross-referenced as UMIACS-TR-96-63) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Applied Mathematics Program, Univ. of Maryland,
Updating Discourse Context with Active Logic. John Gurney. Khemdut Purang. Don Perlis. June 1996.
In this paper we present our implementation of a system of active logic that processes natural language discourses. We focus on problems that involve presupposition and the associated well-known problems of the projection of presupposition. We discuss Heim's largely successful theory of presupposition and point out certain limitations. We then use these observations to build our discourse processor based on active logic. Our main contribution is the handling of problems that go beyond the scope of Heim's theory, especially discourses that involve cancellation of presupposition. Ongoing work suggests that conversational implicature and the cancellation of implicature can also be treated by our methods. Key words: presupposition, discourse, context, accommodation, active logic, implicature. (Also cross-referenced as UMIACS-TR-96-62) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Army Research Laboratory, Adelphi MD,
Defaults Denied. Michael Miller. Don Perlis. Khemdut Purang. June 1996.
We take a tour of various themes in default reasoning, examining new ideas as well as those of Brachman, Delgrande, Poole, and Schlechta. An underlying issue is that of stating that a potential default principle is not appropriate. We see this arise most dramatically as a problem in an attempt to formalize what are often loosely called "prototypes", although it also arises in other formal approaches to default reasoning. Some formalisms in the literature provide solutions but not without costs. We propose a formalism that appears to avoid these costs; it can be seen as a step toward a population-based set-theoretic modification of these approaches, that may ultimately provide a closer tie to recent work on statistical (quantitative) foundations of (qualitative) defaults([1]). Our analysis in particular indicates the need to resolve a conflation between use and mention in many default formalisms. Our treatment proposes such a resolution, and also explores the use of sets toward a more population-based notion of default. (Also cross-referenced as UMIACS-TR-96-61) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Intelligent Automation Inc., Rockville MD,
Automated Discovery of Self-Replicating Structures in Cellular Space. Jason D. Lohn. August 1996.
This thesis demonstrates for the first time that it is possible to automatically discover self-replicating structures in cellular space automata models rather than, as has been done in the past, to design them manually. Self-replication is defined as the process an entity undergoes in constructing a copy of itself. Von~Neumann was the first to investigate artificial self-replicating structures and did so in the context of cellular automata, a cellular space model consisting of numerous finite-state machines embedded in a regular tessellation. Interest in artificial self-replicating systems has increased in recent years due to potential applications in molecular-scale manufacturing, programming parallel computing systems, and digital hardware design, and also as part of the field of artificial life. In this dissertation, genetic algorithms are used with a cellular automata framework for the first time to automatically discover self-replicating structures. The discovered self-replicating structures compare favorably in terms of simplicity with those generated manually in the past but differ in unexpected ways. This dissertation presents representative samples of the self-replicating structures and analyzes them both quantitatively and qualitatively. In order to effectively search the underlying rule space of such automata models, a fitness function consisting of three independent criteria is designed and successfully applied. Also, a new cellular space automata model called effector automata is introduced. It is shown to be more computationally feasible and to promote the discovery of more self-replicating structures as compared to the cellular automata models used in previous studies. In addition, a new paradigm for cellular space models with weak rotational symmetry called component-sensitive input is introduced and shown to facilitate discovery of self-replicating structures. 
The results presented suggest that genetic algorithms can be powerful tools for exploring the space of possible self-replicating structures. Furthermore, this research sheds light on the nature of creating self-replicating structures and opens the door to further studies that could eventually lead to the discovery of new self-replicating molecular structures. (Also cross-referenced as UMIACS-TR-96-60) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Electrical Engineering, Univ. of Maryland,
Performance of On-Line Learning Methods in Predicting Multiprocessor. Majd F. Sakr. Steven P. Levitan. Donald M. Chiarulli. Bill G. Horne. C. Lee Giles. October 1996.
Shared memory multiprocessors require reconfigurable interconnection networks (INs) for scalability. These INs are reconfigured by an IN control unit. However, these INs are often plagued by undesirable reconfiguration time that is primarily due to control latency, the amount of time delay that the control unit takes to decide on a desired new IN configuration. To reduce control latency, a trainable prediction unit (PU) was devised and added to the IN controller. The PU's job is to anticipate and reduce control configuration time, the major component of the control latency. Three different on-line prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications: the 2-D relaxation algorithm, matrix multiply, and Fast Fourier Transform. The predictions were then used by a routing control algorithm to reduce control latency by configuring the IN to provide needed memory access paths before they were requested. The three prediction techniques were: 1) a Markov predictor, 2) a linear predictor, and 3) a time delay neural network (TDNN) predictor. As expected, different predictors performed best on different applications; however, the TDNN produced the best overall results. (Also cross-referenced as UMIACS-TR-96-59) University of Maryland Institute for Advanced Computer Studies, NEC Research Institute, Princeton NJ, Electrical Engineering Department, University of Pittsburgh, Computer Science Department, University of Pittsburgh, Department of Computer Science, University of Maryland,
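The idea of on-line prediction of repetitive access patterns can be sketched with a first-order Markov predictor. This is only an illustrative sketch: the abstract does not give the order or structure of the report's Markov predictor, and the class name `MarkovPredictor`, the helper `hit_rate`, and the toy access sequence are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Hashable, Optional, Sequence


class MarkovPredictor:
    """First-order Markov predictor trained on-line, one observation at a time."""

    def __init__(self) -> None:
        # counts[a][b] = number of times item b was observed right after item a.
        self.counts = defaultdict(Counter)
        self.last: Optional[Hashable] = None

    def predict(self) -> Optional[Hashable]:
        """Return the most frequent successor of the last item, or None if unknown."""
        followers = self.counts.get(self.last)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def observe(self, item: Hashable) -> None:
        """Update the transition counts with the newly observed item."""
        if self.last is not None:
            self.counts[self.last][item] += 1
        self.last = item


def hit_rate(sequence: Sequence[Hashable]) -> float:
    """Predict each item just before observing it; return the fraction predicted correctly."""
    if not sequence:
        return 0.0
    predictor = MarkovPredictor()
    hits = 0
    for item in sequence:
        if predictor.predict() == item:
            hits += 1
        predictor.observe(item)
    return hits / len(sequence)


if __name__ == "__main__":
    # A repetitive pattern of memory-module ids (made-up data).
    print(hit_rate([0, 1, 2] * 4))
```

On a strictly periodic pattern the predictor misses only while it sees each transition for the first time, which is why repetitive access patterns are a natural target for this kind of predictor.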
Iterative Methods for Problems in Computational Fluid Dynamics. Howard C. Elman. David J. Silvester. Andrew J. Wathen. August 1996.
We discuss iterative methods for solving the algebraic systems of equations arising from linearization and discretization of primitive variable formulations of the incompressible Navier-Stokes equations. Implicit discretization in time leads to a coupled but linear system of partial differential equations at each time step, and discretization in space then produces a series of linear algebraic systems. We give an overview of commonly used time and space discretization techniques, and we discuss a variety of algorithmic strategies for solving the resulting systems of equations. The emphasis is on preconditioning techniques, which can be combined with Krylov subspace iterative methods. In many cases the solution of subsidiary problems such as the discrete convection-diffusion equation and the discrete Stokes equations plays a crucial role. We examine iterative techniques for these problems and show how they can be integrated into effective solution algorithms for the Navier-Stokes equations. (Also cross-referenced as UMIACS-TR-96-58) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Mathematics, University of Manchester Institute of Science, Oxford University Computing Laboratory,
Final Iterations in Interior Point Models -- Preconditioned Conjugate. Weichung Wang. August 1996.
In this article we consider modified search directions in the endgame of interior point methods for linear programming. In this stage, the normal equations determining the search directions become ill-conditioned. The modified search directions are computed by solving perturbed systems that can be solved efficiently by the preconditioned conjugate gradient solver. We prove the convergence of the interior point methods using the modified search directions and show that each barrier problem is solved with a superlinear convergence rate. A variation of Cholesky factorization is presented for computing a better preconditioner when the normal equations are ill-conditioned. These ideas have been implemented successfully and the numerical results show that the algorithms enhance the performance of the preconditioned conjugate gradients-based interior point methods. Dept. of Computer Science, Univ. of Maryland,
Simulation for Computer Science Majors: A Preliminary Report. Ruth Silverman. August 1996.
The author is revising and restructuring an existing simulation course designed primarily for senior computer science majors by: 1) developing an integrated set of laboratory exercises based on computer science topics using commercially available software (GPSS/H); 2) incorporating these materials into a formal laboratory manual along with related computer science reference materials and instructions in the use of the software; 3) implementing a pilot course using this manual together with a single text in the theory of simulation; 4) preparing a syllabus and a detailed annotated course outline for the instructor, keyed to the manual and the text. The materials developed will be flexible and highly modular allowing their adoption or adaptation at other institutions. (Also cross-referenced as UMIACS-TR-96-57) Center for Automation Research, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Computer and Information Science, Univ. of the District of,
Advances in High Performance Knowledge Representation. James Hendler. Kilian Stoffel. Merwyn Taylor. July 1996.
This report contains two papers describing important new results in the Parka High Performance Knowledge Representation language. We have ported the SIMD Parka knowledge representation system to generic MIMD machines. The system has been recoded in C and supported using runtime optimization packages developed in the High Performance Systems Software Laboratory at the University of Maryland. New ``scanning'' algorithms have been developed for inheritance and recognition inferences. These algorithms have been tested with both random networks and on a recoding of the ontology of the CYC knowledge base as well as on large planning case-bases. Tests show that the new version is significantly faster than the SIMD system, and that it promises to scale well to knowledge bases orders of magnitude larger than CYC. Real world applications are demanding that KR systems provide support for knowledge bases containing millions of assertions. We present Parka-DB, a high-performance reimplementation of the Parka KR language that uses a standard relational DBMS. The integration of a DBMS and the Parka KR language allows us to efficiently support complex queries on extremely large KBs using a single processor, as opposed to our earlier massively parallel system. In addition, the system can make good use of secondary memory, with the whole system needing less than 16MB of RAM to hold a KB of over 2,000,000 assertions. We demonstrate empirically that this reduction in primary storage requires only about 10% overhead in time, and decreases the load time of very large KBs by more than two orders of magnitude. (Also cross-referenced as UMIACS-TR-96-56) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Jeffrey K. Hollingsworth. Ethan L. Miller. Using Content-Derived Names for Caching and Software Distribution. August 1996.
Maintaining replicated data in wide area information services such as the World Wide Web is a difficult problem. Ensuring that the correct versions of libraries and images are installed for application programs presents similar challenges. In this paper, we present a simple scheme to facilitate both of these tasks using content-derived names (CDNs). Content-based naming uses digital signatures to compute a name for an object based only on its content. CDNs can be applied to several common problems of modern computer systems. Caching on the World Wide Web is simplified by allowing references to an object by its content rather than just its location. In a similar fashion, applications can request library objects by their content without having to rely on the presence of a file system hierarchy that the application recognizes. Further, applications that require different versions of an object can coexist peacefully on the same machine. While this idea is still in its early stages, we present experimental evidence from a study of World Wide Web objects that indicates that CDNs could reduce network traffic by allowing requests to be satisfied by differently-named duplicates with the same contents. (Also cross-referenced as UMIACS-TR-96-55) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Computer Science Dept., Univ. of Maryland Baltimore County,
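The naming idea in this abstract can be sketched in a few lines. This is an illustration only: the abstract does not say which digest is used, so SHA-256 is an assumption, and the names `content_derived_name` and `ContentCache` are hypothetical, not taken from the report.

```python
import hashlib


def content_derived_name(data: bytes) -> str:
    """Return a name that depends only on the bytes of the object.

    SHA-256 is an assumed choice of digest; the report's abstract does
    not name a specific function.
    """
    return hashlib.sha256(data).hexdigest()


class ContentCache:
    """Hypothetical cache keyed by content-derived name, not by location."""

    def __init__(self):
        self._store = {}

    def put(self, data: bytes) -> str:
        # Identical contents always map to the same key, so duplicates
        # fetched under different locations are stored only once.
        name = content_derived_name(data)
        self._store[name] = data
        return name

    def get(self, name: str):
        # Returns None when no object with this content is cached.
        return self._store.get(name)

    def __len__(self):
        return len(self._store)
```

Two objects fetched from different locations but holding the same bytes receive the same name, which is the property that would let a cache satisfy a request from a differently-named duplicate.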
A New Deterministic Parallel Sorting Algorithm With an Experimental. David R. Helman. Joseph Ja'Ja'. David A. Bader. August 1996.
We introduce a new deterministic parallel sorting algorithm based on the regular sampling approach. The algorithm uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. This algorithm was implemented in Split C, the IBM SP-2-WN, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, the performance compares closely to that of our random sample sort algorithm, which seems to outperform all similar algorithms known to the authors on these platforms. Together, their performance is nearly invariant over the set of input distributions, unlike previous efficient algorithms. However, unlike our randomized sorting algorithm, the performance and memory requirements of our regular sorting algorithm can be deterministically guaranteed. (Also cross-referenced as UMIACS-TR-96-54) University of Maryland Institute for Advanced Computer Studies, Dept. of Electrical Engineering, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland,
A Randomized Parallel Sorting Algorithm with an Experimental Study. David R. Helman. David A. Bader. Joseph Ja'Ja'. August 1996.
Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. The algorithm was implemented in SPLIT-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, it seems to outperform all similar algorithms known to the authors on these platforms, and its performance is invariant over the set of input distributions unlike previous efficient algorithms. Our results also compare favorably with those reported for the simpler ranking problem posed by the NAS Integer Sorting (IS) Benchmark. (Also cross-referenced as UMIACS-TR-96-53) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Electrical Engineering, Univ. of Maryland,
Measuring Organization and Asymmetry in Bihemispheric Topographic Maps. Sergio A. Alvarez. Svetlana Levitan. James A. Reggia. September 1996.
We address the problem of measuring the degree of hemispheric organization and asymmetry of organization in a computational model of a bihemispheric cerebral cortex. A theoretical framework for such measures is developed and used to produce algorithms for measuring the degree of organization, symmetry, and lateralization in topographic map formation. The performance of the resulting measures is tested for several topographic maps obtained by self--organization of an initially random network, and the results are compared with subjective assessments made by humans. It is found that the closest agreement with the human assessments is obtained by using organization measures based on sigmoid--type error averaging. Measures are developed which correct for large constant displacements as well as curving of the hemispheric topographic maps. (Also cross-referenced as UMIACS-TR-96-51) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Multiple Vehicle Detection and Tracking in Hard Real Time. Margrit Betke. Esin Haritaoglu. Larry S. Davis. July 1996.
A vision system has been developed that recognizes and tracks multiple vehicles from sequences of gray-scale images taken from a moving car in hard real-time. Recognition is accomplished by combining the analysis of single image frames with the analysis of the motion information provided by multiple consecutive image frames. In single image frames, cars are recognized by matching deformable gray-scale templates, by detecting image features, such as corners, and by evaluating how these features relate to each other. Cars are also recognized by differencing consecutive image frames and by tracking motion parameters that are typical for cars. The vision system utilizes the hard real-time operating system Maruti which guarantees that the timing constraints on the various processes of the vision system are satisfied. The dynamic creation and termination of tracking processes optimizes the amount of computational resources spent and allows fast detection and tracking of multiple cars. Experimental results demonstrate robust, real-time recognition and tracking over thousands of image frames. (Also cross-referenced as UMIACS-TR-96-52) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Bilingual Lexicon Construction Using Large Corpora. Wade Shen. Bonnie J. Dorr. October 1997.
This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between term and sentence alignment processes. An implementation of this algorithm, TAG-ALIGN, is evaluated against approaches similar to [Brown et al. 1993] that apply Bayesian techniques for term alignment, and [Gale and Church 1991] a dynamic programming method for aligning sentences. The ultimate goal is to produce large bilingual lexicons with a high degree of accuracy from potentially noisy corpora. (Also cross-referenced as UMIACS-TR-97-50) Institute for Advanced Computer Studies, Department of Computer Science,
Ben Shneiderman. July 1996.
The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. A useful starting point for designing advanced graphical user interfaces is the Visual Information-Seeking Mantra: Overview first, zoom and filter, then details-on-demand. But this is only a starting point in trying to understand the rich and varied set of information visualizations that have been proposed in recent years. This paper offers a task by data type taxonomy with seven data types (1-, 2-, 3-dimensional data, temporal and multi-dimensional data, and tree and network data) and seven tasks (overview, zoom, filter, details-on-demand, relate, history, and extract). Also cross-referenced as ISR-TR-96-66 Human Computer Interaction Laboratory, Institute for Systems Research, Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant. July 1996.
1996 HCIL Video Reports. Elastic Windows for Rapid Multiple Window Management Life-Lines: Visualizing Personal Histories Designing Interfaces for Youth Services Information Management Query Previews in Networked Information Systems the Case of EOSDIS Baltimore Learning Communities Table of Contents of the 1995 HCIL Video Reports Table of Contents of the 1994 HCIL Video Reports Visual Information Seeking using the FilmFinder (Extract from the HCIL1994 Video Report Human Computer Interaction Laboratory, Institute for Systems Research, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
BFGS with Update Skipping and Varying Memory. Tamara Gibson. Dianne P. O'Leary. Larry Nazareth. July 1996.
We give conditions under which limited-memory quasi-Newton methods with exact line searches will terminate in $n$ steps when minimizing $n$-dimensional quadratic functions. We show that although all Broyden family methods terminate in $n$ steps in their full-memory versions, only BFGS does so with limited-memory. Additionally, we show that full-memory Broyden family methods with exact line searches terminate in at most $n+p$ steps when $p$ matrix updates are skipped. We introduce new limited-memory BFGS variants and test them on nonquadratic minimization problems. (Also cross-referenced as UMIACS-TR-96-49) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Approximation Algorithms for Connected Dominating Sets. Sudipto Guha. Samir Khuller. June 1996.
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in the dominating set. We focus on the question of finding a {\em connected dominating set} of minimum size, where the graph induced by vertices in the dominating set is required to be {\em connected} as well. This problem arises in network testing, as well as in wireless communication. Two polynomial time algorithms that achieve approximation factors of $O(H(\Delta))$ are presented, where $\Delta$ is the maximum degree, and $H$ is the harmonic function. This question also arises in relation to the traveling tourist problem, where one is looking for the shortest tour such that each vertex is either visited, or has at least one of its neighbors visited. We study a generalization of the problem when the vertices have weights, and give an algorithm which achieves a performance ratio of $3 \ln n$. We also consider the more general problem of finding a connected dominating set of a specified subset of vertices and provide an $O(H(\Delta))$ approximation factor. To prove the bound we also develop an optimal approximation algorithm for the unit node weighted Steiner tree problem. (Also cross-referenced as UMIACS-TR-96-47) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
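The two conditions in this definition (domination, and connectivity of the induced subgraph) can be checked directly. The following is a minimal sketch of a verifier for a candidate set, not one of the approximation algorithms from the report; the function name and the adjacency-dictionary representation are assumptions made for illustration.

```python
from collections import deque


def is_connected_dominating_set(adj, candidate):
    """Check whether `candidate` is a connected dominating set of the graph.

    `adj` maps every vertex to an iterable of its neighbours
    (undirected graph, so each edge appears in both directions).
    """
    chosen = set(candidate)
    if not chosen:
        return False

    # Domination: every vertex is in the set or adjacent to a member.
    for vertex, neighbours in adj.items():
        if vertex not in chosen and not chosen.intersection(neighbours):
            return False

    # Connectivity: breadth-first search restricted to the chosen vertices
    # must reach all of them.
    start = next(iter(chosen))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in chosen and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == chosen
```

On the path 0-1-2-3-4, the set {1, 3} dominates every vertex but is not connected, while {1, 2, 3} satisfies both conditions.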
Network-Aware Mobile Programs. M. Ranganathan. Anurag Acharya. Shamik D. Sharma. Joel Saltz. June 1996.
In this paper, we investigate network-aware mobile programs, programs that can use mobility as a tool to adapt to variations in network characteristics. We present infrastructural support for mobility and network monitoring and show how adaptalk, a Java-based mobile Internet chat application can take advantage of this support to dynamically place the chat server so as to minimize response time. Our conclusion was that on-line network monitoring and adaptive placement of shared data-structures can significantly improve performance of distributed applications on the Internet. (Also cross-referenced as UMIACS-TR-96-46) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Carry-Over Round Robin: A Simple Cell Scheduling Mechanism for ATM. Debanjan Saha. Sarit Mukherjee. Satish K. Tripathi. June 1996.
We propose a simple cell scheduling mechanism for ATM networks. The proposed mechanism, named Carry-Over Round Robin (CORR), is an extension of weighted round robin scheduling. We show that despite its simplicity, CORR achieves tight bounds on end-to-end delay and near perfect fairness. Using a variety of video traffic traces we show that CORR often outperforms some of the more complex scheduling disciplines such as Packet-by-Packet Generalized Processor Sharing (PGPS). (Also cross-referenced as UMIACS-TR-96-45) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Multirate Scheduling of VBR Video Traffic in ATM Networks. Debanjan Saha. Sarit Mukherjee. Satish K. Tripathi. June 1996.
(Also cross-referenced as UMIACS-TR-96-44) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, IBM T.J. Watson Research Center, Yorktown Heights, NY, Dept. of Computer Science and Engineering, Univ. of Nebraska,
Active Logic and Heim's Rules for Updating Discourse Context. John Gurney. Don Perlis. Khemdut Purang. June 1996.
Discourse unfolds in time, giving rise to a cascade of belief changes in the listener. Yet this temporal evolution of discourse and belief is typically ignored in theoretical treatments of discourse. It has been claimed (see Soames~\cite{soames:presuppositions}) that Heim's~\cite{heim:projection_problem} theory of discourse context accounts for non-implicative discourse updating. We will present a new non-implicative discourse that cannot be accounted for with Heim's use of global or local accommodation and which appears to require attention to \emph{evolution} of discourse. We use this example to motivate remaking Heim's update function, aimed toward a unified approach to discourse---one in which Heim's rules for discourse updating can account for more of the problem cases for the theory of discourse context. These rules and the revised update function can then serve as principles that constrain the building of representations for discourse context (such as the Discourse Representation Structures, of Discourse Representation Theory, ~\cite{kamp:reyle}). We propose \emph{active logic} as a convenient tool for executing the required inferences (as called for by our revised version of Heim's update function) as the discourse evolves through time. (Also cross-referenced as UMIACS-TR-96-43) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Active Logic Applied to Cancellation of Gricean Implicature. Khemdut Purang. Don Perlis. John Gurney. June 1996.
Dialog proceeds over time, during which inferred beliefs come and go in the listener. Yet this temporal aspect of dialog and belief is typically ignored in theoretical treatments of dialog. Using a simple example of a dialog with an implicature that arises partway through and then is later retracted, we discuss how Gricean maxims and nonmonotonicity may relate to each other and to a computational treatment of implicature. In effect we seek to track reasoning along Gricean lines over time. We present our own computational approach to this, giving an implementation in the formalism of active logics. (Also cross-referenced as UMIACS-TR-96-42) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Conversational Adequacy: Mistakes are the Essence. Don Perlis. Khemdut Purang. June 1996.
We argue that meta-dialog and meta-reasoning, far from being of only occasional use, are the very essence of conversation and communication between agents. We give four paradigm examples of massive use of meta-dialog where only limited base dialog may be present, and use these to bolster our claim of centrality for meta-dialog. We further illustrate this with related work in active logics. We argue moreover that there may be a core set of meta-dialog principles that is in some sense complete. If we are right, then implementing such a set would be of considerable interest. We give examples of existing computer programs that converse inadequately according to our guidelines. (Also cross-referenced as UMIACS-TR-96-41) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Loading Time Scheduling Problem. Randeep Bhatia. Samir Khuller. Joseph (Seffi) Naor. June 1996.
In this paper we study precedence constrained scheduling problems, where the tasks can only be executed on a specified subset of the machines. Each machine has a loading time that is incurred only for the first task that is scheduled on the machine in a particular run. This basic scheduling problem arises in the context of machining on numerically controlled machines, query optimization in databases, and in other artificial intelligence applications. We give the first non-trivial approximation algorithm for this problem. We also prove non-trivial lower bounds on best possible approximation ratios for these problems. These improve on the non-approximability results that are implied by the non-approximability results for the shortest common supersequence problem. We use the same algorithmic technique to obtain approximation algorithms for a problem arising in the context of code generation for parallel machines, and for the weighted shortest common supersequence problem. Dept. of Computer Science, Univ. of Maryland,
Fault Tolerant K-Center Problems. Samir Khuller. Robert Pless. Yoram J. Sussmann. June 1996.
The basic $K$-center problem is a fundamental facility location problem, where we are asked to locate $K$ facilities in a graph, and to assign vertices to facilities, so as to minimize the maximum distance from a vertex to the facility to which it is assigned. This problem is known to be NP-hard, and several optimal approximation algorithms that achieve a factor of $2$ have been developed for it. We focus our attention on a generalization of this problem, where each vertex is required to have a set of $\alpha$ ($\alpha \le K$) centers close to it. In particular, we study two different versions of this problem. In the first version, each vertex is required to have at least $\alpha$ centers close to it. In the second version, each vertex that {\em does not have a center placed on it} is required to have at least $\alpha$ centers close to it. For both these versions we are able to provide polynomial time approximation algorithms that achieve constant approximation factors for {\em any} $\alpha$. For the first version we give an algorithm that achieves an approximation factor of $3$ for any $\alpha$, and achieves an approximation factor of $2$ for $\alpha < 4$. For the second version, we provide algorithms with approximation factors of $2$ for any $\alpha$. The best possible approximation factor for even the basic $K$-center problem is 2. In addition, we give a polynomial time approximation algorithm for a generalization of the $K$-supplier problem where a subset of at most $K$ supplier nodes must be selected as centers so that every demand node has at least $\alpha$ centers close to it. We also provide polynomial time approximation algorithms for all the above problems for generalizations when cost and weight functions are defined on the set of vertices. (Also cross-referenced as UMIACS-TR-96-40) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
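For the basic $K$-center problem mentioned at the start of this abstract, one classical factor-2 method is the greedy farthest-first traversal, which applies when distances satisfy the triangle inequality. The sketch below illustrates that classical method only; it is not the fault-tolerant algorithm of the report, and the function name and the `dist` callback are assumptions made for illustration.

```python
def farthest_first_centers(points, k, dist):
    """Greedy farthest-first traversal for the basic K-center problem.

    `points` is a non-empty list, `k` the number of centers, and `dist`
    a metric on the points. Returns the chosen centers and the resulting
    radius (the largest distance from any point to its nearest center).
    """
    centers = [points[0]]  # arbitrary first center
    nearest = [dist(p, centers[0]) for p in points]

    while len(centers) < k:
        # Pick the point currently farthest from all chosen centers.
        idx = max(range(len(points)), key=lambda j: nearest[j])
        centers.append(points[idx])
        # Update each point's distance to its nearest center.
        nearest = [
            min(nearest[j], dist(points[j], points[idx]))
            for j in range(len(points))
        ]

    return centers, max(nearest)
```

For the points 0, 1, 2, 10, 11, 12 on a line with k = 2, the traversal picks 0 and 12 and attains radius 2, while the optimal radius is 1 (centers 1 and 11), consistent with the factor-2 guarantee.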
The Capacitated K-Center Problem. Samir Khuller. Yoram J. Sussmann. June 1996.
The capacitated $K$-center problem is a fundamental facility location problem, where we are asked to locate $K$ facilities in a graph, and to assign vertices to facilities, so as to minimize the maximum distance from a vertex to the facility to which it is assigned. Moreover, each facility may be assigned at most $L$ vertices. This problem is known to be NP-hard. We give polynomial time approximation algorithms for two different versions of this problem that achieve approximation factors of 5 and 6. We also study some generalizations of this problem. (Also cross-referenced as UMIACS-TR-96-39) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adaptive Cost Estimation for Client-Server based Heterogeneous Database. Zhaohui Yao. Chungmin Melvin Chen. Nick Roussopoulos. May 1996.
In this paper, we propose a new method for estimating query cost in client-server based heterogeneous database management systems. The cost estimation parameters are adjusted by an Adaptive Cost Estimation (ACE) module which uses query execution feedback yielding more and more accurate cost estimates. The most important features of ACE are its detailed cost model which accounts for all costs incurred, its rapid convergence to the actual parameter values, and its low overhead which permits continuous adaptation during the run time of the system. ACE has been implemented and tested with Oracle 6, Oracle 7, Ingres, and ADMS. Extensive experiments performed on these systems show that ACE's time estimates are within 20% of the real wall-clock time for more than 92% of the queries. This percentage surpasses 98% for queries over 20 seconds. (Also cross-referenced as UMIACS-TR-96-37) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Sandeep Gupta. John S. Baras. Stephen Kelley. Nick Roussopoulos. A case for in-kernel data streaming over the file subsystem. June 1996.
(Also cross-referenced as UMIACS-TR-96-36) University of Maryland Institute for Advanced Computer Studies, Institute of Systems Research, Dept. of Computer Science, Univ. of Maryland,
Signal Stability based Adaptive Routing (SSA) for Ad-Hoc Mobile Networks. Rohit Dube. Cynthia D. Rais. Kuang-Yeh Wang. Satish K. Tripathi. August 1996.
Unlike static networks, ad-hoc networks have no spatial hierarchy and suffer from frequent link failures which prevent mobile hosts from using traditional routing schemes. Under these conditions, mobile hosts must find routes to destinations without the use of designated routers and also must dynamically adapt the routes to the current link conditions. This paper proposes a distributed adaptive routing protocol for finding and maintaining stable routes based on signal strength and location stability in an ad-hoc network and presents an architecture for its implementation. (Also cross-referenced as UMIACS-TR-96-34) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Scrambling Query Plans to Cope With Unexpected Delays. Laurent Amsaleg. Michael J. Franklin. A. Tomasic. T. Urhan. May 1996.
Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failure in the network introduce highly-variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query plan modification techniques that we call query plan scrambling. We present an algorithm which modifies execution plans on-the-fly in response to unexpected delays in data access. The algorithm both reschedules operators and introduces new operators into the plan. We present simulation results that show how our technique effectively hides delays in receiving the initial requested tuples from remote data sources. (Also cross-referenced as UMIACS-TR-96-35) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Diane L. Alonso. Kent L. Norman. May 1996.
Apparency of Contingencies in Single Panel Menus. What we see is not always what we get. This is the problem when the underlying structure of an interface is hidden from the user's view. Users high in Spatial Visualization Ability (SVA) are quick to learn the contingencies of these relationships and are not hindered by this problem. Low SVA users, however, have difficulty visualizing these contingencies and often get lost. We examined data for 97 undergraduate students to determine whether revealing hidden contingencies through visual cues would facilitate Low SVA users, enabling them to approach the level of performance of High SVA users on a computerized path finding task. It was found that increasing interface apparency does seem to benefit all users, but particularly those with Low SVA. (Also cross-referenced as CAR-TR-824) Human Computer Interaction Laboratory, Center for Automation Research, Department of Psychology, Department of Computer Science, Univ. of Maryland,
Douglas W. Oard. Gary Marchionini. May 1996.
A Conceptual Framework for Text Filtering Process. This report develops a conceptual framework for text filtering practice and research, and reviews present practice in the field. Text filtering is an information seeking process in which documents are selected from a dynamic text stream to satisfy a relatively stable and specific information need. A model of the information seeking process is introduced and specialized to define information filtering. The historical development of text filtering is then reviewed and case studies of recent work are used to highlight important design characteristics of modern text filtering systems. Specific techniques drawn from information retrieval, user modeling, machine learning and other related fields are described, and the report concludes with observations on the present state of the art and implications for future research on text filtering. (Also cross-referenced as CAR-TR-830) (Also cross-referenced as EE TR-96-25) (Also cross-referenced as CLIS TR-96-02) Electrical Engineering Department, Digital Library Research Group, Human Computer Interaction Laboratory, Center for Automation Research, Medical Informatics and Computational Intelligence Laboratory, College of Library and Information Services, Dept. of Computer Science, Univ. of Maryland,
Efficient Refreshment of Data Warehouse Views. Lars Baekgaard. Nick Roussopoulos. May 1996.
A data warehouse is a view on a set of distributed and possibly loosely coupled source databases. For efficiency reasons a warehouse should be maintained as a materialized view. Therefore, efficient incremental algorithms must be used to periodically refresh the data warehouse. It is possible and desirable to separate the process of warehouse refreshment from the process of warehouse use. In this paper we describe and compare view refreshment algorithms that are based on different combinations of materialized views, partially materialized views, and pointers. Our contribution is twofold. First, our algorithms and data structures are designed to minimize network communication and interactions between the warehouse and the source databases. The minimal set of data that is necessary for both warehouse refreshment and warehouse use is stored on the warehouse. Second, we describe the results of an experiment comparing these methods with respect to storage overhead and I/O. Briefly, the experiment shows that algorithms based on a combination of partially materialized views and pointers outperform algorithms based on materialized views. (Also cross-referenced as UMIACS-TR-96-33) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Gary Marchionini. Catherine Plaisant. Anita Komlodi. February 1996.
User's Needs Assessment for the Library of Congress' National Digital Library. Understanding and assessing user needs is the first step in interface design, and this report is one of the first milestones in the overall design effort. This assessment provides an informed basis for the interface design and evaluation to be done in the months to come. It was prepared under the Library's contract with the Human-Computer Interaction Laboratory (HCIL) at the University of Maryland to work together to design an interface for the Library's National Digital Library (NDL) Program. In order to determine user needs, HCIL conducted a survey of nine reading rooms with special emphasis on the Special Collections from which the content of the NDL will be drawn. HCIL also used questionnaires to reach remote audiences who may typify NDL users accessing the Library via the Internet. They also analyzed many of the documents available in the Reading Rooms, such as finding aids, other handouts, and user studies. Human Computer Interaction Laboratory, College of Library and Information Services, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Rohit Mahajan. Ben Shneiderman. May 1996.
Visual & Textual Consistency Checking Tools for Graphical User Interfaces. Designing a user interface with a consistent visual design and textual properties with current generation GUI development tools is cumbersome. SHERLOCK, a family of consistency checking tools, has been designed to evaluate visual design and textual properties of interface, make the GUI evaluation process less arduous, and aid usability testing. SHERLOCK includes a dialog box summary table to provide a compact overview of visual properties of hundreds of dialog boxes of the interface. Terminology specific tools, like Interface Concordance, Terminology Baskets and Interface Speller have been developed. Button specific tools including Button Concordance and Button Layout Table have been created to detect variant capitalization, distinct typefaces, distinct colors, variant button sizes and inconsistent button placements. This paper describes the design, software architecture, and the use of SHERLOCK. An experiment with 60 subjects to study the effects of inconsistent interface terminology on user's performance showed 10-25% speedup for consistent interfaces. SHERLOCK was tested with four commercial prototypes; the corresponding outputs, analysis and feedback from designers of these applications is presented. (Also cross-referenced as CAR-TR-828) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Stephan Greene. May 1996.
Process Change From User Requirements Elicitation: A Case Study. The Maryland Department of Juvenile Justice (DJJ) is seeking a new information system to replace its legacy system for youth case management. The major goal of the new information system is to improve the process of juvenile case management, and thus deliver more effective services to youths, by better facilitating the tracking of case information and the production and handling of case-related documents. The primary challenge in designing the new system is to integrate optimally the appropriate components of existing processes, information, and documents. Our approach has shown that fostering user discussion and review of existing documents is extremely valuable in defining existing processes and information requirements, and effectively highlights areas where valuable process changes can be made and what system features are needed to support them. Subsequently linking user requirements for documents with innovative graphic user interface techniques can integrate diverse information for users and can effect additional positive changes to organizational processes. (Also cross-referenced as CAR-TR-827) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Anne Rose. Jason Ellis. Catherine Plaisant. Stephan Greene. May 1996.
Life cycle of user interface techniques: The DJJ information system design. To take advantage of today's technology, many organizations are migrating from their legacy systems. With help from the Human-Computer Interaction Laboratory (HCIL) and Cognetics Corporation, the Maryland Department of Juvenile Justice (DJJ) is currently undergoing an effort to redesign their information system to take advantage of graphical user interfaces. As a research lab, HCIL identifies interesting research problems and then prototypes solutions. As a project matures, the exploratory prototypes are adapted to suit the end product requirements. This case study describes the life cycle of three DJJ prototypes: (1) LifeLines, which uses time lines to display an overview of a youth in one screen, (2) the DJJ Navigator, which helps manage individual workloads by displaying different user views, and (3) the ProgramFinder, a tool for selecting the best program for a youth. (Also cross-referenced as CAR-TR-826) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Exploiting Monotone Convergence Functions in Parallel Programs. William Pugh. Evan Rosser. Tatiana Shpeisman. October 1996.
Scientific codes which use iterative methods are often difficult to parallelize well. Such codes usually contain \code{while} loops which iterate until they converge upon the solution. Problems arise since the number of iterations cannot be determined at compile time, and tests for termination usually require a global reduction and an associated barrier. We present a method which allows us to avoid performing global barriers and exploit pipelined parallelism when processors can detect non-convergence from local information. (Also cross-referenced as UMIACS-TR-96-31.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ben Shneiderman. April 29, 1996.
Designing Information-Abundant Websites. The deluge of web pages has generated dystopian commentaries on the tragedy of the flood as well as utopian visions of harnessing the same flood for constructive purposes. Within this ocean of information there are also lifeboat web pages with design principles, but often the style parallels the early user interface writings in the 1970s. The well-intentioned Noahs who write from personal experience as website designers, often draw their wisdom from specific projects, making their advice incomplete or lacking in generalizability. Their experience is valuable but the paucity of empirical data to validate or sharpen insight means that some guidelines are misleading. As scientific evidence accumulates, foundational cognitive and perceptual theories will structure the discussion and guide designers in novel situations. (Also cross-referenced as CAR-TR-824) (Also cross-referenced as ISR-TR-96-40) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Interoperability of Data Parallel Runtime Libraries with Meta-Chaos. Guy Edjlali. Alan Sussman. Joel Saltz. May 1996.
This paper describes a framework for providing the ability to use multiple specialized data parallel libraries and/or languages within a single application. The ability to use multiple libraries is required in many application areas, such as multidisciplinary complex physical simulations and remote sensing image database applications. An application can consist of one program or multiple programs that use different libraries to parallelize operations on distributed data structures. The framework is embodied in a runtime library called Meta-Chaos that has been used to exchange data between data parallel programs written using High Performance Fortran, the Chaos and Multiblock Parti libraries developed at Maryland for handling various types of unstructured problems, and the runtime library for pC++, a data parallel version of C++ from Indiana University. Experimental results show that Meta-Chaos is able to move data between libraries efficiently, and that Meta-Chaos provides effective support for complex applications. (Also cross-referenced as UMIACS-TR-96-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Exploiting the Temporal Structure of MPEG Video for the Reduction of. Marwan Krunz. Satish K. Tripathi. May 1996.
We propose a new bandwidth allocation scheme for VBR video traffic in ATM networks. The scheme is tailored to MPEG-coded video sources that require stringent and deterministic quality-of-service guarantees. By exploiting the temporal structure of MPEG sources, we show that our scheme results in an effective bandwidth which, in most cases, is less than the source peak rate. The reduction in the bandwidth requirement is achieved without sacrificing any perceived QoS. Efficient procedures are provided for the computation of the effective bandwidth under heterogeneous MPEG sources. The effective bandwidth strongly depends on the arrangement of the multiplexed streams which is a measure of the degree of synchronization between the GOP patterns of different streams. Assuming that all possible arrangements are equi-probable, we derive an expression for the asymptotic tail distribution of the effective bandwidth. From the tail distribution, we compute several performance measures for the call blocking probability when the allocation is made based on the effective bandwidth. In the case of homogeneous sources, we give a closed-form expression for the `best' arrangement that results in the `optimal' effective bandwidth. Numerical examples based on real MPEG traces are used to demonstrate the advantages of our scheme. (Also cross-referenced as UMIACS-TR-96-29) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
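To make the role of the "arrangement" of multiplexed streams concrete, the sketch below shifts periodic per-frame size patterns against each other and takes the peak of their per-slot sum. This peak is only a stand-in for the effective bandwidth defined in the report, the brute-force search replaces the closed-form result the abstract mentions, and the frame sizes are made-up numbers rather than values from real MPEG traces.

```python
from itertools import product


def peak_aggregate(patterns, offsets):
    """Peak per-slot sum of periodic frame-size patterns.

    patterns: equal-length lists, one GOP pattern per stream.
    offsets:  slot shift applied to each stream (the "arrangement").
    """
    period = len(patterns[0])
    if any(len(p) != period for p in patterns):
        raise ValueError("all patterns must have the same period")
    return max(
        sum(p[(t - off) % period] for p, off in zip(patterns, offsets))
        for t in range(period)
    )


def best_offsets(patterns):
    """Brute-force search for the arrangement with the lowest peak.

    The first stream is pinned at offset 0 because only relative
    shifts matter. Exponential in the number of streams.
    """
    period = len(patterns[0])
    candidates = product(range(period), repeat=len(patterns) - 1)
    return min(
        ((0,) + rest for rest in candidates),
        key=lambda offs: peak_aggregate(patterns, offs),
    )


if __name__ == "__main__":
    # Hypothetical frame sizes for an I B B P B B pattern.
    gop = [10, 2, 2, 5, 2, 2]
    streams = [gop, gop]
    print("aligned peak:", peak_aggregate(streams, (0, 0)))
    best = best_offsets(streams)
    print("best offsets:", best, "peak:", peak_aggregate(streams, best))
```

With the two hypothetical streams aligned, the large I frames coincide and the peak equals the sum of the source peak rates; shifting one stream by a slot lowers the peak, which is the effect the abstract describes.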
On Hybrid Synthesis for Hierarchical Structured Petri Nets. Hong Liu. Jun-Cheol Park. Raymond E. Miller. April 1996.
We propose a hybrid method for synthesis of hierarchical structured Petri nets. In a top-down manner, we decompose a system into a set of subsystems at each level of abstraction, each of these is specified as a blackbox Petri net that has multiple inputs and outputs. We stipulate that each subsystem satisfies the following I/O constraints: (1) At any instance of time, at most one of the inputs can be activated; and (2) If one input is activated, then the subsystem must consume the input and produce exactly one output within a finite length of time. We give a stepwise refinement procedure which starts from the initial high-level abstraction of the system and expands an internal place of a blackbox Petri net into a more detailed subnet at each step. By enforcing the I/O constraints of each subsystem in each intermediate abstraction, our refinement maintains the sequencing of transitions prescribed by the initial abstraction of the system. Next, for the bottom-up synthesis, we present interconnection rules for sequential, parallel, and loop structures and prove that each rule maintains the I/O constraints. Thus, by incorporating these interconnection rules into our refinement formulation, our approach can be regarded as a hybrid Petri net synthesis technique that employs both top-down and bottom-up methods. The major advantage of the method is that the modeling details can be introduced incrementally and naturally, while the important logical properties of the resulting Petri net are guaranteed. Dept. of Computer Science, Univ. of Maryland,
How Embedded Memory in Recurrent Neural Network Architectures Helps. Tsungnan Lin. Bill G. Horne. C. Lee Giles. August 1996.
Learning long-term temporal dependencies with recurrent neural networks can be a difficult problem. It has recently been shown that a class of recurrent neural networks called NARX networks performs much better than conventional recurrent neural networks for learning certain simple long-term dependency problems. The intuitive explanation for this behavior is that the output memories of a NARX network can be manifested as jump-ahead connections in the time-unfolded network. These jump-ahead connections can propagate gradient information more efficiently, thus reducing the sensitivity of the network to long-term dependencies. This work gives empirical justification to our hypothesis that similar improvements in learning long-term dependencies can be achieved with other classes of recurrent neural network architectures simply by increasing the order of the embedded memory. In particular, we explore the impact of learning simple long-term dependency problems on three classes of recurrent neural network architectures: globally recurrent networks, locally recurrent networks, and NARX (output feedback) networks. Comparing the performance of these architectures with different orders of embedded memory on two simple long-term dependency problems shows that all of these classes of network architectures demonstrate significant improvement on learning long-term dependencies when the orders of embedded memory are increased. These results can be important to a user comfortable with a specific recurrent neural network architecture because simply increasing the embedded memory order will make the architecture more robust to the problem of long-term dependency learning. (Also cross-referenced as UMIACS-TR-96-28) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, NEC Research Institute, Princeton University,
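The "output memory" idea can be sketched with a single-unit NARX recurrence, in which the next output depends on tapped delay lines of past inputs and past outputs. This is a minimal illustration, not the architectures or experiments of the report: the function name `narx_run`, the tanh nonlinearity, and the weights are assumptions, and no training is shown.

```python
import math
from collections import deque


def narx_run(inputs, in_weights, out_weights, bias=0.0):
    """Run a one-unit NARX recurrence over a scalar input sequence.

    y(t) = tanh(bias + sum_i in_weights[i]  * u(t - i)
                     + sum_j out_weights[j] * y(t - 1 - j))

    len(out_weights) is the order of the output memory. Delay lines
    start at zero. Weights are illustrative, not learned.
    """
    u_taps = deque([0.0] * len(in_weights), maxlen=len(in_weights))
    y_taps = deque([0.0] * len(out_weights), maxlen=len(out_weights))
    outputs = []
    for u in inputs:
        u_taps.appendleft(u)  # newest input first; oldest tap drops off
        pre = bias
        pre += sum(w * x for w, x in zip(in_weights, u_taps))
        pre += sum(w * y for w, y in zip(out_weights, y_taps))
        y = math.tanh(pre)
        y_taps.appendleft(y)  # newest output first
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Order-2 output memory that only uses the tap two steps back.
    print(narx_run([1.0, 0.0, 0.0], in_weights=[1.0], out_weights=[0.0, 1.0]))
```

In the usage example, the output at the third step depends directly on the output two steps earlier, skipping the intermediate step. That direct link is the kind of jump-ahead connection the abstract refers to when the network is unfolded in time.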
Noisy Time Series Prediction using Symbolic Representation and. Steve Lawrence. Ah Chung Tsoi. C. Lee Giles. April 1996.
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. The symbolic representation aids the extraction of symbolic knowledge from the recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Rules related to well known behavior such as trend following and mean reversal are extracted. Also cross-referenced as UMIACS-TR-96-27 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hierarchical Task Network Planning: Formalization, Analysis, and. Kutluhan Erol. April 1996.
Planning is a central activity in many areas including robotics, manufacturing, space mission sequencing, and logistics. As the size and complexity of planning problems grow, there is great economic pressure to automate this process in order to reduce the cost of planning effort, and to improve the quality of produced plans. AI planning research has focused on general-purpose planning systems which can process the specifications of an application domain and generate solutions to planning problems in that domain. Unfortunately, there is a big gap between theoretical and application oriented work in AI planning. The theoretical work has been mostly based on state-based planning, which has limited practical applications. The application-oriented work has been based on hierarchical task network (HTN) planning, which lacks a theoretical foundation. As a result, in spite of many years of research, building planning applications remains a formidable task. The goal of this dissertation is to facilitate building reliable and effective planning applications. The methodology includes design of a mathematical framework for HTN planning, analysis of this framework, development of provably correct algorithms based on this analysis, and the implementation of these algorithms for further evaluation and exploration. The representation, analyses, and algorithms described in this thesis will make it easier to apply HTN planning techniques effectively and correctly to planning applications. The precise and mathematical nature of the descriptions will also help teaching about HTN planning, will clarify misconceptions in the literature, and will stimulate further research. (Also cross-referenced as UMIACS-TR-96-26) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Overcoming Instability in Computing the Fundamental Matrix. Daniel P. Heyman. Dianne P. O'Leary. April 1996.
We present an algorithm for solving linear systems involving the probability or rate matrix for a Markov chain. It is based on a UL factorization but works only with a submatrix of the factor U. We demonstrate its utility on Erlang-B models as well as more complicated models of a telephone multiplexing system. (Also cross-referenced as UMIACS-TR-96-24) Dept. of Computer Science, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies,
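For readers unfamiliar with the Erlang-B models used as a test case, the sketch below computes the Erlang-B blocking probability with the standard numerically stable recurrence. It is not the UL-based algorithm of the report, only background on the model; the function name `erlang_b` and the example values are chosen for illustration.

```python
def erlang_b(offered_load, servers):
    """Erlang-B blocking probability for a loss system.

    offered_load: traffic in erlangs (arrival rate / service rate).
    servers:      number of circuits; arrivals finding all busy are lost.

    Uses the recurrence B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)),
    which avoids the large factorials of the closed-form expression.
    """
    if offered_load < 0 or servers < 0:
        raise ValueError("offered_load and servers must be non-negative")
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Two erlangs offered to two circuits.
    print(erlang_b(2.0, 2))
```

The recurrence works with ratios that stay between 0 and 1, which is one common way to sidestep the overflow that a direct evaluation with factorials and powers would hit for large systems.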
Catherine Plaisant. Anne Rose. March 1996.
Exploring LifeLines to Visualize Patient Records. LifeLines provide a general visualization environment for personal histories. We explored its use for medical patient records. A one screen overview of the record using timelines provides direct access to the data. Problems, hospitalization and medications can be represented as horizontal lines, while icons represent discrete events such as physician consultations (and progress notes) or tests. Line color and thickness can illustrate relationships or significance. Techniques are described to display large records. Rescaling tools and filters allow users to focus on part of the information, revealing more details. Computerized medical records pose tremendous problems to system developers. Infrastructure and privacy issues need to be resolved before physicians can even start using the records. Non-intrusive hardware is required for physicians to do their work (i.e. interview patients) away from their desks and cumbersome workstations. But all the efforts to solve those problems will only succeed if appropriate attention is also given to the user interface design [1][8]. Long lists to scroll, clumsy search, endless menus and lengthy dialogs will lead to user rejection. But techniques are being developed to summarize, filter and present large amounts of information, leading us to believe that rapid access to needed data is possible with careful design. While more attention is now put on developing standards for gathering medical records, we found that very little effort had been made to design appropriate visualization and navigation techniques to present and explore personal history records. An intuitive approach to visualizing histories is to use graphical time series. The consistent, linear time scale allows comparisons and relations between the quantities displayed. Data can be graphed on the timeline to show time series of quantitative data. 
Highly interactive interfaces turn the display into a meaningfully structured menu with direct access to the data needed to review a problem or conduct the diagnosis. Also cross-referenced as CAR-TR-819 Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Communication and Organization in Software Development: An Empirical Study. Carolyn B. Seaman. Victor R. Basili. April 1996.
The empirical study described in this paper addresses the issue of communication among members of a software development organization. The independent variables are various attributes of organizational structure. The dependent variable is the effort spent on sharing information which is required by the software