You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format. However, this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the Department of Computer Science of the University of Maryland at College Park under terms that include this permission. All other rights are reserved by the author(s).
On the communication-storage minimization for a class of secure. Radha Poovendran. July 2000.
Developing cryptographic key management protocols that have scalability in terms of the key storage as well as key update communication is an important problem in many secure multicast applications~\cite{rb1,dea,wgl}. Wong {\em et al.}~\cite{wgl} and Wallner {\em et al.}~\cite{dea} independently presented the first set of key distribution models where the key update communication grows as ${\cal O}(\log N)$ for a group of size $N$. However, the storage requirement of these models was ${\cal O}(N)$. Recently~\cite{cmn}, a new model based on clustering of the group members was proposed in order to lower the key storage while maintaining the update communication growth as ${\cal O}(\log N)$. For the new model, by considering the product of the storage and the communication as the cost function, the optimal cluster size $M$ was conjectured to be $M= {\cal O}(\log N)$. In this paper, we show that the optimal value of the cluster size can be computed without the product function due to the monotonicity of the storage with respect to the cluster size. We show that the optimal cluster size selection of the model in~\cite{cmn} can be formulated as a constrained optimization problem, and then transform it to a fixed point equation of the form $M - \lambda \log_e M = (\beta_2 - \lambda)\log_e N$, where $\beta_2, \lambda$ are model parameters. We first show that the largest root of this equation is the optimal solution, and then compute it by two different techniques. We then show that the first order approximation of the solution is of the form $M \approx (\beta_2 -\lambda)\log_e N + \lambda \log_e \log_e N$, leading to $M \approx (\beta_2 - \lambda) \log_e N$ for large values of $N$.
We make a case for use of the estimate $M = (\beta_2 -\lambda) \log_e N + \lambda \log_e \log_e N$ instead of $M = \log_e N$ by showing that even for group sizes up to $2^{32}$, the value $M = \log_e N + \lambda \log_e \log_e N$ provides a significantly lower value of key storage compared to the value $M = \log_e N$. We also show that the best estimate of $M$ using the product function in~\cite{cmn} does not exceed $M = \nu \log_e N$ for a constant $\nu$. (Also cross-referenced as UMIACS-TR-2000-58) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
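As a hedged illustration of the fixed point equation $M - \lambda \log_e M = (\beta_2 - \lambda)\log_e N$ quoted in the abstract above, the following minimal Python sketch finds its largest root by plain fixed point iteration and compares it with the first order estimate. The function names, the iteration scheme, and the parameter values $\beta_2 = 2$, $\lambda = 1$ are assumptions made for illustration only; the report computes the root by two techniques that the abstract does not describe.

```python
import math


def optimal_cluster_size(n, beta2, lam, tol=1e-12, max_iter=200):
    """Largest root of M - lam*ln(M) = (beta2 - lam)*ln(n), by iteration.

    Assumes lam > 0 and c = (beta2 - lam)*ln(n) > max(lam, 1). Under that
    assumption the map M -> c + lam*ln(M), started at M = c, increases
    monotonically towards the largest root.
    """
    c = (beta2 - lam) * math.log(n)
    if lam <= 0 or c <= max(lam, 1.0):
        raise ValueError("sketch assumes lam > 0 and c > max(lam, 1)")
    m = c
    for _ in range(max_iter):
        m_next = c + lam * math.log(m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


def first_order_estimate(n, beta2, lam):
    """First order estimate quoted in the abstract."""
    ln_n = math.log(n)
    return (beta2 - lam) * ln_n + lam * math.log(ln_n)


if __name__ == "__main__":
    # Hypothetical parameter values, not taken from the report.
    n, beta2, lam = 2 ** 32, 2.0, 1.0
    print("iterated root:       ", optimal_cluster_size(n, beta2, lam))
    print("first order estimate:", first_order_estimate(n, beta2, lam))
```

For these assumed parameters the two values differ by less than one cluster member, which is in line with the abstract's claim that the first order form is a usable estimate.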
Performance and Analysis of Saddle Point Preconditioners for the. Howard C. Elman. David J. Silvester. Andrew J. Wathen. July 2000.
We examine the convergence characteristics of iterative methods based on a new preconditioning operator for solving the linear systems arising from discretization and linearization of the steady-state Navier-Stokes equations. With a combination of analytic and empirical results, we study the effects of fundamental parameters on convergence. We demonstrate that the preconditioned problem has an eigenvalue distribution consisting of a tightly clustered set together with a small number of outliers. The structure of these distributions is independent of the discretization mesh size, but the cardinality of the set of outliers increases slowly as the viscosity becomes smaller. These characteristics are directly correlated with the convergence properties of iterative solvers. (Also cross-referenced as UMIACS-TR-2000-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Rate Windows for Efficient Network and I/O Throttling. Kyung D. Ryu. Jeffrey K. Hollingsworth. Peter J. Keleher. July 2000.
This paper proposes and evaluates a new mechanism for I/O and network rate policing. The goal of the proposed system is to provide a simple, yet effective way to enforce resource limits on target classes of jobs in a system. The basic approach is useful for several types of systems, including running background jobs on idle workstations and providing resource limits on network-intensive applications such as virtual web server hosting. Our approach is quite simple: we use a sliding window average of recent events to compute the average rate for a target resource. The assigned limit is enforced by forcing application processes to sleep when they issue requests that would bring their resource utilization out of the allowable profile. Our experimental results show that we are able to provide the target resource limitations within a few percent, and do so with no measurable slowdown of the overall system. (Also cross-referenced as UMIACS-TR-2000-53) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
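The sliding window idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `RateWindow`, its parameters, and the choice to sleep until the oldest recorded event leaves the window are all hypothetical. The clock and sleep functions are injectable so the behaviour can be examined without real delays.

```python
import time
from collections import deque


class RateWindow:
    """Hypothetical sliding window limiter for one resource."""

    def __init__(self, limit_per_sec, window_sec,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit_per_sec
        self.window = window_sec
        self.clock = clock
        self.sleep = sleep
        self.events = deque()  # (timestamp, amount) of recent requests
        self.total = 0.0       # sum of amounts currently in the window

    def _expire(self, now):
        # Drop events that have slid out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, amount = self.events.popleft()
            self.total -= amount

    def request(self, amount):
        """Admit `amount` units, sleeping first if the windowed average
        rate would otherwise exceed the limit. Returns seconds slept."""
        if amount / self.window > self.limit:
            raise ValueError("request can never fit within the limit")
        slept = 0.0
        while True:
            now = self.clock()
            self._expire(now)
            if (self.total + amount) / self.window <= self.limit:
                self.events.append((now, amount))
                self.total += amount
                return slept
            # Sleep until the oldest event leaves the window, then retry.
            wait = self.events[0][0] + self.window - now
            self.sleep(wait)
            slept += wait
```

A caller would wrap each I/O or network operation with `request(nbytes)`; in this sketch the process is delayed only when the request would push the windowed average over the assigned limit.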
Image Restoration through Subimages and Confidence Images. James G. Nagy. Dianne P. O'Leary. July 2000.
Some very effective but expensive image reconstruction algorithms cannot be applied to large images because of their cost. In this work, we first show how to apply such algorithms to subimages, giving improved reconstruction of regions of interest. Our second contribution is to construct confidence intervals for pixel values, by generalizing a theorem of O'Leary and Rust to allow both upper and lower bounds on variables. All current algorithms for image deblurring or deconvolution output an image. This provides an estimated value for each pixel in the image. What is lacking is an estimate of the statistical confidence that we can have in those pixel values or in the features they form in the image. There are two obstacles in determining confidence intervals for pixel values: first, the process is computationally quite intensive, and second, there has been no proposal for providing the results in a visually useful way. In this work we overcome the first of those limitations and use a recently developed algorithm called {\sf Twinkle} to overcome the second. We demonstrate the usefulness of these techniques on astronomical and motion-blurred images. (Also cross-referenced as UMIACS-TR-2000-52) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Displaying Confidence Images. James G. Nagy. Dianne P. O'Leary. July 2000.
Algorithms for computing images result in an estimate of an image. The image may result from deblurring a measured image, from deconvolving a set of measurements, or from computing an image by modeling physical processes such as the weather. These computations provide an estimated value for each pixel in the image. What is lacking, however, is an estimate of the statistical confidence that we can have in those pixel values or in the features they form. In this work we discuss novel ways to display confidence information, using an algorithm called {\sf Twinkle}, in order to give the viewer valuable visual insight into uncertainties. The technique is useful whether the confidence information is in the form of a confidence interval or a distribution of possible values. We demonstrate how to display confidence information in a variety of applications: weather forecasts, intensity of a star, and rating a potential tumor in a diagnostic image. (Also cross-referenced as UMIACS-TR-2000-51) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Analysis of Smoothing Effects of Upwinding Strategies for the. Howard C. Elman. Alison Ramage. June 2000.
Using a technique for constructing analytic expressions for discrete solutions to the convection-diffusion equation, we examine and characterise the effects of upwinding strategies on solution quality. In particular, for grid-aligned flow and discretisation based on bilinear finite elements with streamline upwinding, we show precisely how the amount of upwinding included in the discrete operator affects solution oscillations and accuracy when boundary layers are present. In addition, we show that the same analytic techniques provide insight into other discretisations, such as a finite difference method that incorporates streamline diffusion, and the isotropic artificial diffusion method. (Also cross-referenced as UMIACS-TR-2000-50) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
SHOP and M-SHOP: Planning with Ordered Task Decomposition. Dana Nau. Yue Cao. Amnon Lotem. Hector Munoz-Avila. June 2000.
SHOP (Simple Hierarchical Ordered Planner) and M-SHOP (Multi-task-list SHOP) are planning algorithms with the following characteristics.
* SHOP and M-SHOP plan for tasks in the same order that they will later be executed. This avoids some task-interaction issues that arise in other HTN planners, making the planning algorithms relatively simple. This also makes it easy to prove soundness and completeness results.
* Since SHOP and M-SHOP know the complete world-state at each step of the planning process, they can use highly expressive domain representations. For example, they can do planning problems that require Horn-clause inferencing, complex numeric computations, and calls to external programs.
* In our tests, SHOP and M-SHOP were several orders of magnitude faster than Blackbox, IPP, and UMCP, and were several times as fast as TLplan.
* The approach is powerful enough to be used in complex real-world planning problems. For example, we are using a Java implementation of SHOP as part of the HICAP plan-authoring system for Noncombatant Evacuation Operations (NEOs).
In this paper, we describe SHOP and M-SHOP, present soundness and completeness results for them, and compare them experimentally to Blackbox, IPP, TLplan, and UMCP. The results suggest that planners that generate totally ordered plans starting from the initial state can "scale up" to complex planning problems better than planners that use partially ordered plans. Department of Computer Science, University of Maryland,
NTCIR CLIR Experiments at the University of Maryland. Douglas W. Oard. Jianqiang Wang. June 2000.
This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that use of the first translation in the edict dictionary is comparable with the use of every translation. Japanese term segmentation posed no unusual problems, which contrasts sharply with results previously obtained for cross-language retrieval between Chinese and English. (Also cross-referenced as UMIACS-TR-2000-47, LAMP-TR-054) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
TREC-8 Experiments at Maryland: CLIR, QA and Routing. Douglas W. Oard. Jianqiang Wang. Dekang Lin. Ian Soboroff. June 2000.
The University of Maryland team participated in four aspects of TREC-8: the ad hoc retrieval task, the main task in the cross-language retrieval (CLIR) track, the question answering track, and the routing task in the filtering track. The CLIR method was based on Pirkola's method for Dictionary-based Query Translation, using freely available dictionaries. Broad-coverage parsing and rule-based matching were used for question answering. Routing was performed using Latent Semantic Indexing in profile space. (Also cross-referenced as UMIACS-TR-2000-46, LAMP-TR-053) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Structured Translation for Cross-Language Information Retrieval. Ruth Sperer. Douglas W. Oard. June 2000.
The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process, with the system first determining the intended meaning of a query term and then selecting translations appropriate to that meaning that might appear in the document collection. An implementation of structured translation based on automatic dictionary clustering is described and evaluated by using Chinese queries to retrieve English documents. Structured translation achieved an average precision that was statistically indistinguishable from Pirkola's technique for very short queries, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval effectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications. (Also cross-referenced as UMIACS-TR-2000-45, LAMP-TR-052) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Mining the Web for Bilingual Text. P. Resnik. June 2000.
STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end-product is an automatically acquired parallel corpus comprising 2491 English-French document pairs, approximately 1.5 million words per language. (Also cross-referenced as UMIACS-TR-2000-44) (Also cross-referenced as LAMP-TR-051) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Evaluating Lexicon Coverage for Cross-Language Information Retrieval. G. Levow. D.W. Oard. June 2000.
No abstract available (Also cross-referenced as UMIACS-TR-2000-43) (Also cross-referenced as LAMP-TR-050) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Signal Boosting for Translingual Topic Tracking: Document Expansion and. G. Levow. D.W. Oard. June 2000.
No abstract available (Also cross-referenced as UMIACS-TR-2000-42) (Also cross-referenced as LAMP-TR-049) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Statistical Word-Level Translation Model for Comparable Corpora. Mona Diab. Steve Finch. June 2000.
In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, their corresponding translations' distributional profiles should be close in a comparable corpus. The proposed model is described. A preliminary investigation on intralanguage comparable corpora is laid out. The preliminary results are >92% accurate, suggesting the feasibility of the model. The model needs to undergo some improvements and should be tested cross-linguistically before assessing its significance. (Also cross-referenced as UMIACS-TR-2000-41, LAMP-TR-048) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Philip Resnik. Mona Diab. June 2000.
The way we model semantic similarity is closely tied to our understanding of linguistic representations. We present several models of semantic similarity, based on differing representational assumptions, and investigate their properties via comparison with human ratings of verb similarity. The results offer insight into the bases for human similarity judgments and provide a testbed for further investigation of the interactions among syntactic properties, semantic structure, and semantic content. (Also cross-referenced as UMIACS-TR-2000-40, LAMP-TR-047) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification. Mona Diab. John Schuster. Peter Bock. June 2000.
Quantitative analysis of literary style has heretofore utilized semantic elements (word counts). This research attempts to identify quantifiable syntactic elements of style that can be used for author identification. The measurement of syntactic elements utilizes a dictionary with one part of speech per word and looks at phrases delimited by punctuation marks. Different size permutations of words, referred to as grams, are counted within each text. Correlations are measured amongst the gram frequencies of eight texts pertaining to four authors, both contemporary and non-contemporary. The correlations are performed across different gram sizes of words. The same treatment is applied to a target text, the Funeral Elegy text. The approach holds for classifying texts temporally, consistently across the various gram sizes. Yet a finer grained investigation is required to certify the authorship of the Funeral Elegy text. (Also cross-referenced as UMIACS-TR-2000-39, LAMP-TR-046) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
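The gram counting procedure outlined in the abstract above might look roughly like the following Python sketch. Several details are assumptions made for illustration and are not stated in the report: grams are taken to be contiguous sequences of part-of-speech tags within a punctuation-delimited phrase, words missing from the dictionary are tagged `UNK`, and the correlation used is Pearson's. The names `pos_grams` and `correlation` are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def pos_grams(text, pos_of, n):
    """Count n-grams of part-of-speech tags inside each phrase.

    `pos_of` maps a lowercase word to its single part of speech.
    Phrases are split on punctuation marks, so no gram spans a mark.
    """
    counts = Counter()
    for phrase in re.split(r"[.,;:!?]", text.lower()):
        tags = [pos_of.get(word, "UNK") for word in phrase.split()]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def correlation(a, b):
    """Pearson correlation between two gram-frequency Counters."""
    keys = sorted(set(a) | set(b))
    xs = [a[k] for k in keys]
    ys = [b[k] for k in keys]
    if not keys:
        return 0.0
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant profiles
    return cov / sqrt(var_x * var_y)
```

Running `pos_grams` for several values of `n` on each text and comparing the resulting profiles pairwise with `correlation` mirrors the comparison across gram sizes mentioned in the abstract.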
Quantifying and Interpreting the Effect of Intelligent Information. Terry P. Riopka. Mona Diab. Peter Bock. June 2000.
A genetic algorithm is simulated using human beings as "chromosomes" in a preliminary study intended to quantify and interpret the effect of intelligent information exchange on genetic algorithm performance. Two factors are varied: the amount of information supplied to the cohort and the type of data manipulation allowed during the exchange. A human simulated genetic algorithm is run for each combination of factors as well as a machine simulation for comparison. Qualitative analysis of recorded conversations indicates extensive use of memory and development of block biases during genetic algorithm evolution. Informal analysis shows that genetic algorithm simulations using complex data manipulations combined with exact knowledge of string fitnesses seem to out-perform a standard machine implementation for the given optimization fitness function. Interestingly, polar combinations: simple data manipulation/minimum information and complex data manipulation/maximum information simulations seem to out-perform other combinations. (Also cross-referenced as UMIACS-TR-2000-38, LAMP-TR-045) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Chinese-English Semantic Resource Construction. Bonnie J. Dorr. Gina-Anne Levow. Dekang Lin. Scott Thomas. June 2000.
We describe an approach to large-scale construction of a semantic lexicon for Chinese verbs. We leverage off of three existing resources--a classification of English verbs called EVCA (English Verb Classes and Alternations) [Levin, 1993], a Chinese conceptual database called HowNet [Zhendong, 1988c, Zhendong, 1988b] (http://www.how-net.com), and a large machine-readable dictionary called Optilex. The resulting lexicon is used for determining appropriate word senses in applications such as machine translation and cross-language information retrieval. (Also cross-referenced as UMIACS-TR-2000-27) (Also cross-referenced as LAMP-TR-044) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Construction of Chinese-English Semantic Hierarchy for Information. Gina-Anne Levow. Bonnie Dorr. Dekang Lin. June 2000.
This paper describes an approach to large-scale construction of a semantic hierarchy for Chinese verbs. Leveraging off of an existing Chinese conceptual database called HowNet and a Levin-based English verb classification, we use thematic-role information to create links between Chinese concepts and English classes. The resulting hierarchy is used for multilingual lexicons in an English-Chinese cross-language information retrieval application. We demonstrate a structured syntax interface that exploits this large-scale hierarchy and its linkages to WordNet for English-Chinese cross-language information retrieval. (Also cross-referenced as UMIACS-TR-2000-36) (Also cross-referenced as LAMP-TR-043) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
oxyGen: A Language Independent Linearization Engine. Nizar Habash. May 2000.
This paper describes a language independent linearization engine, oxyGen. This system compiles target language grammars into programs that take feature graphs as inputs and generate word lattices that can be passed along to the statistical extraction module of the generation system Nitrogen. The grammars are written using a flexible and powerful language, oxyL, that has the power of a programming language but focuses on natural language realization. This engine has been used successfully in creating an English linearization program that is currently used as part of a Chinese-English machine translation system. (Also cross-referenced as UMIACS-TR-2000-35) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hashing Moving Objects. Zhexuan Song. Nick Roussopoulos. June 2000.
In real-life applications, objects are both spatially and temporally referenced. Objects that continuously change their location are called moving objects. With the development of wireless communication and positioning technology, it becomes necessary to store and index those objects in a database. Due to the complexity of the problem, many purely spatial index structures are unable to index large volumes of moving objects in a database. In this paper, we propose a new approach based on a hashing technique. Since it is impossible to re-index all the objects after each time period, we store the objects in buckets. When an object moves within a bucket, the database does not make any change. With this technique, the number of database updates is greatly reduced, which makes the indexing procedure feasible. Then, we extend the previous system structure by introducing a filter layer between the position information collectors and the database. Four different methods based on the new system structure are also presented. Performance experiments were performed to evaluate different aspects of our indexing techniques, and the conclusions are included in the paper. Department of Computer Science, University of Maryland
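The bucket idea in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how buckets are defined, so the uniform square grid used here is an assumption, as are the class name `BucketIndex`, its methods, and the `updates` counter; the filter layer and the four methods mentioned in the abstract are not modelled.

```python
class BucketIndex:
    """Hypothetical index that hashes moving objects into grid buckets."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.bucket_of = {}  # object id -> bucket key
        self.members = {}    # bucket key -> set of object ids
        self.updates = 0     # number of index changes actually made

    def _bucket(self, x, y):
        # Hash a position to the grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def report(self, obj_id, x, y):
        """Record a new position. Returns True only if the index changed,
        that is, if the object entered a different bucket."""
        new = self._bucket(x, y)
        old = self.bucket_of.get(obj_id)
        if new == old:
            return False  # moved within its bucket: no update needed
        if old is not None:
            self.members[old].discard(obj_id)
        self.members.setdefault(new, set()).add(obj_id)
        self.bucket_of[obj_id] = new
        self.updates += 1
        return True

    def objects_in(self, x, y):
        """Ids of objects whose bucket contains the point (x, y)."""
        return set(self.members.get(self._bucket(x, y), set()))
```

In this sketch the cell size trades update cost against query precision: larger buckets absorb more movement without an update but return coarser candidate sets.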
Broadening Access to Large Online Databases by Generalizing Query. E. Tanin. C. Plaisant. B. Shneiderman. May 2000.
Companies, government agencies, and other types of organizations are making their large databases available to the world over the Internet. Current database front-ends do not give users information about the distribution of data. This leads many users to waste time and network resources posing queries that have either zero-hit or mega-hit result sets. Query previews form a novel visual approach for browsing large databases. Query previews supply data distribution information about the database that is being searched and give continuous feedback about the size of the result set for the query as it is being formed. On the other hand, query previews use only a few pre-selected attributes of the database. The distribution information is displayed only on these attributes. Unfortunately, many databases are formed of numerous relations and attributes. This paper introduces a generalization of query previews. We allow users to browse all of the relations and attributes of a database using a hierarchical browser. Any of the attributes can be used to display the distribution information, making query previews applicable to many public online databases. (Also cross-referenced as UMIACS-TR-2000-32) (Also cross-referenced as HCIL-TR-2000-14) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
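To make the notion of a query preview concrete, here is a minimal Python sketch of the two pieces of feedback the abstract above mentions: the size of the result set for a partially formed query, and the distribution of one attribute within it. The function name `preview`, the record layout, and the representation of a query as a mapping from attribute to allowed values are assumptions for illustration, not the interface described in the report.

```python
from collections import Counter


def preview(records, attribute, selections):
    """Return (result-set size, distribution of `attribute`).

    `records` is a list of dicts. `selections` maps an attribute name to
    the set of values currently selected for it; attributes that are not
    mentioned are left unconstrained.
    """
    matching = [
        record for record in records
        if all(record.get(name) in allowed
               for name, allowed in selections.items())
    ]
    distribution = Counter(record.get(attribute) for record in matching)
    return len(matching), distribution


if __name__ == "__main__":
    # Hypothetical data set.
    records = [
        {"year": 1999, "type": "map"},
        {"year": 2000, "type": "map"},
        {"year": 2000, "type": "photo"},
    ]
    size, by_type = preview(records, "type", {"year": {2000}})
    print(size, dict(by_type))
```

Because any attribute name can be passed as `attribute`, the same routine covers the generalized case in which the user picks the attributes to preview from a browser rather than from a fixed pre-selected set.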
Fisheye Menus. B. B. Bederson. May 2000.
We introduce "fisheye menus" which apply traditional fisheye graphical visualization techniques to linear menus. This provides for an efficient mechanism to select items from long menus, which are becoming more common as menus are used to select data items in, for example, e-commerce applications. Fisheye menus dynamically change the size of menu items to provide a focus area around the mouse pointer. This makes it possible to present the entire menu on a single screen without requiring buttons, scrollbars, or hierarchies. A pilot study with 10 users compared user preference of fisheye menus with traditional pull-down menus that use scrolling arrows, scrollbars, and hierarchies. Users preferred the fisheye menus for browsing tasks, and hierarchical menus for goal-directed tasks. (Also cross-referenced as UMIACS-TR-2000-31) (Also cross-referenced as HCIL-TR-2000-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Jazz: An Extensible Zoomable User Interface Graphics ToolKit in Java. B. B. Bederson. J. Meyer. L. Good. May 2000.
In this paper we investigate the use of scene graphs as a general approach for implementing two-dimensional (2D) graphical applications, and in particular Zoomable User Interfaces (ZUIs). Scene graphs are typically found in three-dimensional (3D) graphics packages such as Sun's Java3D and SGI's OpenInventor. They have not been widely adopted by 2D graphical user interface toolkits. To explore the effectiveness of scene graph techniques, we have developed Jazz, a general-purpose 2D scene graph toolkit. Jazz is implemented in Java using Java2D, and runs on all platforms that support Java 2. This paper describes Jazz and the lessons we learned using Jazz for ZUIs. It also discusses how 2D scene graphs can be applied to other application areas. (Also cross-referenced as UMIACS-TR-2000-30) (Also cross-referenced as HCIL-TR-2000-13) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
User Modeling for Information Access Based on Implicit Feedback. J. Kim. D. W. Oard. K. Romanik. May 2000.
User modeling can be used in information filtering and retrieval systems to improve the representation of a user's information needs. User models can be constructed by hand, or learned automatically based on feedback provided by the user about the relevance of documents that they have examined. By observing user behavior, it is possible to infer implicit feedback without requiring explicit relevance judgments. Previous studies based on Internet discussion groups (USENET news) have shown reading time to be a useful source of implicit feedback for predicting a user's preferences. The study reported in this paper extends that work by providing a framework for considering alternative sources of implicit feedback, examining whether reading time is useful for predicting a user's preferences for academic and professional journal articles, and exploring whether retention behavior can usefully augment the information that reading time provides. Two user studies were conducted in which undergraduate students examined articles and abstracts related to the telecommunications and pharmaceutical industries. The results showed that reading time could be used to predict the user's assessment of relevance, although reading times for journal articles and technical abstracts are longer than has been reported for USENET news documents. Observation of printing events, a type of retention behavior, was found to provide additional useful evidence about relevance beyond that which could be inferred from reading time. The paper concludes with a brief discussion of the implications of the reported results. (Also cross-referenced as UMIACS-TR-2000-29) (Also cross-referenced as HCIL-TR-2000-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Navigational Issues in the Design of On-Line Self-Administered. K. L. Norman. Z. Friedman. K. Norman. R. Stevenson. May 2000.
Answering questions on surveys involves the access of internal cognitive knowledge structures, the retrieval of records from external databases, and the navigation of items on the computer interface. In this study a number of alternative designs for on-line questionnaire presentation were investigated. A long heterogeneous survey was partitioned in four ways: whole/form-based, semantic/section-based, screen/page-based, and single item-based. Questionnaires were presented with or without an index, which resulted in eight versions. Times for initial completion of the questionnaire were recorded as well as subjective assessments. Neither initial completion times nor subjective assessments differed among the eight versions due to the highly linear navigation of the survey structures. Respondents were also asked to revisit 16 questions based on only the topic of the question or on the topic and the question number and to change their answers. Revision times reflected ease of finding items in the structure of the survey and the use of an index to the sections of the questionnaire. University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
DataCutter and A Client Interface for the Storage Resource Broker with. Tahsin Kurc. Michael Beynon. Alan Sussman. Joel Saltz. May 2000.
The continuing increase in the capabilities of high performance computers and continued decreases in the cost of secondary and tertiary storage systems are making it increasingly feasible to generate and archive very large (e.g. petabyte and larger) datasets. Applications are also increasingly likely to make use of archived data obtained by different types of sensors. Such sensors include imaging devices deployed on satellites and aircraft, microscopy related imagery and radiology related imagery. Simulation or sensor datasets generated or acquired by one group may need to be accessed over a wide-area network by other groups. Datasets frequently describe data associated with collections of very large structured or unstructured grids where each grid point is associated with several variables. Applications frequently need only to obtain portions of a dataset. Required data may correspond to a particular region in a multidimensional space. The application may need to access all data associated in a multidimensional region or it may need only certain variable values at a subsampled set of spatial locations. In addition, in some cases, applications may require data products obtained by aggregating data in one way or another. For instance, a user might require time or space averaged data. This document describes the design of a middleware infrastructure, called DataCutter, that enables subsetting and user-defined filtering of multi-dimensional datasets stored in archival storage systems across a wide-area network. We also describe a client API for Storage Resource Broker (SRB) clients, which allows SRB clients to carry out subsetting and filtering of datasets stored through the SRB. This API uses a prototype implementation of the DataCutter indexing and filtering services. (Also cross-referenced as UMIACS-TR-2000-26) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The periodic polytope and its applications to a scheduling problem - A. K. Subramani. A. Agrawala. May 2000.
Parameter variability and the existence of complex constraints between tasks are assured features of real-time scheduling. {\em Periodicity} of task sets is an additional feature that needs to be accommodated. Traditional scheduling models ignore the complexities involved in real-time scheduling by making simplistic assumptions about task interactions. In this paper, we present a model that captures the issues that we deem central to real-time scheduling in periodic task sets and demonstrate the existence of efficient and easily implementable algorithms for addressing schedulability queries in this model. Our model is very general and applicable to diverse areas ranging from real-time process scheduling in operating systems and avionics to manufacturing and traffic control. (Also cross-referenced as UMIACS-TR-2000-25) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Extending User Understanding of Federal Statistics in Tables. Gary Marchionini. Carol Hert. Liz Liddy. Ben Shneiderman. May 2000.
This paper describes progress toward improving user interfaces for US Federal government statistics that are presented in tables. Based on studies of user behaviors and needs related to statistical tables, we describe interfaces to assist diverse users with a range of statistical literacy to explore, find, understand, and use US Federal government statistics. (HCIL-TR-2000-08) (Also cross-referenced as UMIACS-TR-2000-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Direct Annotation: A Drag-and-Drop Strategy for Labeling Photos. B. Shneiderman. H. Kang. April 2000.
Annotating photos is such a time-consuming, tedious and error-prone data entry task that it discourages most owners of personal photo libraries. By allowing users to drag labels such as personal names from a scrolling list and drop them on a photo, we believe we can make the task faster, easier and more appealing. Since the names are entered in a database, searching for all photos of a friend or family member is dramatically simplified. We describe the user interface design and the database schema to support direct annotation, as implemented in our PhotoFinder prototype. (HCIL-2000-06) (Also cross-referenced as UMIACS-TR-2000-23) University of Maryland Institute for Advanced Computer Studies, Human-Computer Interaction Laboratory, University of Maryland, Department of Computer Science, University of Maryland,
Snap-Together Visualization: A User Interface for Coordinating. C. North. B. Shneiderman. April 2000.
Multiple coordinated visualizations enable users to rapidly explore complex information. However, users often need unforeseen combinations of coordinated visualizations that are appropriate for their data. Snap-Together Visualization enables data users to rapidly and dynamically mix and match visualizations and coordinations to construct custom exploration interfaces without programming. Snap's conceptual model is based on the relational database model. Users load relations into visualizations then coordinate them based on the relational joins between them. Users can create different types of coordinations such as: brushing, drill down, overview and detail view, and synchronized scrolling. Visualization developers can make their independent visualizations snap-able with a simple API. Evaluation of Snap revealed benefits, cognitive issues, and usability concerns. Data savvy users were very capable and thrilled to rapidly construct powerful coordinated visualizations. A snapped overview and detail-view coordination improved user performance by 30-80%, depending on task. (Also cross-referenced as UMIACS-TR-2000-22) University of Maryland Institute for Advanced Computer Studies, Human-Computer Interaction Laboratory, University of Maryland, Department of Computer Science, University of Maryland,
An Arnoldi--Schur Algorithm for Large Eigenproblems. G. W. Stewart. April 2000.
Sorensen's iteratively restarted Arnoldi algorithm is one of the most successful and flexible methods for finding a few eigenpairs of a large matrix. However, the need to preserve the structure of the Arnoldi decomposition, on which the algorithm is based, restricts the range of transformations that can be performed on it. In consequence, it is difficult to deflate converged Ritz vectors from the decomposition. Moreover, the potential forward instability of the implicit QR algorithm can cause unwanted Ritz vectors to persist in the computation. In this paper we introduce a generalized Arnoldi decomposition that solves both problems in a natural and efficient manner. (Also cross-referenced as UMIACS-TR-2000-21) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Buffer Merging --- A Powerful Technique for Reducing Memory. P. K. Murthy. S. S. Bhattacharyya. April 2000.
In this paper, we develop a new technique called buffer merging for reducing memory requirements of synchronous dataflow (SDF) specifications. SDF has proven to be an attractive model for specifying DSP systems, and is used in many commercial tools like DSPCanvas, SPW, and COSSAP. Good synthesis from an SDF specification depends crucially on scheduling, and memory is an important metric for generating efficient schedules. Previous techniques on memory minimization have either not considered buffer sharing at all, or have done so at a fairly coarse level (the meaning of this will be made more precise in the paper). In this paper, we develop a buffer overlaying strategy that works at the level of an input/output edge pair of an actor. It works by algebraically encapsulating the lifetimes of the tokens on the input/output edge pair, and determines the maximum amount of the input buffer space that can be reused by the output. We develop the mathematical basis for performing merging operations, and develop several algorithms and heuristics for using the merging technique for generating efficient implementations. We show improvements of up to 54% over previous techniques. (Also cross-referenced as UMIACS-TR-2000-20) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Support for Speculative Update Propagation and Mobility in Deno. Ugur Cetintemel. Peter J. Keleher. Michael Franklin. November 1999.
This paper presents the transactional framework of Deno, an object replication system specifically designed for use in mobile and weakly-connected environments. Deno uses weighted voting for availability and pair-wise, epidemic information flow for flexibility. This combination allows the protocols to operate with less than full connectivity, to easily adapt to changes in group membership, and to make few assumptions about the underlying network topology. These features are all crucial to providing effective support for mobile and weakly-connected platforms. Deno has been implemented and runs on top of Linux and Windows NT/CE platforms. We use the Deno prototype to characterize the performance of two versions of Deno's protocol. The first version enables globally serializable execution of update transactions. The second supports a weaker consistency level that still guarantees transactionally consistent access to replicated data. The results show that our protocols either outperform or perform comparably to existing approaches, while achieving higher availability. Further, we show that the incremental cost of providing global serializability in this environment is low. Finally, we show that commit delays can be significantly decreased by allowing votes to be cast, and votes and updates to be disseminated, speculatively. (Also cross-referenced as UMIACS-TR-99-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Large-Scale Construction of a Chinese-English Semantic Hierarchy. Bonnie J. Dorr. Gina-Anne Levow. Dekang Lin. June 2000.
This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a semantic hierarchy for Chinese verbs, using thematic-role information to create links between Chinese concepts and English classes. We then present an approach to compensating for gaps in the existing resources. The resulting hierarchy is used for a multilingual lexicon for Chinese-English machine translation and cross-language information retrieval applications. (Also cross-referenced as UMIACS-TR-2000-17) (Also cross-referenced as LAMP-TR-040) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Parametric Polytope and its applications to a Scheduling Problem. K. Subramani. A. Agrawala. March 2000.
An important feature in real-time systems is {\em parameter impreciseness}, i.e., the inability to accurately determine certain parameter values. The most common such parameter is {\em task execution time}. A second feature is the presence of complex relationships between tasks that constrain their execution. Traditional models do not accommodate either feature completely: (a) variable execution times are modeled through a fixed value ({\em worst-case}), and (b) relationships are limited to those that can be represented by precedence graphs. We present a task model that effectively captures {\em variable task execution time}, while simultaneously permitting arbitrary linear relationships between tasks. Our model finds applications in diverse areas such as real-time task scheduling, compiler scheduling, real-time database scheduling and machine control. This paper focuses primarily on the computational complexity of answering queries posed in our model; in particular we demonstrate the existence of constraint classes that make the scheduling problem {\em hard.} (Also cross-referenced as UMIACS-TR-2000-16) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Characterisation of Oscillations in the Discrete Two-Dimensional. Howard C. Elman. Alison Ramage. March 2000.
It is well known that discrete solutions to the convection-diffusion equation contain nonphysical oscillations when boundary layers are present but not resolved by the discretisation. However, except for one-dimensional problems, there is little analysis of this phenomenon. In this paper, we present an analysis of the two-dimensional problem with constant flow aligned with the grid, based on a Fourier decomposition of the discrete solution. For Galerkin bilinear finite element discretisations, we derive closed form expressions for the Fourier coefficients, showing them to be weighted sums of certain functions which are oscillatory when the mesh P\'{e}clet number is large. The oscillatory functions are determined as solutions to a set of three-term recurrences, and are then used to characterise the oscillations of the discrete solution in terms of the mesh P\'{e}clet number and boundary conditions of the problem. (Also cross-referenced as UMIACS-TR-2000-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Static Polytope and its applications to a scheduling problem. K. Subramani. A. Agrawala. March 2000.
In the design of real-time systems, it is often the case that certain process parameters ( such as {\em execution time} ) are not known precisely. The challenge in real-time system design is to develop techniques that efficiently meet the requirements of impreciseness. Traditional models tend to simplify the issue of impreciseness by assuming {\em worst-case} times. This assumption is unrealistic and at the same time, may cause certain constraints to be violated at run-time. In this paper, we shall study the problem of scheduling a set of ordered, non-preemptive processes under non-constant execution times. Typical applications for variable execution time scheduling include process scheduling in Real-time Operating Systems such as Maruti, compiler scheduling, database transaction scheduling and automated machine control. An important feature of application areas such as robotics is the interaction between execution times of various processes. We explicitly model this interaction through the representation of execution time vectors as points in convex sets. We present both sequential and parallel algorithms for determining the existence of a static schedule. (Also cross-referenced as UMIACS-TR-2000-14) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Dual interpretation of Standard Constraints in Parametric Scheduling. K. Subramani. A. Agrawala. March 2000.
The problem of parametric scheduling in hard real-time systems (in the presence of linear relative constraints between the start and execution times of tasks) was posed in the literature. In an earlier paper, a polynomial time algorithm is presented for the case when the constraints are restricted to be standard (defined in paper) and the execution time vectors belong to an axis-parallel hyper-rectangle. In this paper, we extend their results in two directions. We first present a polynomial time algorithm for the case when the execution time vectors belong to arbitrary convex domains. We then show that the set of standard constraints can be extended to include arbitrary network constraints. Our insights into the problem occur primarily as a result of studying the dual polytope of the constraint system. (Also cross-referenced as UMIACS-TR-2000-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Bolshoi - A Modeling Spreadsheet (Improving Usability of Complex. William C. Cheng. Leana Golubchik. March 2000.
Spreadsheet programs are very popular financial modeling tools because they allow users to juggle numbers and formulas with a powerful yet intuitive and easy to understand user interface; also, they often are equipped with sophisticated numerical analysis packages for data analysis and powerful presentation utilities for visualizing results. Computer systems performance and reliability modeling tools of today, on the other hand, have un-intuitive user interfaces and are difficult to learn and use. In this work, we propose to design, build, and evaluate Bolshoi, a modeling spreadsheet, with the goal of putting modeling tools comfortably in the hands of non-expert users. In this proposal, we address management of complexity that exists in performance and reliability analysis of real computer and communication systems. Specifically, we propose to do so through the design and development of an advanced modeling tool. Our tool will provide two important functions: (1) a proper interface for building models that will allow system designers not just to define their models, but visualize them in various ways and (2) easy plug-in of existing and future advanced solution techniques. We call this tool Bolshoi, a Modeling Spreadsheet, because it has a spreadsheet-type interface as detailed below. Performance evaluation of real systems is complex, suffers from scalability problems (or the so-called ``state explosion'' problem) and in many cases requires advanced computational techniques. Often, advanced computational techniques are based on exploitation of ``special structure'' in the models (the primary way to deal with state explosion besides getting a bigger machine). With large and complex models, these special structures are very expensive to expose automatically as it involves searching through a combinatorial number of permutations. 
Proper visualization of models can greatly assist in the discovery of these special structures so that state space reduction techniques can be applied. Discovery of special structure regularly contributes to many orders of magnitude in computational efficiency. Furthermore, models are often defined over infinite state spaces. We believe that a spreadsheet paradigm is ideal for visualizing such models. Without proper modeling tools, much effort and money is wasted by the computer industry, and moreover, the probability of a successful outcome is low. Thus, a good tool is crucial to advances in the state of the art in performance modeling as well as to successful design of systems in the industry. Every system designer should be able to integrate the use of a performance modeling tool into his/her design process. He/she should be able to easily ask ``what-if'' type questions, explore possible design choices, and make decisions based on quantitative results rather than ``gut feeling''. We believe that a modeling spreadsheet is the right abstraction for such tasks, and furthermore, to the best of our knowledge this abstraction has not been exploited for performance evaluation tool purposes. We believe that the approach proposed here will have a significant impact on future performance tool designs as well as make significant strides in wide-spread use of performance evaluation techniques among computer and communication system designers. Furthermore, a modeling tool that does not require expert-level methodology knowledge is also an excellent undergraduate-level and graduate-level educational tool. Opportunities for hands-on experience with modeling and performance evaluation as well as the ability to add new techniques to the tool greatly improve the educational experience of students and their future ability to apply what they have learned in class to design of real computer and communication systems. 
(Also cross-referenced as UMIACS-TR-2000-10) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Contention-conscious transaction ordering in embedded multiprocessors. Mukul Khandelia. Shuvra S. Bhattacharyya. March 2000.
This paper explores the problem of efficiently ordering interprocessor communication operations in statically-scheduled multiprocessors for iterative dataflow graphs. In most digital signal processing applications, the throughput of the system is significantly affected by communication costs. By explicitly modeling these costs within an effective graph-theoretic analysis framework, we show that ordered transaction schedules can significantly outperform self-timed schedules even when synchronization costs are low. However, we also show that when communication latencies are non-negligible, finding an optimal transaction order given a static schedule is an NP-complete problem, and that this intractability holds both under iterative and non-iterative execution. We develop new heuristics for finding efficient transaction orders, and perform an experimental comparison to gauge the performance of these heuristics. (Also cross-referenced as UMIACS-TR-2000-09) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Information Dynamics: An Information-Centric Approach to System Design. Ashok K. Agrawala. Ronald L. Larsen. Douglas Szajda. January 2000.
Acquisition, distribution, management, and analysis of information are the fundamental purposes behind most complex constructed systems and infrastructures, and yet a process centric approach is fundamental to the design and implementation of such systems. Since information is the essential commodity in these endeavors, we believe that an effective design should take into account the fundamental properties of information: its characteristics, its fusion, its distillation, etc. Information Dynamics is an attempt to bring a degree of rigor to the understanding of the nature of information itself and how it is used in pursuit of system objectives. (Also cross-referenced as UMIACS-2000-08) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Designing StoryRooms: Interactive Storytelling Spaces for Children. Houman Alborzi. Allison Druin. Jaime Montemayor. Lisa Sherman. Gustav Taxén. Jack Best. Joe Hammer. Alex Kruskal. Abby Lal. Thomas Plaisant Schwenn. Lauren Sumida. Rebecca Wagner. Jim Hendler. February 2000.
Limited access to space, costly props, and complicated authoring technologies are among the many reasons why children can rarely enjoy the experience of authoring room-sized interactive stories. Typically in these kinds of environments, children are restricted to being story participants, rather than story authors. Therefore, we have begun the development of "StoryRooms," room-sized immersive storytelling experiences for children. With the use of low-tech and high-tech storytelling elements, children can author physical storytelling experiences to share with other children. In the paper that follows, we will describe our design philosophy, design process with children, the current technology implementation and example StoryRooms. (Also cross-referenced as UMIACS-TR-2000-06) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory,
MOCHA: A Self-Extensible Database Middleware System for Distributed. Manuel Rodriguez-Martinez. Nick Roussopoulos. January 2000.
This paper describes MOCHA, a new self-extensible database middleware system designed to interconnect data sources distributed over a computer network. MOCHA is designed to scale to large environments and is based on the idea that some of the user-defined functionality in the system should be deployed by the middleware itself. This is realized by shipping Java code implementing either advanced data types or tailored query operators to remote data sources and having it executed remotely. Optimized query plans push the evaluation of powerful data-reducing operators to the data source sites while executing data-inflating operators near the client's site. The Volume Reduction Factor is a new and more explicit metric introduced in this paper to select the best site to execute query operators and is shown to be more accurate than the standard selectivity factor alone. MOCHA has been implemented in Java and runs on top of Informix and Oracle. We present the architecture of MOCHA, the ideas behind it, and a performance study using data and queries from the Sequoia 2000 Benchmark. The results of this study demonstrate that MOCHA not only provides a flexible and scalable framework for distributed query processing but also substantially improves query performance in contrast to existing middleware solutions. (Also cross-referenced as UMIACS-TR-2000-05) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Design of a Framework for Data-Intensive Wide-Area Applications. Michael D. Beynon. Tahsin Kurc. Alan Sussman. Joel Saltz. February 2000.
Applications that use collections of very large, distributed datasets have become an increasingly important part of science and engineering. With high performance wide-area networks becoming more pervasive, there is interest in making collective use of distributed computational and data resources. Recent work has converged to the notion of the Grid, which attempts to uniformly present a heterogeneous collection of distributed resources. Current Grid research covers many areas from low level infrastructure issues to high level application concerns. However, providing support for efficient exploration and processing of very large scientific datasets stored in distributed archival storage systems remains a challenging research issue. We have initiated an effort that focuses on developing efficient data-intensive applications in a Grid environment. In this paper, we present a framework, called filter-stream programming, that represents the processing units of a data-intensive application as a set of filters, which are designed to be efficient in their use of memory and scratch space. We describe a prototype infrastructure that supports execution of applications using the proposed framework. We present the implementation of two applications using the filter-stream programming framework, and discuss experimental results demonstrating the effects of heterogeneous resources on application performance. (Also cross-referenced as UMIACS-TR-2000-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Clustering Scheme for Hierarchical Routing in Wireless Networks. Suman Banerjee. Samir Khuller. February 2000.
In this paper we present a clustering scheme to create hierarchies for wireless networks. A cluster is defined as a subset of vertices, whose induced graph is connected. In addition, a cluster is required to obey certain constraints that are useful for hierarchical routing. While all these constraints cannot be met simultaneously for general graphs, we show how for wireless network topologies, such a clustering can be obtained. We also present simulation results from a distributed implementation of this scheme to demonstrate its convergence and stability properties. Department of Computer Science, University of Maryland,
Optimizing Retrieval and Processing of Multi-dimensional Scientific. Chialin Chang. Tahsin Kurc. Alan Sussman. Joel Saltz. February 2000.
Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models to predict the relative performance of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output dataset to be a regular $d$-dimensional array. We present an experimental evaluation of these models for various synthetic datasets and for several driving applications on a 128-node IBM SP. (Also cross-referenced as UMIACS-TR-2000-03) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
IMPACTing SHOP: Foundations for integrating HTN Planning and. Hector Munoz-Avila. Juergen Dix. Dana S. Nau. Yue Cao. February 2000.
In this paper we describe a formalism for integrating the SHOP HTN planning system with the IMPACT multi-agent environment. Our formalism provides an agentized adaptation of the SHOP planning algorithm that takes advantage of IMPACT's capabilities for interacting with external agents, performing mixed symbolic/numeric computations, and making queries to distributed, heterogeneous information sources (such as arbitrary legacy and/or specialized data structures or external databases). We show that this agentized version of SHOP will preserve soundness and completeness if certain conditions are met. (This technical report is the updated version of CS-TR-4085) (Also cross-referenced as UMIACS-TR-2000-02) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On the Eigensystems of Graded Matrices. G. W. Stewart. January 2000.
Informally a graded matrix is one whose elements show a systematic decrease or increase as one passes across the matrix. It is well known that graded matrices often have small eigenvalues that are determined to high relative accuracy. Similarly, the eigenvectors can have small components that are nonetheless well determined. In this paper, we give approximations to the eigenvalues and eigenvectors of a graded matrix in terms of a base matrix that show how these phenomena come about. This approach provides condition numbers for eigenvalues and individual components of the eigenvectors. The results are applied to derive related results for the singular value decomposition. (Also cross-referenced as UMIACS-TR-2000-01) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Generalization of Saad's Theorem on Rayleigh-Ritz. G. W. Stewart. December 1999.
Let $(\lambda,x)$ be an eigenpair of the Hermitian matrix $A$ of order $n$ and let $(\mu,u)$ be a Ritz pair from a subspace $\clk$ of $\comp^{n}$. Saad has given a simple inequality bounding $\sin\angle(x,u)$ in terms of $\sin\angle(x,\clk)$. In this note we show that this inequality can be extended to an equally simple inequality for eigenspaces of non-Hermitian matrices. (Also cross-referenced as UMIACS-TR-99-78) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Probabilistic Object Bases. Thomas Eiter. James Lu. Thomas Lukasiewicz. V.S. Subrahmanian. November 1999.
There are many applications where an object oriented data model is a good way of representing and querying data. However, current object database systems are unable to handle the case of objects whose attributes are uncertain. In this paper, extending previous pioneering work by Kornatzky and Shimony, we develop an extension of the relational algebra to the case of object bases with uncertainty. We propose concepts of consistency for such object bases, together with an NP-completeness result, and classes of probabilistic object bases for which consistency is polynomially checkable. In addition, as certain operations involve conjunctions and disjunctions of events, and as the probability of conjunctive and disjunctive events depends both on the probabilities of the primitive events involved as well as on what is known (if anything) about the relationship between the events, we show how all our algebraic operations may be performed under arbitrary probabilistic conjunction and disjunction strategies. We also develop a host of equivalence results in our algebra, which may be used as rewrite rules for query optimization. Last but not least, we have developed a prototype probabilistic object base server using the VisiBroker ORB on top of ObjectStore. We describe experiments to assess the efficiency of different possible rewrite rules. (Also cross-referenced as UMIACS-TR-99-77) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Designing Storytelling Technologies to Encourage Collaboration Between. Steve Benford. Benjamin B. Bederson. Karl-Petter Åkesson. Victor Bayon. Allison Druin. Pär Hansson. Juan Pablo Hourcade. Rob Ingram. Helen Neale. Claire O’Malley. Kristian T. Simsarian. Danaë Stanton. Yngve Sundblad. Gustav Taxén. November 1999.
We describe the iterative design of two collaborative storytelling technologies for young children, KidPad and the Klump. We focus on the idea of designing interfaces to subtly encourage collaboration so that children are invited to discover the added benefits of working together. This idea has been motivated by our experiences of using early versions of our technologies in schools in Sweden and the UK. We compare the approach of encouraging collaboration with other approaches to synchronizing shared interfaces. We describe how we have revised the technologies to encourage collaboration and to reflect design suggestions made by the children themselves. (Also cross-referenced as UMIACS-TR-99-76) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Single Display Groupware. Benjamin B. Bederson. Jason Stewart. Allison Druin. November 1999.
We discuss a model for supporting collaborative work between people that are physically close to each other. We call this model Single Display Groupware (SDG). In this paper, we describe the model, comparing it to more traditional remote collaboration. We describe the requirements that SDG places on computer technology, and our understanding of the benefits and costs of SDG systems. Finally, we describe a prototype SDG system that we built and the results of a usability test we ran with 60 elementary school children. Through participant observation, video analysis, program instrumentation, and an informal survey, we discovered that the SDG approach to collaboration has strong potential. Children overwhelmingly prefer two mice to one mouse when collaborating with other children. We identified several collaborative styles including a dominant partner, independent simultaneous use, a mentor/mentee relationship, and active collaboration. (Also cross-referenced as UMIACS-TR-99-75) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
IMPACTing SHOP: Foundations for integrating HTN Planning and. Hector Munoz-Avila. Juergen Dix. Dana S. Nau. Yue Cao. November 1999.
AI planning systems typically require that the state of the world be locally accessible. We call this the centralized state requirement. Furthermore, the state is described in a special representation language, mostly related to first-order logic. We refer to this as the uniform representation requirement. Relevant data from other sources must therefore be translated by hand into this language, stored in main memory and cannot be accessed automatically or as needed. These requirements, however, do not hold in many real-world domains. Information about the state may be distributed in several locations, each of which may have its own representation language. We address this problem by using a recently developed architecture for a Multi-Agent System, IMPACT, and its code-call mechanism. Within IMPACT queries and requests to arbitrary legacy and/or specialized data structures or external databases may be executed. We show in this paper how to combine the basic algorithm of a very efficient HTN planner, SHOP, with the code-call mechanism of IMPACT. This opens the way for SHOP to access real-world data and to base the planning process on external databases. We show that SHOP is sound and complete w.r.t. this extended data access. This technical report has been updated and revised and is available full-text/online as CS-TR-4100. (Also cross-referenced as UMIACS-TR-99-74) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Parameterized Modeling and Scheduling of Dataflow Graphs. Bishnupriya Bhattacharya. Shuvra S. Bhattacharyya. November 1999.
Dataflow has proven to be an attractive computational model for programming DSP applications. A restricted version of dataflow, called Synchronous Dataflow (SDF), is particularly well-suited for modeling a large class of signal processing applications, as it offers strong formal properties and compile-time predictability. However, the SDF model does not allow data-dependent flow of control or dynamically varying communication patterns between functional modules. This results in limited expressive power. Consequently, a variety of extensions to SDF have been developed, where the objective is to provide increased expressive power, while maintaining a significant part of the compile-time predictability of SDF. In this report, we propose a parameterized dataflow framework that can be applied as a meta-modeling technique to an arbitrary dataflow model that satisfies certain requirements, to further increase its expressive power. For clarity, we focus on synchronous dataflow, and develop the precise semantics of parameterized synchronous dataflow (PSDF). We propose a formal framework for the PSDF model, and introduce the concept of local synchrony, which is a condition that must be satisfied for consistent execution of PSDF specifications. From our experience, it appears that the PSDF model significantly increases the expressive power of pure SDF, while maintaining many of the desirable properties of SDF, like low-overhead scheduling (geared towards software synthesis in embedded systems). We develop techniques for implementing the operational semantics of PSDF that allow efficient quasi-static scheduling of a class of PSDF specifications. University of Maryland Institute for Advanced Computer Studies, Department of Electrical Engineering, University of Maryland, Department of Computer Science, University of Maryland,
A Convex Optimization Approach for Addressing Storage-Communication. Radha Poovendran. November 1999.
At Eurocrypt'99, Canetti, Malkin, and Nissim [1] presented a new tree-based key distribution algorithm that required sublinear storage of keys while preserving logarithmic update communication as functions of the group size. The results in [1] are known to be the first results presenting sub-linear storage among the family of tree-based key distribution schemes. The question of whether this storage was the optimal possible value while keeping the communication logarithmic was posed as a problem. We show that the storage-communication tradeoff can be formulated as a convex optimization problem in terms of the size of the minimal storage parameter defined in [1]. In particular, we show that the optimal solution is parameterizable by the ratio of the communication and storage costs, the degree of the tree, and the group size. Using this design triplet, we show that not only the results in [1] but also the results of the basic scheme of Wallner, Harder, and Agee [2] can be derived as specific Pareto optimal points for specific choices of the triplet. We also present an exact design procedure for feasibility testing and for constructing an optimal key distribution tree of the type in [1]. We also show that if the communication and the storage are equally weighted, then the optimal value for storage and communication grows as the square root of the group size, a value noted in [1]. Department of Computer Science, University of Maryland,
Scheduling Jobs Before Shut Down. Vincenzo Liberatore. December 1999.
Distributed systems execute background or alternative jobs while waiting for data or requests to arrive from another processor. In those cases, the following shut-down scheduling problem arises: given a set of jobs of known processing time, schedule them on m machines so as to maximize the total weight of jobs completed before an initially unknown deadline. We will present optimally competitive deterministic and randomized algorithms for shut-down scheduling. Our deterministic algorithm is parameterized by the number of machines m. Its competitive ratio increases as the number of machines decreases, but it is optimal for any given choice of m. Such a family of deterministic algorithms can be translated into a family of randomized algorithms that use progressively less randomization and that are optimal for the given amount of randomization. Hence, we establish a precise trade-off between the amount of randomization and the competitive ratio.
(Also cross-referenced as UMIACS-TR-99-72) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
SHOE: A Knowledge Representation Language for Internet Applications. Jeff Heflin. James Hendler. Sean Luke. October 1999.
It is our contention that the World Wide Web poses challenges to knowledge representation systems that fundamentally change the way we should design KR languages. In this paper, we describe the Simple HTML Ontology Extensions (SHOE), a KR language which allows web pages to be annotated with semantics. We present a formalism for the language and discuss the features which make it well suited for the Web. We describe the syntax and semantics of this language, and discuss the differences from traditional KR systems that make it more suited to modern web applications. We also describe some generic tools for using the language and demonstrate its capabilities by describing two prototype systems that use it. We also discuss some future tools currently being developed for the language. The language, tools, and details of the applications are all available on the World Wide Web at http://www.cs.umd.edu/projects/plus/SHOE. (Also cross-referenced as UMIACS-TR-99-71) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Security Infrastructure for Mobile Transactional Systems. Peter J. Keleher. Bobby Bhattacharjee. Kuo-Tung Kuo. Ugur Cetintemel. April 2000.
In this paper, we present an infrastructure for providing secure transactional support for mobile databases. Our infrastructure protects against external threats - malicious actions by nodes not authorized to access the data. The major contribution of this paper, however, is to classify and present algorithms to protect against internal security threats. Internal threats are malicious actions by authenticated nodes that misrepresent protocol specific information. We quantify the cost of our security mechanisms in the context of Deno: a system that supports object replication in a transactional framework for mobile and weakly-connected environments. Our results show that protecting against internal threats comes at a cost, but the marginal cost for protecting against larger cliques of malicious insiders is low. However, even with all the security mechanisms in place, our system commits updates over 50% faster than systems that depend on the Read-once Write-all commit protocol. Lastly, we present results from a probabilistic version of our algorithm that has several orders of magnitude lower computation cost than the traditional public-key based schemes. (Also cross-referenced as UMIACS-TR-2000-19) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
ViPEr-HiSS: A Case for Storage Design Tools. Leana Golubchik. Joseph Dunnick. Jeffrey K. Hollingsworth. October 1999.
The viability of large-scale multimedia applications depends on the performance of storage systems. Providing cost-effective access to vast amounts of video, image, audio, and text data requires (a) proper configuration of storage hierarchies as well as (b) efficient resource management techniques at all levels of the storage hierarchy. The resulting complexities of the hardware/software co-design in turn contribute to difficulties in making accurate predictions about performance, scalability, and cost-effectiveness of a storage system. Moreover, poor decisions at design time can be costly and problematic to correct in later stages of development. Hence, measurement of systems after they have been developed is not a desirable approach to predicting their performance. What is needed is the ability to evaluate the system's design while there are still opportunities to make corrections to fundamental design flaws. In this paper we describe the framework of ViPEr-HiSS, a tool which facilitates design, development, and subsequent performance evaluation of designs of multimedia storage hierarchies by providing mechanisms for relatively easy experimentation with (a) system configurations as well as (b) application- and media-aware resource management techniques. (Also cross-referenced as UMIACS-TR-99-69) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
PETS: A Personal Teller of Stories. Jaime Montemayor. Allison Druin. Jim Hendler. November 1999.
Let us start by reading a story written by a seven year old child, entitled Michelle. "There once was a robot named Michelle. She was new in the neighborhood. She was HAPPY when she first came, thinking she would make friends. But it was the opposite. Other robots threw rocks and sticks. She was SAD. Now no one liked her. One day she was walking down a street, a huge busy one, when another robot named Rob came up and ask [sic] if she wanted to have a friend. She was SCARED at first but then realized that she was HAPPY. The other robots were ANGRY but knew that they had learned their lesson. Michelle and Rob lived HAPPILY ever after. No one noticed the dents from rocks that stayed on Michelle." (Druin, Research notes, August 1998) This is just one of many stories that children have written with the help of PETS (Druin et al. 1999a). The author of Michelle did not just write this moving story; she is also an integral member of the team that built our robots. As you read on, PETS will be further described. Our motivations behind building such an interactive robotic pet will also be discussed. In addition, the process of how we made this robotic technology with our team of adults and six children will be introduced. And with this, we will present cooperative inquiry (Druin 1999a), the methodology that we embrace as we discover insights about technology, education, science, engineering, and art. Finally, this chapter will close with reflections on what was learned from on-going research effort. (Also cross-referenced as UMIACS-TR-99-67) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Efficient Preconditioning of the Linearized Navier-Stokes Equations. David Silvester. Howard Elman. David Kay. Andrew Wathen. October 1999.
We outline a new class of robust and efficient methods for solving subproblems that arise in the linearization and operator splitting of Navier-Stokes equations. We describe a very general strategy for preconditioning that has two basic building blocks: a multigrid V-cycle for the scalar convection-diffusion operator, and a multigrid V-cycle for a pressure Poisson operator. We present numerical experiments illustrating that a simple implementation of our approach leads to an effective and robust solver strategy, in that the convergence rate is independent of the grid and the time-step, and only deteriorates very slowly as the Reynolds number is increased. (Also cross-referenced as UMIACS-TR-99-66) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Active Logics: A Unified Formal Approach to Episodic Reasoning. Jennifer Elgot-Drapkin. Sarit Kraus. Michael Miller. Madhura Nirkhe. Donald Perlis. October 1999.
Artificial intelligence research falls roughly into two categories: formal and implementational. This division is not completely firm: there are implementational studies based on (formal or informal) theories (e.g., CYC, SOAR, OSCAR), and there are theories framed with an eye toward implementability (e.g., predicate circumscription). Nevertheless, formal/theoretical work tends to focus on very narrow problems (and even on very special cases of very narrow problems) while trying to get them ``right'' in a very strict sense, while implementational work tends to aim at fairly broad ranges of behavior but often at the expense of any kind of overall conceptually unifying framework that informs understanding. It is sometimes urged that this gap is intrinsic to the topic: intelligence is not a unitary thing for which there will be a unifying theory, but rather a ``society'' of subintelligences whose overall behavior cannot be reduced to useful characterizing and predictive principles. Here we describe a formal architecture that is more closely tied to implementational constraints than is usual for formalisms, and which has been used to solve a number of commonsense problems in a unified manner. In particular, we address the issue of formal, integrated, and longitudinal reasoning: inferentially-modeled behavior that incorporates a fairly wide variety of types of commonsense reasoning within the context of a single extended episode of activity requiring keeping track of ongoing progress, and altering plans and beliefs accordingly. Instead of aiming at optimal solutions to isolated, well-specified and temporally narrow problems, we focus on satisficing solutions to under-specified and temporally-extended problems, much closer to real-world needs. We believe that such a focus is required for AI to arrive at truly intelligent mechanisms with the ability to behave effectively over considerably longer time periods and range of circumstances than is common in AI today. 
While this will surely lead to less elegant formalisms, it also surely is requisite if AI is to get fully out of the blocks-world and into the real world. (Also cross-referenced as UMIACS-TR-99-65) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On Orthogonalization in the Inverse Power Method. G. W. Stewart. September 1999.
When the inverse power method is used to compute eigenvectors of a symmetric matrix corresponding to close eigenvalues, the computed eigenvectors may not be orthogonal. The cure for the problem is to orthogonalize the vectors using the Gram--Schmidt algorithm. In this note it is shown that the orthogonalization process does not cause the quality of the eigenvectors to deteriorate. (Also cross-referenced as UMIACS-TR-99-64) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
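To make the setting of this note concrete, here is a minimal pure-Python sketch of inverse iteration with a Gram-Schmidt step against previously computed eigenvectors. It is an illustration under stated assumptions, not the procedure analyzed in the report: the helper names (`solve`, `inverse_iteration`), the dense Gaussian-elimination solver, the fixed step count, and the example matrix and shifts are all hypothetical choices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def inverse_iteration(a, shift, start, previous=(), steps=5):
    """Approximate the eigenvector of symmetric `a` nearest `shift`.

    After every solve, the iterate is orthogonalized (Gram-Schmidt)
    against the already computed unit vectors in `previous`.
    """
    n = len(a)
    shifted = [[a[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = normalize(start)
    for _ in range(steps):
        v = solve(shifted, v)
        for q in previous:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        v = normalize(v)
    return v

# Hypothetical example: eigenvalues 1 - 1e-6 and 1 + 1e-6 are close.
A = [[1.0, 1e-6, 0.0],
     [1e-6, 1.0, 0.0],
     [0.0, 0.0, 3.0]]
v1 = inverse_iteration(A, 1.0 - 1.1e-6, [1.0, 0.5, 0.25])
v2 = inverse_iteration(A, 1.0 + 1.1e-6, [1.0, 0.5, 0.25], previous=[v1])
```

The shifts are deliberately chosen slightly away from the exact eigenvalues so that the shifted matrix stays nonsingular.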
Evolving a Set of Techniques for OO Inspections. Forrest Shull. Guilherme H. Travassos. Jeffrey Carver. Victor R. Basili. October 1999.
Inspecting OO designs is an important way of ensuring the quality of software under development. When high-level design activities are finished, the design documents can be inspected to verify whether they are consistent among themselves and whether the software requirements were correctly and completely captured. This paper discusses some issues regarding the definition and application of reading techniques (i.e. procedural guidelines that can be given to inspectors) to inspect high-level OO design documents. An initial set of OO Reading Techniques and their experimental evaluation is described. A method for evaluating the reading techniques in more detail, i.e. Observational Techniques, is then presented, and experiences with its use are discussed. Through these discussions, we show how the reading techniques have evolved in response to empirical evidence (both qualitative and quantitative) regarding their use in practice. The complete and current set of techniques can be found in the appendices. (Also cross-referenced as UMIACS-TR-99-63) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Generating Efficient Stack Code for Java. Tatiana Shpeisman. Mustafa Tikir. October 1999.
Optimizing Java byte code is complicated by the fact that it uses a stack-based execution model. Changing the intermediate representation from a stack-based to a register-based one brings the problem of Java byte code optimization into the well-studied domain of compiler optimizations for register-based code. In this paper we describe a technique to convert register-based code into Java byte code. The code generation techniques developed for stack-based computers are not directly applicable to this problem, as the comparative cost of the local memory and stack manipulation instructions in the JVM is quite different from that in stack-based computers. Naive verbose translation of register-based code into Java byte code produces code with many redundant store and load instructions. The tool that we have developed removes 90-100% of the stores to the local (i.e., non-global) variables. It produces Java byte code that is slightly faster and shorter than the original byte code even when no optimizations except for register allocation are performed on the register-based code. Department of Computer Science, University of Maryland,
Secure Agents. Piero Bonatti. Sarit Kraus. V.S.Subrahmanian. October 1999.
With the rapid proliferation of software agents, there comes an increased need for agents to ensure that they do not provide data and/or services to unauthorized users. We first develop an abstract definition of what it means for an agent to preserve data/action security. Most often, this requires an agent to have knowledge that is impossible to acquire --- hence, we then develop approximate security checks that take into account the fact that an agent usually has incomplete/approximate beliefs about other agents. We develop two types of security checks --- static ones that can be checked prior to deploying the agent, and dynamic ones that are executed at run time. We prove that a number of these problems are undecidable, but under certain conditions, they are decidable and (our definition of) security can be guaranteed. Finally, we propose a language within which the developer of an agent can specify her security needs, and present provably correct algorithms for static/dynamic security verification. (Also cross-referenced as UMIACS-TR-99-62) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Automatic Deployment of Application-Specific Metadata and Code in MOCHA. Manuel Rodriguez. Nick Roussopoulos. December 1999.
Database middleware systems require the deployment of application-specific data types and query operators to the servers and clients in the system. Existing middleware solutions rely on developers and system administrators to port and manually install all this application-specific functionality to all sites in the system. This approach cannot scale to an environment in which there are hundreds of data sources, such as those accessed by the Web and even more custom-tailored applications, since the complexity and the cost involved in maintaining a code base system-wide are enormous. This paper describes a novel metadata-driven framework designed to automate the deployment of all application-specific functionality used by a middleware system. We used Java and XML to implement this framework in MOCHA, a middleware system developed at the University of Maryland. We first present the kind of services, metadata elements and software tools used in MOCHA to automate code deployment. Then, we describe how the features of MOCHA simplify the administration and reduce the management cost of a middleware system in a large scale environment. (Also cross-referenced as UMIACS-TR-99-61) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Benefits of Simultaneous over Sequential Menus as Task. H. Hochheiser. N. Kositsyna. G. Ville. B. Shneiderman. September 1999.
To date, experimental comparisons of menu layouts have concentrated on variants of hierarchical structures of sequentially presented menus. Simultaneous menus - layouts which present multiple active menus on a screen at the same time - are an alternative arrangement that may be useful in many web design situations. This paper describes an experiment involving a between-subject comparison of simultaneous menus and their traditional sequential counterparts. Twenty experienced web users used either simultaneous or sequential menus in a standard web browser to answer questions based on US Census data. For novice users performing simple tasks the simplicity of sequential menus appears to be helpful, but for most tasks and most users there is good evidence to believe that simultaneous menus speed performance and improve satisfaction. Design improvements can amplify the benefits of simultaneous menu layouts. (Also cross-referenced as UMIACS-TR-99-60) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Negative Cycle Detection in Dynamic Graphs. Nitin Chandrachoodan. Shuvra S. Bhattacharyya. K. J. Ray Liu. September 1999.
We examine the problem of detecting negative cycles in a dynamic graph, which is a fundamental problem that arises in electronic design automation and systems theory. Previous approaches used for this have tried to modify Dijkstra's algorithm since it is the fastest known Single-Source Shortest Path algorithm. We introduce the concept of {\em batch mode} negative cycle detection, in which a graph changes over time, and negative cycle detection needs to be done periodically. Such scenarios arise, for example, during iterative design space exploration for hardware and software synthesis. We present an algorithm for this problem, based on the Bellman-Ford algorithm, which outperforms previous approaches. We also show that this technique leads to very fast algorithms for the computation of the maximum-cycle mean (MCM) of a graph, especially for a certain form of {\em sparse graph}. Such sparseness often occurs in practice, as demonstrated for example by the ISCAS 89/93 benchmarks. We present experimental results that demonstrate the advantages of our batch-processing techniques, and illustrate their application to design-space exploration by developing an automated local-search technique for multiple-voltage scheduling of iterative data-flow graphs. (Also cross-referenced as UMIACS-TR-99-59) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
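As background for the abstract above, the following is a minimal sketch of textbook Bellman-Ford negative cycle detection, rerun after a batch of edge changes. It is not the report's batch-mode algorithm, which is not described here in enough detail to reproduce; the function name `has_negative_cycle`, the edge-list representation, and the example graph are assumptions made for illustration.

```python
def has_negative_cycle(num_nodes, edges):
    """Return True if the directed graph contains a negative-weight cycle.

    `edges` is a list of (source, target, weight) triples. Starting every
    distance at 0 is equivalent to adding a virtual source with a
    zero-weight edge to each node, so cycles anywhere in the graph are found.
    """
    dist = [0] * num_nodes
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    # If any edge can still be relaxed, a negative cycle exists.
    return any(dist[u] + w < dist[v] for u, v, w in edges)

# Hypothetical batch update: one edge weight changes between checks.
graph = [(0, 1, 1), (1, 2, -1), (2, 0, 1)]      # cycle weight +1
before = has_negative_cycle(3, graph)
graph[2] = (2, 0, -1)                           # cycle weight becomes -1
after = has_negative_cycle(3, graph)
```

Rerunning the full check after each batch costs O(|V||E|) per run; the point of a batch-mode approach is to do better than this naive baseline when only a few edges change.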
Software synthesis and code generation for signal processing systems. S. S. Bhattacharyya. R. Leupers. P. Marwedel. September 1999.
No abstract submitted. (Also cross-referenced as UMIACS-TR-99-57) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The CBP parameter --- a useful annotation to aid SDF compilers. S. S. Bhattacharyya. P. K. Murthy. September 1999.
The role of software is becoming increasingly important in the implementation of DSP applications. As this trend intensifies, and the complexity of applications escalates, we are seeing an increased need for automated tools to aid in the development of DSP software. This paper reviews the state of the art in programming language and compiler technology for DSP software implementation. In particular, we review techniques for high level, block-diagram-based modeling of DSP applications; the translation of block diagram specifications into efficient C programs using global, target-independent optimization techniques; and the compilation of C programs into streamlined machine code for programmable DSP processors, using architecture-specific and retargetable back-end optimizations. In our review, we also point out some important directions for further investigation. (Also cross-referenced as UMIACS-TR-99-56) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
XMT-M: A Scalable Decentralized Processor. Efraim Berkovich. Joseph Nuzman. Manoj Franklin. Bruce Jacob. Uzi Vishkin. September 1999.
A defining challenge for research in computer science and engineering has been the ongoing quest for reducing the completion time of a single computation task. Even outside the parallel processing communities, there is little doubt that the key to further progress in this quest is to do parallel processing of some kind. A recently proposed parallel processing framework that spans the entire spectrum from (parallel) algorithms to architecture to implementation is the explicit multi-threading (XMT) framework. This framework provides: (i) simple and natural parallel algorithms for essentially every general-purpose application, including notoriously difficult irregular integer applications, and (ii) a multi-threaded programming model for these algorithms which allows an ``independence-of-order'' semantics: every thread can proceed at its own speed, independent of other concurrent threads. To the extent possible, the XMT framework uses established ideas in parallel processing. This paper presents XMT-M, a microarchitecture implementation of the XMT model that is possible with current technology. XMT-M offers an engineering design point that addresses four concerns: buildability, programmability, performance, and scalability. The XMT-M hardware is geared to execute multiple threads in parallel on a single chip: relying on very few new gadgets, it can execute parallel threads without busy-waits! Existing code can be run on XMT-M as a single thread without any modifications, thereby providing backward compatibility for commercial acceptance. Simulation-based studies of XMT-M demonstrate considerable improvements in performance relative to the best serial processor even for small, and therefore practical, input sizes. (Also cross-referenced as UMIACS-TR-99-55) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Cost Models for Query Processing Strategies in the Active Data. Chialin Chang. September 1999.
Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work we describe analytical models to predict the average computation, I/O and communication operation counts of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, restricting the output dataset to be a regular d-dimensional array. We validate these models for various synthetic datasets and for several driving applications. (Also cross-referenced as UMIACS-TR-99-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hashing Technique: A New Index Method for High Dimensional Data. Zhexuan Song. Nick Roussopoulos. September 1999.
When the dimension is high, a sequential scan becomes more efficient than most index-based queries. In this paper, we propose a new index method for high-dimensional data spaces. This method is based on a hashing technique. The basic idea is: first, find a hashing function which puts the given d-dimensional data into d'-dimensional buckets, where d' << d. Then, we use existing index techniques to manage those buckets. We later define some properties of a good hashing function and give four hashing functions. To demonstrate the efficiency of our idea, we experimentally compared our algorithms with sequential scan and the Pyramid-Technique. The results demonstrate that this method outperforms the others for skewed data sets. It always beats the sequential scan, using only half of the elapsed time for range queries. However, if the data has a uniform distribution, the Pyramid-Technique is still the best method. Department of Computer Science, University of Maryland,
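The basic idea in the abstract above can be sketched in a few lines. This is only an illustration: the hashing function used here (summing consecutive groups of coordinates and quantizing each sum) is a hypothetical choice and is not claimed to be one of the four functions given in the report, and a plain dictionary stands in for the "existing index technique". The names `bucket_key`, `build_index`, `range_query`, `group_size`, and `width` are likewise assumptions.

```python
import math
from collections import defaultdict
from itertools import product

def bucket_key(point, group_size, width):
    """Map a d-dimensional point to a d'-dimensional bucket key.

    Coordinates are summed in consecutive groups of `group_size`, and each
    sum is quantized into cells of size `width`. The mapping is monotone:
    if lo <= x <= hi coordinate-wise, the key of x lies between the keys
    of lo and hi in every component.
    """
    return tuple(math.floor(sum(point[i:i + group_size]) / width)
                 for i in range(0, len(point), group_size))

def build_index(points, group_size, width):
    index = defaultdict(list)
    for p in points:
        index[bucket_key(p, group_size, width)].append(p)
    return index

def range_query(index, lo, hi, group_size, width):
    """Return all indexed points p with lo <= p <= hi coordinate-wise."""
    k_lo = bucket_key(lo, group_size, width)
    k_hi = bucket_key(hi, group_size, width)
    hits = []
    # Visit only buckets that can contain a match, then filter exactly.
    for key in product(*(range(a, b + 1) for a, b in zip(k_lo, k_hi))):
        for p in index.get(key, ()):
            if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
                hits.append(p)
    return hits

# Hypothetical data: 4-dimensional points hashed into 2-dimensional buckets.
points = [(0.1, 0.2, 0.3, 0.4), (0.9, 0.8, 0.7, 0.6), (0.5, 0.5, 0.5, 0.5)]
index = build_index(points, group_size=2, width=0.5)
result = range_query(index, (0.4,) * 4, (0.6,) * 4, group_size=2, width=0.5)
```

Because the bucket mapping is monotone, the bucket scan never misses a matching point; the exact coordinate-wise filter then removes false positives that share a bucket.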
The Role of Children in the Design Technology. Allison Druin. September 1999.
Children play games, chat with friends, tell stories, study history or math, and today this can all be done supported by new technologies. From the Internet to multimedia authoring tools, technology is changing the way children live and learn. As these new technologies become ever more critical to our children's lives, we need to be sure these technologies support children in ways that make sense for them as young learners, explorers, and avid technology users. This may seem of obvious importance, because for almost 20 years the HCI community has pursued new ways to understand users of technology. However, with children as users, it has been difficult to bring them into the design process. Children go to school for most of their days; there are existing power structures, biases, and assumptions between adults and children to get beyond; and children, especially young ones, have difficulty in verbalizing their thoughts. For all of these reasons, a child's role in the design of new technology has historically been minimized. Based upon a survey of the literature and my own research experiences with children, this paper defines a framework for understanding the various roles children can have in the design process, and how these roles can impact technologies that are created. (Also cross-referenced as UMIACS-TR-99-53) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Temporal Agent Programs. J. Dix. S. Kraus. V.S. Subrahmanian. September 1999.
The ``agent program'' framework introduced by Eiter, Subrahmanian and Pick (\textbf{Artificial Intelligence, 108(1-2), 1999}), supports developing agents on top of arbitrary legacy code. Such agents are continuously engaged in an \emph{``event occurs, think, act, event occurs''} cycle. However, this framework has two major limitations: (1) all actions are assumed to have no duration, and (2) all actions are taken now, but cannot be \emph{scheduled for the future}. In this paper, we present the concept of a ``temporal agent program'' (TAP for short) and show that using TAPs, it is possible to build agents on top of legacy code that can reason about the past and about the future, and that can make temporal commitments for the future now. We develop a formal semantics for such agents, extending the concept of a status set proposed by Eiter et al., and develop algorithms to compute the status sets associated with temporal agent programs. Last, but not least, we show how TAPs support classical negotiation methods (as well as some new ones) and classical auction methods (as well as some new ones). (Also cross-referenced as UMIACS-TR-99-51) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Probabilistic Agent Programs. Juergen Dix. Mirco Nanni. VS Subrahmanian. September 1999.
Agents are small programs that autonomously take actions based on changes in their environment or ``state.'' Over the last few years, there has been an increasing number of efforts to build agents that can interact and/or collaborate with other agents. In one of these efforts, Eiter, Subrahmanian and Pick (AIJ, 108(1-2), pages 179-255) have shown how agents may be built on top of legacy code. However, their framework assumes that agent states are completely determined, and there is no uncertainty in an agent's state. Thus, their framework allows an agent developer to specify how his agents will react when the agent is 100\% sure about what is true/false in the world state. In this paper, we propose the concept of a \emph{probabilistic agent program} and show how, given an arbitrary program written in any imperative language, we may build a declarative ``probabilistic'' agent program on top of it which supports decision making in the presence of uncertainty. We provide two alternative semantics for probabilistic agent programs. We show that the second semantics, though more epistemically appealing, is more complex to compute. We provide sound and complete algorithms to compute the semantics of \emph{positive} agent programs. (Also cross-referenced as UMIACS-TR-99-50) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Meta Agent Programs. Juergen Dix. V.S. Subrahmanian. George Pick. September 1999.
There are numerous applications where an agent $a$ needs to reason about the beliefs of another agent, as well as about the actions that other agents may take. In Eiter/Subrahmanian/Pick the concept of an agent program is introduced, and a language within which the operating principles of an agent can be declaratively encoded on top of imperative data structures is defined. In this paper we first introduce certain belief data structures that an agent needs to maintain. Then we introduce the concept of a \emph{Meta Agent Program} (MAP), that extends the framework of Eiter/Subrahmanian/Pick, so as to allow agents to perform metareasoning. We build a formal semantics for MAPs, and show how this semantics supports not just beliefs agent a may have about agent b's state, but also beliefs about agent b's beliefs about agent c's actions, beliefs about b's beliefs about agent c's state, and so on. Finally, we provide a translation that takes any MAP as input and converts it into an agent program such that there is a one-to-one correspondence between the semantics of the MAP and the semantics of the resulting agent program. This correspondence allows an implementation of MAPs to be built on top of an implementation of agent programs. Also cross-referenced as UMIACS-TR-99-49 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Nonmonotonic Reasoning: Towards efficient calculi and implementations. Juergen Dix. Ulrich Furbach. Ilkka Niemelae. September 1999.
In this paper we do not want to give a detailed overview of the various formalizations of nonmonotonic reasoning that have evolved (those can be found in various textbooks), but we want to give an overview of the main computational techniques and methods leading to implementations of nonmonotonic reasoning. We first introduce the main nonmonotonic logics: \emph{Default Logic}, \emph{Circumscription} and \emph{Autoepistemic Logic}. We also consider the abstract approach of Kraus, Lehmann and Magidor to associate with any reasoning system an \emph{abstract consequence relation}. Then we investigate universal methods for computing in general nonmonotonic logics. We do this with a special eye on the underlying complexity and show how this leads to automated theorem proving in such logics. Finding efficient computation mechanisms for the logics introduced in the former section is the aim of the next Section. There we consider techniques that originated from automated reasoning in first-order predicate calculus. We depict how these techniques can be applied for disjunctive logic programming with programs with variables but only limited use of negation. In particular, we handle GCWA as a basis for nonmonotonic negation therein. We then give a declarative overview on nonmonotonicity in logic programming. We introduce (nonmonotonic) semantics of logic programs with negation and disjunction, notably the well-founded and the stable semantics and their extensions to programs containing disjunction; they constitute the most important semantics and are in close relation to the logics introduced in the next Section. While we considered in a former section techniques that can be successfully applied for programs with variables and only limited use of negation, we also treat propositional programs with full negation and disjunction. In particular, we provide implementations of D-WFS and D-STABLE in polynomial space.
We end with a section where we consider the problem of finding good benchmarks to test and compare nonmonotonic systems against. Also cross-referenced as UMIACS-TR-99-48 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Explaining Updates by minimal sums. Juergen Dix. Karl Schlechta. September 1999.
Human reasoning about developments of the world always involves an assumption of \emph{inertia}. We discuss two approaches for formalizing such an assumption, based on the concept of an \emph{explanation}: \emph{(1)} there is a general preference relation given on the set of all explanations, \emph{(2)} there is a notion of a \emph{distance} between models and explanations are \emph{preferred} if their sum of distances is minimal. We show exactly under which conditions the converse is true as well and therefore both approaches are equivalent modulo these conditions. Our main result is a general representation theorem in the spirit of Kraus, Lehmann and Magidor. Also cross-referenced as UMIACS-TR-99-47 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A General Theory of Confluent rewriting Systems for Logic Programming. Juergen Dix. Mauricio Osorio. September 1999.
Recently, Brass and Dix showed (\emph{Journal of Automated Reasoning} \textbf{20(1)}, 1998) that the well-founded semantics WFS can be defined as a confluent calculus of transformation rules. This led not only to a simple extension to disjunctive programs (\emph{Journal of Logic Programming} \textbf{38(3)}, 1999), but also to a new computation of the well-founded semantics which is \emph{linear} for a broad class of programs. We take this approach as a starting point and generalize it considerably by developing a general theory of \emph{Confluent LP-Systems} $\cfs$. Such a system $\cfs$ is a rewriting system on the set of all logic programs over a fixed signature $\Lang$ and it induces in a natural way a canonical semantics. Moreover, we show four important applications of this theory: \emph{(1) most of the well-known semantics are induced by confluent LP-systems}, \emph{(2) there are many more transformation rules that lead to confluent LP-systems}, \emph{(3) semantics induced by such systems can be used to model aggregation}, \emph{(4) the new systems can be used to construct interesting counterexamples to some conjectures about the space of well-behaved semantics}. Also cross-referenced as UMIACS-TR-99-46 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Striping Doesn't Scale: How to Achieve Scalability. ChengFu Chou. Leana Golubchik. John C.S. Lui. September 1999.
Multimedia applications place high demands for QoS, performance, and reliability on storage servers and communication networks. These often stringent requirements make the design of cost-effective and scalable continuous media (CM) servers difficult. In particular, the choice of data placement techniques can have a significant effect on the scalability of the CM server and its ability to utilize resources efficiently. In the recent past, a great deal of work has focused on ``wide'' data striping as a technique which ``implicitly'' solves load balancing problems, although it does suffer from multiple shortcomings. Another approach to dealing with load imbalance problems is replication. The main focus of this paper is a study of scalability characteristics of CM servers as a function of tradeoffs between striping and replication. More specifically, striping is a good approach to load balancing while replication is a good approach to ``isolating'' nodes from being dependent on other system resources. The appropriate compromise between the degree of striping and the degree of replication is key to the design of a scalable CM server. This is the topic of our work. Also cross-referenced as UMIACS-TR-99-45 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On Fault Location in Networks by Passive Testing. Raymond E. Miller. Khaled A. Arisha. August 1999.
In this paper, we employ a variant of the communicating finite state machine (CFSM) model for networks to investigate fault detection and location using passive testing. First, we introduce the concept of passive testing; then we introduce the model with the necessary assumptions and justification. Next, the model for the observer process is described and a 3-node case is studied to show how fault location information can be deduced. Extending this result, we propose a multiple node-cut approach for a general network, applying our technique for fault detection and location. An abstraction of a node-cut shows how the 3-node case can be used in the general case. We then illustrate our technique through a simulation of a practical X.25 example. Finally, future extensions and potential trends are discussed. Department of Computer Science, University of Maryland,
Universal Usability: Pushing Human-Computer Interaction Research to. Ben Shneiderman. July 1999.
"I feel... an ardent desire to see knowledge so disseminated through the mass of mankind that it may...reach even the extremes of society: beggars and kings." -- Thomas Jefferson, Reply to American Philosophical Society, 1808 In a fair society, all individuals would have equal opportunity to participate in, or benefit from, the use of computer resources regardless of race, sex, religion, age, disability, national origin or other such similar factors. -- ACM Code of Ethics Position Paper for National Science Foundation & European Commission meeting on human-computer interaction research agenda, June 1-4, 1999, Toulouse, France. To be published in book form. Also cross-referenced as UMIACS-TR-99-17 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Supporting Creativity with Advanced Information-Abundant User. Ben Shneiderman. June 1999.
A challenge for human-computer interaction researchers and user interface designers is to construct information technologies that support creativity. This ambitious goal can be attained if designers build on an adequate understanding of creative processes. This paper describes a model of creativity, the four-phase genex framework for generating excellence:
- Collect: learn from previous works stored in digital libraries, the web, etc.
- Relate: consult with peers and mentors at early, middle and late stages
- Create: explore, compose, discover, and evaluate possible solutions
- Donate: disseminate the results and contribute to the digital libraries, the web, etc.
Within this integrated framework, there are eight activities that require human-computer interaction research and advanced user interface design. This paper concentrates on techniques of information visualization that support creative work by enabling users to find relevant information resources, identify desired items in a set, or discover patterns in a collection. It describes information visualization methods and proposes five questions for the future: generality, integration, perceptual foundations, cognitive principles, and collaboration. Also cross-referenced as UMIACS-TR-99-42 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Improving Locality For Adaptive Irregular Scientific Codes. Hwansoo Han. Chau-Wen Tseng. September 1999.
An important class of scientific codes accesses memory in an irregular manner. Because irregular access patterns reduce temporal and spatial locality, they tend to underutilize caches, resulting in poor performance. Researchers have shown that consecutively packing data relative to traversal order can significantly reduce cache miss rates by increasing spatial locality. In this paper, we investigate techniques for using partitioning algorithms to improve locality in adaptive irregular codes. We develop parameters to guide both geometric (RCB) and graph partitioning (METIS) algorithms, and develop a new graph partitioning algorithm based on hierarchical clustering (GPART) which achieves good locality with low overhead. We also examine the effectiveness of locality optimizations for adaptive codes, where connection patterns dynamically change at intervals during program execution. We use a simple cost model to guide locality optimizations when access patterns change. Experiments on irregular scientific codes for a variety of meshes show our partitioning algorithms are effective for static and adaptive codes on both sequential and parallel machines. Improved locality also enhances the effectiveness of LocalWrite, a parallelization technique for irregular reductions based on the owner computes rule. Also cross-referenced as UMIACS-TR-99-41 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Empirical Studies in Parallel Sorting. Evan Golub. May 1998.
I examine different parallel algorithms for sorting in rounds. Most of these algorithms use a graph to indicate the comparisons to be made. The primary difference between the algorithms is how these graphs are chosen. One uses graphs that are shown to exist using non-constructive techniques, several yield constructions of the required graphs, and one uses a randomized algorithm. The constructive algorithms would traditionally be preferred even though the processor requirements are higher. It is shown that the non-constructive algorithms can actually be used by generating the needed graphs using random number generators skewed appropriately. Department of Computer Science, University of Maryland,
Mathematical Modeling of Lateralization and Asymmetries in Cortical Maps. Svetlana Levitan. July 1999.
Recent experimental work in neurobiology has defined asymmetries and lateralization in the topographic maps found in mirror-image regions of the sensorimotor cerebral cortex. However, the mechanisms underlying these asymmetries are currently not established, and in some cases are quite controversial. In order to explore some possible causes of map asymmetry and lateralization, several neural network models of cortical map lateralization and asymmetries based on self-organizing maps are created and studied both computationally and theoretically. Activation levels of the elements in the models are governed by large systems of highly nonlinear ordinary differential equations (ODEs), where coefficients change with time and their changes depend on the activation levels. Special metrics for objective evaluation of simulation results (represented as paired receptive field maps) are introduced and analysed. The behavior of the models is studied when their parameters are varied systematically and also when simulated lesions are introduced into one of the hemispheric regions. Some very sharp transitions and other interesting phenomena have been found computationally. Many of these computationally observed phenomena are explained by theoretical analysis of total hemispheric activation in a simplified model. The connection between a bifurcation point of the system of ODEs and the sharp transition in the model's computational behavior is established. More general understanding of topographic map formation and changes under various conditions is achieved by analysis of activation patterns (i.e., $\omega$-limit sets of the above system of ODEs). This is the first mathematical model to demonstrate spontaneous map lateralization and asymmetries, and it suggests that such models may be generally useful in better understanding the mechanisms of cerebral lateralization. 
The mathematical analysis of the models leads to a better understanding of the mechanisms of self-organization in the topographic maps based on competitive distribution of activation and competitive learning. Also cross-referenced as UMIACS-TR-99-40 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Multigrid Method Enhanced by Krylov Subspace Iteration for Discrete. Howard C. Elman. Oliver G. Ernst. Dianne P. O'Leary. June 1999.
Standard multigrid algorithms have proven ineffective for the solution of discretizations of Helmholtz equations. In this work we modify the standard algorithm by adding GMRES iterations at coarse levels and as an outer iteration. We demonstrate the algorithm's effectiveness through theoretical analysis of a model problem and experimental results. In particular, we show that the combined use of GMRES as a smoother and outer iteration produces an algorithm whose performance depends relatively mildly on wave number and is robust for normalized wave numbers as large as two hundred. For fixed wave numbers, it displays grid-independent convergence rates and has costs proportional to the number of unknowns. Also cross-referenced as UMIACS-TR-99-36 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Design of History Mechanisms and their Use in Collaborative Educational Simulations. Catherine Plaisant. Anne Rose. Gary Rubloff. Richard Salter. Ben Shneiderman. May 1999.
Reviewing past events has been useful in many domains. Videotapes and flight data recorders provide invaluable technological help to sports coaches or aviation engineers. Similarly, providing learners with a readable recording of their actions may help them monitor their behavior, reflect on their progress, and experiment with revisions of their experiences. It may also facilitate active collaboration among dispersed learning communities. Learning histories can help students and professionals make more effective use of digital library searching, word processing tasks, computer-assisted design tools, electronic performance support systems, and web navigation. This paper describes the design space and discusses the challenges of implementing learning histories. It presents guidelines for creating effective implementations, and the design tradeoffs between sparse and dense history records. The paper also presents a first implementation of learning histories for a simulation-based engineering learning environment called SimPLE (Simulated Processes in a Learning Environment) for the case of a semiconductor fabrication module, and reports on early user evaluation of learning histories implemented within SimPLE. Also cross-referenced as UMIACS-TR-99-34 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Shared Memory Implementations of Synchronous Dataflow Specifications. Praveen K. Murthy. Shuvra S. Bhattacharyya. June 1999.
There has been a proliferation of block-diagram environments for specifying and prototyping DSP systems. These include tools from academia like Ptolemy [6], and commercial tools like SPW from Cadence Design Systems, and Cossap from Synopsys. The block diagram languages used in these environments are usually based on dataflow semantics because various subsets of dataflow have proven to be good matches for expressing and modeling signal processing systems. In particular, synchronous dataflow (SDF) [14] has been found to be a particularly good match for expressing multirate signal processing systems [5]. One of the key problems that arises during synthesis from an SDF specification is scheduling. Past work on scheduling [3] from SDF has focused on optimization of program memory and buffer memory. However, in [3], no attempt was made to overlay or share buffers. In this paper, we formally tackle the problem of generating optimally compact schedules for SDF graphs that also attempt to minimize buffering memory under the assumption that buffers will be shared. This will result in schedules whose data memory usage is drastically lower than methods in the past have achieved. The method we use is that of lifetime analysis; we develop a model for buffer lifetimes in SDF graphs, and develop scheduling algorithms that attempt to generate schedules that minimize the maximum number of live tokens under the particular buffer lifetime model. We develop several efficient algorithms for extracting the relevant lifetimes from the SDF schedule. We then use the first-fit heuristic for packing arrays efficiently into memory. We report extensive experimental results on applying these techniques to several practical SDF systems, and show improvements that average 50% over previous techniques, with some systems exhibiting up to an 83% improvement over previous techniques.
Also cross-referenced as UMIACS-TR-99-32 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Approximation Algorithms and Heuristics for the Dynamic Storage. Praveen K. Murthy. Shuvra S. Bhattacharyya. June 1999.
In this report, we look at the problem of packing a number of arrays in memory efficiently. This is known as the dynamic storage allocation problem (DSA) and it is known to be NP-complete. We develop some simple, polynomial-time approximation algorithms with the best of them achieving a bound of 4 for a sub-class of DSA instances. We report on an extensive experimental study on the FirstFit heuristic and show that the average-case performance on random instances is within 7% of the optimal value. Also cross-referenced as UMIACS-TR-99-31 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
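As an illustration of the FirstFit heuristic mentioned in this abstract, the following is a minimal sketch, not the report's implementation. It assumes each array is described by a hypothetical `(size, start, end)` triple, where the array is live on the half-open time interval `[start, end)`, and that arrays are placed in the order given; the report's ordering rules and lifetime model may differ.

```python
from typing import List, Tuple

# Hypothetical array description: (size, start, end), live on [start, end).
Array = Tuple[int, int, int]


def first_fit(arrays: List[Array]) -> Tuple[List[int], int]:
    """Place each array at the lowest offset that is free for its whole lifetime.

    Returns the chosen offsets (in input order) and the total memory used.
    """
    placed: List[Tuple[int, int, int, int]] = []  # (offset, size, start, end)
    offsets: List[int] = []
    for size, start, end in arrays:
        # Address ranges held by already placed arrays whose lifetimes overlap.
        busy = sorted(
            (off, off + sz)
            for off, sz, s, e in placed
            if s < end and start < e
        )
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break  # the gap below this range is large enough
            offset = max(offset, hi)
        placed.append((offset, size, start, end))
        offsets.append(offset)
    total = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, total
```

For example, with arrays `(4, 0, 2)`, `(2, 1, 3)` and `(4, 2, 4)`, the third array reuses the addresses of the first because their lifetimes do not overlap, so the packing needs 6 units instead of the 10 required without sharing.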
Symbiosis between Linear Algebra and Optimization. Dianne P. O'Leary. May 1999.
The efficiency and effectiveness of most optimization algorithms hinges on the numerical linear algebra algorithms that they utilize. Effective linear algebra is crucial to their success, and because of this, optimization applications have motivated fundamental advances in numerical linear algebra. This essay will highlight contributions of numerical linear algebra to optimization, as well as some optimization problems encountered within linear algebra that contribute to a symbiotic relationship. Also cross-referenced as UMIACS-TR-99-30 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Querying Very Large Multi-dimensional Datasets in ADR - Extended. Tahsin Kurc. Chialin Chang. Renato Ferreira. Alan Sussman. Joel Saltz. May 1999.
This paper addresses optimizing the execution of range queries into multi-dimensional datasets on distributed memory parallel machines within the Active Data Repository framework. ADR is an infrastructure that integrates storage, retrieval and processing of large multi-dimensional datasets on distributed memory parallel architectures with multiple disks attached to each node. We describe three potential strategies for efficient execution of such queries that employ different tiling and workload partitioning approaches. We evaluate scalability of these strategies for different application scenarios, varying both the number of processors and the input dataset size on a 128 processor IBM SP multicomputer. Also cross-referenced as UMIACS-TR-99-29 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Snap-Together Visualization: Coordinating Multiple Views to Explore. Chris North. Ben Shneiderman. June 1999.
Information visualizations with multiple coordinated views enable users to rapidly explore complex data and discover relationships. However, it is usually difficult for users to find or create the coordinated visualizations they need. Snap-Together Visualization allows users to coordinate multiple views that are customized to their needs. Users query their relational database and load results into desired visualizations. Then they specify coordinations between visualizations for selecting, navigating, or re-querying. Developers can make independent visualization tools 'snap-able' by including a few hooks. Also cross-referenced as UMIACS-TR-99-28 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Human-Computer Interaction Laboratory, University of Maryland,
Pixel Data Access for End-User Programming and Graphical Macros. Richard Potter. Ben Shneiderman. May 1999.
Pixel Data Access is an interprocess communication technique that enables users of graphical user interfaces to automate certain tasks. By accessing the contents of the display buffer, users can search for pixel representations of interface elements, and then initiate actions such as mouse clicks and keyboard entries. While this technique has limitations, it offers users of current systems some unusually powerful features that are especially appealing in the area of end-user programming. Also cross-referenced as UMIACS-TR-99-27 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Architecture and Implementation of a Java Package for Multiple Input. Juan Pablo Hourcade. Benjamin B. Bederson. May 1999.
A major difficulty in writing Single Display Groupware (co-present collaborative) applications is getting input from multiple devices. We introduce MID, a Java package that addresses this problem and offers an architecture to access advanced events through Java. In this paper, we describe the features, architecture and limitations of MID. We also briefly describe an application that uses MID to get input from multiple mice: KidPad. Also cross-referenced as UMIACS-TR-99-26 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Analysis of the Rayleigh--Ritz Method for Approximating. Zhongxiao Jia. G. W. Stewart. May 1999.
This paper concerns the Rayleigh--Ritz method for computing an approximation to an eigenspace ${\cal X}$ of a general matrix $A$ from a subspace ${\cal W}$ that contains an approximation to ${\cal X}$. The method produces a pair $(N, \tilde X)$ that purports to approximate a pair $(L, X)$, where $X$ is a basis for ${\cal X}$ and $AX = XL$. In this paper we consider the convergence of $(N, \tilde X)$ as the sine $\epsilon$ of the angle between ${\cal X}$ and ${\cal W}$ approaches zero. It is shown that under a natural hypothesis\,---\,called the uniform separation condition\,---\,the Ritz pairs $(N, \tilde X)$ converge to the eigenpair $(L, X)$. When one is concerned with eigenvalues and eigenvectors, one can compute certain refined Ritz vectors whose convergence is guaranteed, even when the uniform separation condition is not satisfied. An attractive feature of the analysis is that it does not assume that $A$ has distinct eigenvalues or is diagonalizable. (Also cross-referenced as UMIACS-TR-99-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Jazz: An Extensible 2D+Zooming Graphics Toolkit in Java. Benjamin B. Bederson. Britt McAlister. May 1999.
Jazz is a new general-purpose toolkit that supports applications using zooming object-oriented 2D graphics. It is built entirely in Java using Java2D, and thus runs on all platforms that support Java 2. It supports zooming, internal cameras, and lenses in a similar style to Pad++, but does so in a general purpose manner without a specific focus on zooming. Jazz is primarily a "scenegraph" for 2D graphics that is analogous to Sun's Java3D and SGI's OpenInventor in their support for 3D scenegraphs. This paper describes Jazz and discusses the issues of using a scenegraph for 2D graphics. We discuss the Jazz architecture, and how applications can build on top of it. Also cross-referenced as UMIACS-TR-99-24 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Data Dissemination on the Web: Speculative and Unobtrusive. Vincenzo Liberatore. Brian D. Davison. May 1999.
The Web's rapid growth results in heavier loads on servers and networks and in increased latency when retrieving Web documents. Internet traffic is also bursty, which complicates the design and allocation of network components. Bursty traffic alternates peak periods with lulls. The paper presents a framework that exploits idle periods in data traffic to satisfy future HTTP requests speculatively, opportunistically, and unobtrusively. Our proposal differs from previous schemes in that it is server-initiated and it is explicitly aware of current traffic loads (unobtrusive). This paper highlights several design trade-offs and details two issues: (1) server arbitration among several candidate documents, and (2) client/proxy caching. We present a theoretical analysis of arbitration, and we propose an integrated caching strategy for both requested and disseminated documents. Our approach is validated by extensive simulation on server logs, and substantial performance improvements are observed over pure on-demand strategies. (Also cross-referenced as UMIACS-TR-99-23) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Computation and Uses of the Semidiscrete Matrix Decomposition. Tamara G. Kolda. Dianne P. O'Leary. April 1999.
We derive algorithms for computing a semidiscrete approximation to a matrix in the Frobenius and weighted norms. The approximation is formed as a weighted sum of outer products of vectors whose elements are plus or minus $1$ or $0$, so the storage required by the approximation is quite small. We also present a related algorithm for approximation of a tensor. Applications of the algorithms are presented to data compression, filtering, and information retrieval; and software is provided in C and in Matlab. (Also cross-referenced as UMIACS-TR-99-22) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
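As a reading aid for the abstract above, the approximation it describes (a weighted sum of outer products of vectors with entries $-1$, $0$, or $1$) can be written as follows. The symbols $A_k$, $d_i$, $x_i$, $y_i$, $m$, $n$, and $k$ are notation assumed here for illustration and are not taken from the abstract.

```latex
% Assumed notation: A is m-by-n, k is the number of terms kept.
\[
  A \;\approx\; A_k \;=\; \sum_{i=1}^{k} d_i \, x_i \, y_i^{T},
  \qquad
  x_i \in \{-1, 0, 1\}^{m}, \quad
  y_i \in \{-1, 0, 1\}^{n}, \quad
  d_i \in \mathbb{R}.
\]
```

Each term stores one scalar weight and $m + n$ entries that each take one of three values, which is why the storage required is small.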
Adaptive Use of Iterative Methods in Predictor-Corrector Interior Point. Weichung Wang. Dianne P. O'Leary. April 1999.
In this work we devise efficient algorithms for finding the search directions for interior point methods applied to linear programming problems. There are two innovations. The first is the use of updating of preconditioners computed for previous barrier parameters. The second is an adaptive automated procedure for determining whether to use a direct or iterative solver, whether to reinitialize or update the preconditioner, and how many updates to apply. These decisions are based on predictions of the cost of using the different solvers to determine the next search direction, given costs in determining earlier directions. We summarize earlier results using a modified version of the OB1-R code of Lustig, Marsten, and Shanno, and we present results from a predictor-corrector code PCx modified to use adaptive iteration. If a direct method is appropriate for the problem, then our procedure chooses it, but when an iterative procedure is helpful, substantial gains in efficiency can be obtained. (Also cross-referenced as UMIACS-TR-99-21) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Kronos: A Java-Based Software System for the Processing and Retrieval. Zengyan Zhang. Joseph JaJa. David A. Bader. Satya Kalluri. Huiping Song. Nazmi Z El Saleous. Eric Vermote. John R. G. Townshend. March 1999.
At regional scales, satellite-based sensors are the primary source of information to study the earth's environment, as they provide the needed dynamic temporal view of the earth's surface. Raw satellite orbit data have to be processed and mapped into a standard projection to produce multitemporal data sets which can then be used for regional or global earth science studies, such as land cover dynamics, global carbon cycle, planetary-scale climate dynamics and deforestation. For a given sensor, different applications may require different processing chains with the same few core steps. Application dependent processing steps include atmospheric correction, spatial and temporal subsetting, and output image projection. However, the data sets that are currently available to the scientific community are generated using a predetermined processing chain in a fixed projection. Generating products that are different than the standard ones can be difficult and will result in at least a re-sampling step and hence some loss of accuracy. In this paper, we describe a software system Kronos for the generation of custom-tailored data products from the Advanced Very High Resolution Radiometer (AVHRR) sensor on board of the National Oceanic and Atmospheric Administration (NOAA) series polar orbiting satellites. Kronos allows the generation of a rich set of products that can be easily specified through a Java interface by scientists wishing to carry out earth system modeling or analysis based on Global Area Coverage (GAC) data from the AVHRR sensor. Kronos is based on a flexible methodology and consists of four major components: ingest and preprocessing, indexing and storage, search and processing engine, and a Java interface. We illustrate the power of our methodology by including a few special data products generated by Kronos. 
Also cross-referenced as UMIACS-TR-99-19 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Flexible User Profiles for Large Scale Data Delivery. Ugur Cetintemel. Michael J. Franklin. C. Lee Giles. March 1999.
Push-based data delivery requires knowledge of user interests for making scheduling, bandwidth allocation, and routing decisions. Such information is maintained as user profiles. We propose a new incremental algorithm for constructing user profiles based on monitoring and user feedback. In contrast to earlier approaches, which typically represent profiles as a single weighted interest vector, we represent user-profiles using multiple interest clusters, whose number, size, and elements change adaptively based on user access behavior. This flexible approach allows the profile to more accurately represent complex user interests. The approach can be tuned to trade off profile complexity and effectiveness, making it suitable for use in large-scale information filtering applications such as push-based WWW page dissemination. We evaluate the method by experimentally investigating its ability to categorize WWW pages taken from Yahoo! categories. Our results show that the method can provide high retrieval effectiveness with modest profile sizes and can effectively adapt to changes in users' interests. Also cross-referenced as UMIACS-TR-99-18 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Near-Optimal Parameters for Tikhonov and Other Regularization Methods. Dianne P. O'Leary. March 1999.
Choosing the regularization parameter for an ill-posed problem is an art based on good heuristics and prior knowledge of the noise in the observations. In this work we propose choosing the parameter, without a priori information, by approximately minimizing the distance between the true solution to the discrete problem and the family of regularized solutions. We demonstrate the usefulness of this approach for Tikhonov regularization and for an alternate family of solutions. Further, we prove convergence of the regularization parameter to zero as the standard deviation of the noise goes to zero. We also prove that the alternate family produces solutions closer to the true solution than the Tikhonov family when the noise is small enough. Also cross-referenced as UMIACS-TR-99-17 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
OSMA Software Program: Domain Analysis Guidebook. Victor R. Basili. Carolyn Seaman. Roseanne Tesoriero. Marvin V. Zelkowitz. December 1998.
Domain analysis is the process of identifying and organizing knowledge about a class of problems. This guidebook presents a method of performing experience domain analysis in software development organizations. The purpose of the guidebook is to help the reader characterize two given development environments, apply domain analysis to model each, and then apply an evaluation process, based upon the Goal/Metric/Paradigm, to transfer a given development technology from one of the environments to the other. This guidebook describes this process and gives an example of its use within NASA. Also cross-referenced as UMIACS-TR-99-16 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Query Planning for Range Queries with User-defined Aggregation on. Chialin Chang. Tahsin Kurc. Alan Sussman. Joel Saltz. February 1999.
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, the datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space. The processing is usually highly stylized, with the basic processing steps consisting of (1) retrieval of a subset of all available data in the input dataset via a range query, (2) projection of each input data item to one or more output data items, and (3) some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on shared-nothing architectures. In this paper we address query planning and execution strategies for range queries with user-defined processing. We evaluate three potential query planning strategies within the ADR framework under several application scenarios, and present experimental results on the performance of the strategies on a multiprocessor IBM SP2. (Also cross-referenced as UMIACS-TR-99-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Does Zooming Improve Image Browsing?. Tammara T.A. Combs. Benjamin B. Bederson. March 1999.
We describe an image retrieval system we built based on a Zoomable User Interface (ZUI). We also discuss the design, results and analysis of a controlled experiment we performed on the browsing aspects of the system. The experiment resulted in a statistically significant difference in the interaction between number of images (25, 75, 225) and style of browser (2D, ZUI, 3D). The 2D and ZUI browser systems performed equally, and both performed better than the 3D systems. The image browsers tested during the experiment include Cerious Software's Thumbs Plus, TriVista Technology's Simple LandScape and Photo GoRound, and our Zoomable Image Browser based on Pad++. Also cross-referenced as UMIACS-TR-99-14 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
XJoin: Getting Fast Answers From Slow and Bursty Networks. Tolga Urhan. Michael J. Franklin. February 1999.
The combination of increasingly ubiquitous Internet connectivity and advances in heterogeneous and semi-structured databases has the potential to enable database-style querying over data from sources distributed around the world. Traditional query processing techniques, however, fail to deliver acceptable performance in such a scenario for two main reasons: First, they optimize for delivery of the entire query result, while on-line users would typically benefit from receiving initial results as quickly as possible. Second, slow or bursty delivery of data from remote sources can stall query execution, making the already inadequate batch-like behavior even worse. Both of these problems can be addressed using fully pipelined query execution. The symmetric hash join operator supports such pipelining, but it requires all base data and intermediate results to be memory-resident, which is unacceptable for complex queries over large datasets. In this paper we present a multi-threaded extension of the symmetric hash join, called XJoin, that can execute effectively with far less memory. By reactively scheduling background processing, XJoin hides intermittent delays in data arrival to produce more tuples earlier. XJoin includes a very efficient, on-the-fly algorithm for preventing duplicates from being created by its independently running threads. We have implemented the XJoin operator and added it to the PREDATOR Object-Relational DBMS. Using this implementation along with traces obtained by monitoring Internet data delivery, we show that XJoin is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources. (Also cross-referenced as UMIACS-TR-99-13) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
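The abstract above builds on the symmetric hash join, which keeps one in-memory hash table per input and emits matches as soon as either side delivers a tuple. The following is a minimal sketch of that baseline operator only, not of XJoin or of the PREDATOR implementation; the function name, the `("L", row)` / `("R", row)` arrival format, and the key functions are assumptions made for illustration.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key_left, key_right):
    """Pipelined in-memory equi-join (illustrative sketch).

    `arrivals` yields ("L", row) or ("R", row) in the order tuples arrive
    from the two sources. Each row is inserted into its own side's hash
    table and then probed against the other side's table, so results are
    produced without waiting for either input to finish.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    key_of = {"L": key_left, "R": key_right}

    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = key_of[side](row)
        tables[side][k].append(row)             # build step for this side
        for match in tables[other].get(k, ()):  # probe step against the other side
            # Always emit pairs as (left_row, right_row).
            yield (row, match) if side == "L" else (match, row)


# Hypothetical interleaved arrivals from two remote sources.
arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
results = list(symmetric_hash_join(arrivals, lambda r: r[0], lambda r: r[0]))
```

Both hash tables grow with the inputs, which illustrates the memory-residency requirement that the abstract identifies as the motivation for XJoin.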
Visualizing Digital Library Search Results with Categorical and. Ben Shneiderman. David Feldman. Anne Rose. February 1999.
Digital library search results are usually shown as a textual list, with 10-20 items per page. Viewing several thousand search results at once on a two-dimensional display with continuous variables is a promising alternative. Since these displays can overwhelm some users, we created a simplified two-dimensional display that uses categorical and hierarchical axes, called hieraxes. Users appreciate the meaningful and limited number of terms on each hieraxis. At each grid point of the display we show a cluster of color-coded dots or a bar chart. Users see the entire result set and can then click on labels to move down a level in the hierarchy. Handling broad hierarchies and arranging for imposed hierarchies led to additional design innovations. We applied hieraxes to a digital video library used by middle school teachers and a legal information system. (Also cross-referenced as UMIACS-TR-99-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A New Method to Store and Retrieve Images. Zhexuan Song. Nick Roussopoulos. February 1999.
In this paper, we present a method to speed up querying and retrieving images in a database. First, we change the storage method: pixels of an image are saved in Hilbert order instead of the row-wise order used in the traditional method. Then, after studying the properties of the Hilbert curve, we give a new algorithm that greatly reduces the number of data segments on the disk. Although we have to retrieve more data than necessary, sequential reading is much faster than random reading, so we obtain about a 10% improvement in total query time, as shown in our simulation experiments. Department of Computer Science, University of Maryland,
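To make the idea of Hilbert-order storage in the abstract above concrete, here is a minimal sketch of the standard bit-manipulation mapping from pixel coordinates to a position along the Hilbert curve. It is a textbook formulation, not the authors' code; the function names and the restriction to square images whose side is a power of two are assumptions for illustration.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along the Hilbert curve that
    fills an n-by-n grid, where n is a power of two (assumed)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n):
    """List all (x, y) cells of an n-by-n grid in Hilbert order."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))


order_4 = hilbert_order(4)
```

Consecutive positions along the curve are always neighbouring pixels, so a rectangular query region tends to map to fewer contiguous runs on disk than it would under row-wise storage.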
Understanding Patterns of User Visits to Web Sites: Interactive. Harry Hochheiser. Ben Shneiderman. February 1999.
HTTP server log files provide Web site operators with substantial detail regarding the visitors to their sites. Interest in interpreting this data has spawned an active market for software packages that summarize and analyze this data, providing histograms, pie graphs, and other charts summarizing usage patterns. While useful, these summaries obscure useful information and restrict users to passive interpretation of static displays. Interactive starfield visualizations can be used to provide users with greater abilities to interpret and explore web log data. By combining two-dimensional displays of thousands of individual access requests, color and size coding for additional attributes, and facilities for zooming and filtering, these visualizations provide capabilities for examining data that exceed those of traditional web log analysis tools. We introduce a series of interactive starfield visualizations, which can be used to explore server data across various dimensions. Possible uses of these visualizations are discussed, and difficulties of data collection, presentation, and interpretation are explored. (Also cross-referenced as UMIACS-TR-99-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Creating Creativity for Everyone: User Interfaces for Supporting. Ben Shneiderman. February 1999.
A challenge for human-computer interaction researchers and user interface designers is to construct information technologies that support creativity. This ambitious goal can be attained by building on an adequate understanding of creative processes. This paper offers the four-phase genex framework for generating excellence:
- Collect: learn from previous works stored in digital libraries
- Relate: consult with peers and mentors at early, middle and late stages
- Create: explore, compose, and evaluate possible solutions
- Donate: disseminate the results and contribute to the digital libraries
Within this integrated framework, this paper proposes eight activities that require human-computer interaction research and advanced user interface design. A scenario about an architect illustrates the process of creative work within a genex environment. (Also cross-referenced as UMIACS-TR-99-10) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Probabilistic Temporal Databases, I: Algebra. Alex Dekhtyar. Robert Ross. V. S. Subrahmanian. January 1999.
Dyreson and Snodgrass have drawn attention to the fact that in many temporal database applications, there is often uncertainty present about the start time of events, the end time of events, the duration of events, etc. When the granularity of time is small (e.g. milliseconds), a statement such as "Packet p was shipped sometime during the first 5 days of January, 1998" leads to a massive amount of uncertainty ($5 \times 24 \times 60 \times 60 \times 1000$ possibilities). As noted by Zaniolo et al., past attempts to deal with uncertainty in databases have been restricted to relatively small amounts of uncertainty in attributes. Dyreson and Snodgrass have taken an important first step towards solving this problem. In this paper, we first introduce the syntax of Temporal-Probabilistic (TP) relations and then show how they can be converted to an explicit, significantly more space-consuming form called Annotated Relations. We then present a {\em Theoretical Annotated Temporal Algebra} (TATA). Being explicit, TATA is convenient for specifying how the algebraic operations should behave, but is impractical to use because annotated relations are overwhelmingly large. Next, we present a Temporal Probabilistic Algebra (TPA). We show that our definition of the TP-Algebra provides a correct implementation of TATA despite the fact that it operates on implicit, succinct TP-relations instead of the overwhelmingly large annotated relations. Finally, we report on timings for an implementation of the TP-Algebra built on top of ODBC. (Also cross-referenced as UMIACS-TR-99-09) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
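For the five-day shipping example in the abstract above, the count of possible time points at millisecond granularity works out as follows. This is only a worked version of the product quoted in the abstract.

```latex
\[
  \underbrace{5}_{\text{days}} \times
  \underbrace{24}_{\text{hours/day}} \times
  \underbrace{60}_{\text{min/hour}} \times
  \underbrace{60}_{\text{s/min}} \times
  \underbrace{1000}_{\text{ms/s}}
  \;=\; 432{,}000{,}000 .
\]
```

An explicit annotated relation would need to account for each of these 432 million time points for that single statement, which suggests why the abstract calls annotated relations overwhelmingly large.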
On the Convergence of Ritz Values, Ritz Vectors, and Refined Ritz. Zhongxiao Jia. G. W. Stewart. January 1999.
This paper concerns the Rayleigh--Ritz method for computing an approximation to an eigenpair $(\lambda, x)$ of a non-Hermitian matrix $A$. Given a subspace $\clw$ that contains an approximation to $x$, this method returns an approximation $(\mu, \tilde x)$ to $(\lambda, x)$. We establish four convergence results that hold as the deviation $\epsilon$ of $x$ from $\clw$ approaches zero. First, the Ritz value $\mu$ converges to $\lambda$. Second, if the residual $A\tilde x-\mu\tilde x$ approaches zero, then the Ritz vector $\tilde x$ converges to $x$. Third, we give a condition on the eigenvalues of the Rayleigh quotient from which the Ritz pair is computed that ensures convergence of the Ritz vector. Finally, we show that certain refined Ritz vectors converge unconditionally. (Also cross-referenced as UMIACS-TR-99-08) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
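For readers unfamiliar with the terms in the abstract above, the usual textbook construction of a Ritz pair and a refined Ritz vector is sketched below. The orthonormal basis matrix $W$, the Rayleigh quotient $B$, the primitive vector $z$, and the symbol $\hat x$ are notation assumed here; $\mathcal{W}$ stands for the subspace the abstract writes as $\clw$. The report's own definitions may differ in detail.

```latex
% Assumed: the columns of W form an orthonormal basis of the subspace \mathcal{W}.
\begin{align*}
  B &= W^{H} A W
    && \text{(Rayleigh quotient)} \\
  B z &= \mu z, \quad \|z\|_2 = 1
    && \text{(eigenpair of the small matrix } B\text{)} \\
  \tilde x &= W z
    && \text{(Ritz vector paired with the Ritz value } \mu\text{)} \\
  \hat x &= \operatorname*{arg\,min}_{x \in \mathcal{W},\ \|x\|_2 = 1}
            \|(A - \mu I)\,x\|_2
    && \text{(refined Ritz vector)}
\end{align*}
```

The refined vector is chosen to minimize the residual directly rather than being read off an eigenvector of $B$.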
Guaranteeing Safety in the Presence of Moving Obstacles. Robert Kohout. January 1999.
Path planning is a fundamental problem in robotics research. Whether the robot is a manipulator arm on a factory floor, an unmanned all-terrain vehicle, a flying drone, or a household assistant serving coffee, the motions of the robot must be planned and executed in such a way that the robot can accomplish its goals. Motion planning must take into account the robot's inherent abilities to move and maneuver, its speed, and all of the various constraints imposed upon these abilities by the environment in which the robot is situated. Many real-world application domains are dynamic, in the sense that the plan-relevant parameters in the environment evolve over time. In such cases, motion planning must also take into account the time that it takes to plan. A perfect plan is useless if it cannot be produced in time to execute it in a changing world. This technical report focuses upon the problem of avoiding moving obstacles in a 2-dimensional environment. Specifically, it addresses the problem of guaranteeing that a robot will never be hit by an obstacle in the environment. It establishes conditions for guaranteeing that a safety-preserving path will always exist in the most commonly studied problem in moving obstacle avoidance, known as the Asteroids Avoidance Problem. These results are then extended to less restricted, more realistic variants of the problem, including the important case where the locations and trajectories are only made known to the planning algorithm at runtime. Once these conditions are established, they are used to develop an incremental algorithm that can solve the restricted Asteroids problem in low-order polynomial time. This algorithm takes its own observed worst-case running time into account, completes in a fraction of a second, and has been used to control Dodger, a simulated robot that avoids moving obstacles in hard real time.
In over ten machine-weeks of testing, involving well over a million obstacles generated in a variety of ways, Dodger has not been hit by a single obstacle. (Also cross-referenced as UMIACS-TR 99-06) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Building Knowledge through Families of Software Studies: An Experience. Victor Basili. Forrest Shull. Filippo Lanubile. January 1999.
Experimentation in software engineering is difficult. One reason is that there are a large number of context variables, and so creating a cohesive understanding of experimental results requires a mechanism for motivating studies and integrating results. It requires a community of researchers that can replicate studies, vary context variables, and build abstract models that represent the common observations about the discipline. This paper discusses the experience of the authors, based upon a collection of experiments, in terms of a high level framework for organizing sets of related studies. With such a framework, experiments can be viewed as part of common families of studies, rather than being isolated events. Common families of studies can contribute to higher level hypotheses that no individual experiment could achieve. Then the replication of experiments within a family of studies can act as the cornerstone for building knowledge in an incremental manner. A mechanism is suggested that motivates, records, and integrates individual experiments within a family for analysis by the community at large. To support the framework, this paper discusses the experiences of the authors in carrying out empirical studies, with specific emphasis on persistent problems encountered in experimental design, threats to validity, criteria for evaluation, and execution of experiments in the domain of software engineering. (Also cross-referenced as UMIACS-TR-99-05) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
SHOP: Simple Hierarchical Ordered Planner. Dana Nau. Yue Cao. Amnon Lotem. Hector Munoz-Avila. January 1999.
SHOP (Simple Hierarchical Ordered Planner) is a domain-independent HTN Planning system with the following characteristics.
* SHOP plans for tasks in the same order that they will later be executed. This avoids some of the goal-interaction issues that arise in other HTN planners, thus making the planning algorithm relatively simple.
* The planning algorithm is sound and complete over a large class of problems.
* Since SHOP knows the complete world-state at each step of the planning process, it can use highly expressive domain representations. For example, it can do planning problems that require complex numeric computations.
* In our tests, SHOP solved problems several orders of magnitude faster than Blackbox and TLplan. This occurred even though SHOP is written in Lisp and the other planners are written in C.
(Also cross-referenced as UMIACS-TR 99-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Coding Discourse Structure in Dialogue (Version 1.0). Christine Nakatani. David Traum. March 1999.
This document is a manual for coding aspects of the discourse structure of dialogue. It was developed to serve both as a starting point for discussion and as a tool for coding exercises prior to the 3rd {\em Discourse Resource Initiative} (DRI) meeting, May 1998 in Chiba, Japan. The manual focuses on coding common ground units (CGUs) to get to a level of commonality between participants in dialogue, and then intentional and informational units (IUs) that represent the higher-level, hierarchical topic or purpose structure of dialogue. (Also cross-referenced as UMIACS-TR-99-03) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Estimating Available Capacity of an End-to-end Path in a Computer Network. Shikha Bahl. December 1998.
A common measure for characterizing the quality of a connection is in terms of average bandwidth, where averages reflect long-term behavior of the connection. It is well recognized, however, that the performance of a connection changes rapidly with time. In order to address the dynamic nature of the connection, a better measure is in terms of the capacity available to the user, treating the capacity as a time varying function. Techniques available for determining the capacity of a path require that a series of packets be sent at a rate that saturates the path for lengthy periods of time. In contrast, we present a non-stressful technique. Estimation of the available capacity requires the knowledge of the way a connection behaves. In order to reflect the actual operations of the network resources, deterministic models are presented, which make the estimation of the available capacity feasible. We adopt the approach of monitoring a connection while sending a controlled set of packets according to the probe packet train model and measuring the time it takes for the probe packets to go across a connection. Based on these measurements, and the deterministic models, we estimate the available capacity of a connection during the observation period. The applicability of the techniques developed for estimating the available capacity was tested through experimental studies using NetDyn for measurements, and selected Internet sites as end points of the connections. The results of the experimental study are also presented. (Also cross-referenced as UMIACS-TR-99-02) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A study of Cyclone technology. Sung Lee. January 1999.
Since their advent, computer networks have used event-based mechanisms for managing resources. While technological advances have resulted in computer networking becoming ubiquitous, the performance of the networks suffers from these approaches of resource management. Cyclone technology, on the other hand, manages resources in a time-based manner, resulting in a networking technology which can deliver loss free, contention free, and jitter free data in a very efficient manner. In Cyclone, scheduled traffic reserves the use of resources in time and space at the time of establishing the connection. As a consequence, there are no losses, jitters, or contentions for any resources. This technology also supports on-demand traffic, for which available resources are allocated on-demand without affecting the performance of scheduled traffic and leading to higher resource utilization. The scheduling approach used indicates that the links can sustain very high loading without having any adverse impact on performance of the scheduled traffic. Clearly the time coordination among resources is the key in achieving jitter free and loss free computer communication with minimum end-to-end delay. Cyclone technology exploits such coordinations of resources in time and space and requires minimal processing at a node during data transfer. It eliminates the need for carrying header information allowing more efficient utilization of existing communication bandwidth. The problems of congestion and loss are removed through end-to-end time coordination among network components, thus leading to fewer control messages. For traffic with stringent timing requirements such as real-time audio and video, Cyclone technology offers well-suited network environments in which the end-to-end delay and jitter can be controlled and guaranteed. In this dissertation, we present end-to-end design aspects and the feasibility of Cyclone technology.
A design is presented for all aspects including components and scheduling, and the modes of operations in a Cyclone network have been considered. Our study on the behavior of the current scheduling technique shows that the connection acceptance probability is very high, link utilization can be close to 100%, and the worst case delays due to scheduling are rather low. (Also cross-referenced as UMIACS-TR-99-01) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
High Performance Computing Algorithms for Land Cover Dynamics Using. Satya Kalluri. Joseph Ja'Ja'. David A. Bader. Zengyan Zhang. John Townshend. Hassan Fallah-adl. December 1998.
Global and regional land cover studies require the ability to apply complex models on selected subsets of large amounts of multi-sensor and multi-temporal data sets that have been derived from raw instrument measurements using widely accepted pre-processing algorithms. The computational and storage requirements of most such studies far exceed what is possible on a single workstation environment. We have been pursuing a new approach that couples scalable and open distributed heterogeneous hardware with the development of high performance software for processing, indexing, and organizing remotely sensed data. Hierarchical data management tools are used to ingest raw data, create metadata, and organize the archived data so as to automatically achieve computational load balancing among the available nodes and minimize I/O overheads. We illustrate our approach with four specific examples. The first is the development of the first fast operational scheme for the atmospheric correction of Landsat TM scenes, while the second example focuses on image segmentation using a novel hierarchical connected components algorithm. Retrieval of global BRDF (Bidirectional Reflectance Distribution Function) in the red and near infrared wavelengths using four years (1983 to 1986) of Pathfinder AVHRR Land (PAL) data set is the focus of our third example. The fourth example is the development of a hierarchical data organization scheme that allows on-demand processing and retrieval of regional and global AVHRR data sets. Our results show that substantial improvements in computational times can be achieved by using the high performance computing technology. (Also cross-referenced as UMIACS-TR-98-18) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Evaluation of Client-Server Architectures. Michael D. Beynon. Renato Ferreira. Asmara Afework. Ganti Krishna Mohan. December 1998.
No abstract available. Also cross-referenced as UMIACS-TR-98-17 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hybrid Probabilistic Programs: Algorithms and Complexity. Michael Dekhtyar. Alex Dekhtyar. V.S. Subrahmanian. December 1998.
Hybrid Probabilistic Programs (HPPs) are logic programs that allow the programmer to explicitly encode his knowledge of the dependencies between events being described in the program. In this paper, we classify HPPs into three classes called HPP_1, HPP_2, and HPP_r for r >= 3. For these classes, we provide three types of results for HPPs. First, we develop algorithms to compute the set of all ground consequences of an HPP. Then we provide algorithms and complexity results for the problems of entailment ("Given an HPP P and a query Q as input, is Q a logical consequence of P?") and consistency ("Given an HPP P as input, is P consistent?"). Our results provide a fine characterization of when polynomial algorithms exist for the above problems, and when these problems become intractable. (Also cross-referenced as UMIACS-TR-98-76) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Single Display Groupware: A Model for Co-present Collaboration. Jason Stewart. Benjamin B. Bederson. Allison Druin. December 1998.
We introduce a model for supporting collaborative work between people who are physically close to each other. We call this model Single Display Groupware (SDG). In this paper, we describe this model, comparing it to more traditional remote collaboration. We describe the requirements that SDG places on computer technology, and our understanding of the benefits and costs of SDG systems. Finally, we describe a prototype SDG system that we built and the results of a usability test we ran with 60 elementary school children. (Also cross-referenced as UMIACS-TR-98-75) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Does a Sketchy Appearance Influence Drawing Behavior?. Jon Meyer. Benjamin B. Bederson. December 1998.
In this paper we examine the role of visual aesthetics in how people interact with computers. Specifically, we are interested in whether simply adopting a sketch-like visual appearance in a drawing application encourages users to interact with the application more freely or rapidly than they would if they were using the standard, precise, rectilinear appearance that most drawing applications now supply. We carried out two user studies. In the first study, we asked members of the University of Maryland Art History department to draw a series of diagrams using two different line styles. In the second experiment, we used the World Wide Web to collect drawing diagrams from a much broader set of participants. Both studies reveal that subjects draw more quickly using the sketch-like ('wavy') line style than the straight line style. (Also cross-referenced as UMIACS-TR-98-74) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Does Animation Help Users Build Mental Maps of Spatial Information?. Benjamin B. Bederson. Angela Boltman. December 1998.
We examine how animating a viewpoint change in a spatial information system affects a user's ability to build a mental map of the information in the space. We found that animation improves users' ability to reconstruct the information space, with no penalty on task performance time. We believe that this study provides strong evidence for adding animated transitions in many applications with fixed spatial data where the user navigates around the data space. (Also cross-referenced as UMIACS-TR-98-73) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Survey of Current Paradigms in Machine Translation. Bonnie J. Dorr. Pamela W. Jordan. John W. Benoit. December 1998.
This paper is a survey of the current machine translation research in the US, Europe, and Japan. A short history of machine translation is presented first, followed by an overview of the current research work. Representative examples of a wide range of different approaches adopted by machine translation researchers are presented. These are described in detail along with a discussion of the practicalities of scaling up these approaches for operational environments. In support of this discussion, issues in, and techniques for, evaluating machine translation systems are discussed. (Also cross-referenced as UMIACS-TR-98-72) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Caching and Scheduling for Broadcast Disk Systems. Vincenzo Liberatore. December 1998.
Unicast connections lead to performance and scalability problems when a large client population attempts to access the same data. Broadcast push and broadcast disk technology address the problem by broadcasting data items from a server to a large number of clients. Broadcast disk performance depends mainly on caching strategies at the client site and on how the broadcast is scheduled at the server site. An on-line broadcast disk paging strategy makes caching decisions without knowing access probabilities. In this paper, we subject on-line paging algorithms to extensive empirical investigation. The Gray algorithm [KL98] always outperformed other on-line strategies on both synthetic and Web traces. Moreover, caching limited the skewness needed from a broadcast schedule, and the results favored efficient caching algorithms over refined scheduling strategies when the cache was not small. Prior to this paper, no work had empirically investigated on-line paging algorithms and their relation to server scheduling. (Also cross-referenced as UMIACS-TR-98-71) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Evaluation of Architectural Alternatives for Rapidly Growing. Mustafa Uysal. Anurag Acharya. Joel Saltz. November 1998.
Growth and usage trends for several large datasets indicate that there is a need for architectures that scale the processing power as the dataset increases. In this paper, we evaluate three architectural alternatives for rapidly growing and frequently reprocessed datasets: active disks, clusters, and shared memory multiprocessors (SMPs). The focus of this evaluation is to identify potential bottlenecks in each of the alternative architectures and to determine the performance of these architectures for the applications of interest. We evaluate these architectural alternatives using a detailed simulator and a suite of nine applications. Our results indicate that for most of these applications Active Disk and cluster configurations were able to achieve significantly better performance than SMP configurations. Active Disk configurations were able to match (and in some cases improve upon) the performance of commodity cluster configurations. (Also cross-referenced as UMIACS-TR-98-68) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
MOCHA: A Self-Extensible Middleware Substrate for Distributed Data. Manuel Rodriguez-Martinez. Nick Roussopoulos. November 18, 1998.
This paper describes MOCHA, a self-extensible middleware substrate designed to interconnect data sources distributed over a computer network. MOCHA is designed to scale to large environments and is based on the idea that the functionality in the system should be deployed by the middleware itself. This is realized by shipping the code implementing either advanced data types or tailored query operators to remote data sources and having it executed remotely. Optimized query plans push the evaluation of powerful data-reducing operators to the data sites while executing data-inflating operators at the client's site. The Volume Reduction Factor is a new cost metric introduced to select the best site to execute query operators and is shown to be more accurate than the standard selectivity factor. MOCHA has been implemented in Java and runs on top of the Informix Universal Server. In this paper we present the architecture of MOCHA, the ideas behind it, and a performance study using data and queries from the Sequoia 2000 Benchmark. The results of this study demonstrate that MOCHA not only provides a flexible and scalable framework but also substantially improves query performance in contrast to traditional middleware solutions. (Also cross-referenced as UMIACS-TR-98-67) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Evaluation of Online Warehouse Update Algorithms. Alexandros Labrinidis. Nick Roussopoulos. November 1998.
Data warehouse maintenance algorithms usually work off-line, making the warehouse unavailable to users. However, since most organizations require continuous operation, we need to be able to perform the updates online, concurrently with user queries. To guarantee that user queries access a consistent view of the warehouse, online update algorithms introduce redundancy in order to store multiple versions of the data objects that are being changed. In this paper, we present an online warehouse update algorithm that stores multiple versions of data as separate rows (vertical redundancy). We compare our algorithm to another online algorithm that stores multiple versions within each tuple by extending the table schema (horizontal redundancy). We have implemented both algorithms on top of an Informix Dynamic Server and measured their performance under varying workloads, focusing on their impact on query response times. Our experiments show that, except for a limited number of cases, vertical redundancy is a better choice with respect to storage, implementation overhead, and query performance. (Also cross-referenced as UMIACS TR-98-66) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Study of Permutations Permissible by LIFO Service Disciplines. Simon Hawkin. Ashok Agrawala. November 1998.
We study permutations of the job order performed by various LIFO service disciplines. The sets of such permutations are shown to be equivalent to sets of string permutations with simple characteristics. In particular, it is easy to test whether a given permutation belongs to these sets. Several algorithms that efficiently perform such tests are presented. (Also cross-referenced as UMIACS-TR-98-65) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
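As a rough illustration of the kind of membership test the abstract above mentions, the sketch below checks the textbook single-stack case only. It assumes jobs arrive in the order 1, 2, ..., n, that each job is pushed on arrival, and that the server may pop the top job at any time. The function name `is_stack_permutation` is hypothetical, and the service disciplines studied in the report may differ from this simple model.

```python
def is_stack_permutation(perm):
    """Return True if `perm` (a permutation of 1..n) can be produced by
    pushing 1, 2, ..., n onto a single stack in order and popping at will.

    This is the classic single-stack model, used here only as an assumed
    stand-in for a LIFO service discipline.
    """
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        raise ValueError("perm must be a permutation of 1..n")

    stack = []
    next_arrival = 1  # next job that has not yet arrived
    for job in perm:
        # Admit arrivals until the requested job is on the stack.
        while next_arrival <= job:
            stack.append(next_arrival)
            next_arrival += 1
        # The job can be served now only if it is on top of the stack.
        if not stack or stack[-1] != job:
            return False
        stack.pop()
    return True
```

Under this model the test runs in time linear in n once the input is known to be a permutation, since each job is pushed and popped at most once.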
Learning Response Time for WebSources using Query Feedback and. Jean-Robert Gruser. Louiqa Raschid. Vladimir Zadorozhny. November 1998.
The rapid growth of the Internet and support for interoperability protocols has increased the number of Web accessible sources, WebSources. Current optimization technology for wrapper mediator architectures needs to be extended to estimate the response time (delays) to access WebSources and to use this delay in query optimization. In this paper, we present a Multi-Dimensional Table (MDT), a tool that is based on learning using query feedback from WebSources. We describe the MDT learning algorithms, and report on the MDT learning for WebSources. The MDT uses dimensions Time of day, Day, and Quantity of data, to learn response times from a particular WebSource, and to predict the expected response time (delay), and a confidence in this prediction, for some query. Experiment data was collected from several WebSources and analyzed, to determine those dimensions that were significant in estimating the response time for particular WebSources. Our research shows that we can improve the quality of learning by tuning the MDT features, e.g., including significant dimensions in the MDT, or changing the ordering of dimensions. We then demonstrate how the MDT prediction of delay may be used by a scrambling enabled optimizer. A scrambling algorithm identifies some critical points of delay, where it makes a decision to scramble (modify) a plan, to attempt to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource, versus the MDT prediction of this delay, with respect to critical points of delay in specific plans. We identify those cases where MDT overestimation or underestimation of the real delay results in a penalty in the scrambling enabled optimizer, and those cases where there is no penalty. Using the experimental data and MDT learning, we test how good the MDT is in minimizing these penalties. 
Also cross-referenced as UMIACS TR #98-64 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Performance Impact of Proxies in Data Intensive Client-Server Parallel. Michael D. Beynon. Alan Sussman. Joel Saltz. November 1998.
Large client-server data intensive applications can place high demands on system and network resources. This is especially true when the connection between the client and server spans a wide-area internet link. In this paper, we consider changing the typical client-server architecture of a class of data intensive applications. We show that given sufficient common interest among multiple clients, our enhancements reduce the response time per-client and reduce the amount of data sent across the wide-area link. In addition, we also see a reduction in server utilization which helps to improve server scalability as more clients are added to the system. (Also cross-referenced as UMIACS-TR-98-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Translating IDEF3 to PSL. Mihai Ciocoiu. November 1998.
This document describes the process of integrating IDEF3 and PSL. The EPIF-like frame representation developed for representing IDEF3 schematics is introduced, together with the compilation rules for the various IDEF3 elements. The appendix contains a full example of the use of the translator for the Camile scenario. Also cross-referenced as UMIACS-TR-98-63 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Three Results on Iterative Regularization. Misha Kilmer. G. W. Stewart. October 1998.
In this paper we present three theorems which give insight into the regularizing properties of MINRES. While our theory does not completely characterize the regularizing behavior of the algorithm, it provides a partial explanation of the observed behavior of the method. Unlike traditional attempts to explain the regularizing properties of Krylov subspace methods, our approach focuses on convergence properties of the residual rather than on convergence analysis of the harmonic Ritz values. The import of our analysis is illustrated by two examples. In particular, our theoretical and numerical results support the following important observation: in some circumstances the dimension of the optimal Krylov subspace can be much smaller than the number of the components of the truncated spectral solution that must be computed to attain comparable accuracy. Also cross-referenced as UMIACS-TR-98-62 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Study of Dynamic Replication Techniques in Continuous. ChengFu Chou. Leana Golubchik. John C.S. Lui. October 1998.
Multimedia applications are emerging in education, information dissemination, entertainment, as well as many other applications. The stringent requirements of such applications make design of cost-effective and scalable systems difficult, and therefore efficient adaptive and dynamic resource management techniques can be of great help in improving resource utilization and consequently improving performance and scalability of such systems. In this paper, we focus on threshold-based policies for dynamic resource management, specifically in the context of continuous media (CM) servers. Furthermore, we propose a mathematical model of user behavior and show, through a performance study, that not only does the use of this model in conjunction with dynamic resource management policies improve the system's performance but that it also facilitates significantly reduced sensitivity to changes in: (a) system architecture, (b) workload characteristics, (c) skewness of data access patterns, (d) frequency of changes in data access patterns, and (e) choice of threshold values. We believe that not only is this a desirable property for a CM server, in general, but that furthermore, it suggests the usefulness of these techniques across a wide range of continuous media applications. Also cross-referenced as UMIACS-TR-98-61 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Understanding Transportation Management Systems Performance with a. Catherine Plaisant. Phil Tarnoff. Aditya Saraf. Anne Rose. November 1998.
We have developed a simulation-based learning environment to provide system designers and operators with an appreciation of the impact of incidents on traffic delay. We used an application framework developed at the University of Maryland for constructing simulation-based learning environments called SimPLE (Simulated Processes in a Learning Environment). Environments developed with SimPLE use dynamic simulations and visualizations to represent realistic time-dependent behavior and are coupled with guidance material and other software aids that facilitate learning. The simulation allows learners to close freeway lanes and divert traffic to an arterial road. Users can see the effect of the detour on freeway and arterial delay. Users can then adjust signal timing interactively on a time space diagram and watch the effect of their adjustment on green band changes and on arterial delays and total delays. Department of Computer Science, University of Maryland,
Excentric Labeling: Dynamic Neighborhood Labeling for Data. Jean-Daniel Fekete. Catherine Plaisant. October 1998.
The widespread use of information visualization is hampered by the lack of effective labeling techniques. A taxonomy of labeling methods is proposed. We then describe "excentric labeling", a new dynamic technique to label a neighborhood of objects located around the cursor. This technique does not intrude into the existing interaction, it is not computationally intensive, and was easily applied to several visualization applications. A pilot study indicates a strong speed benefit for tasks that involve the rapid exploration of large numbers of objects. Also cross-referenced as UMIACS-TR-98-59 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Iterative Methods for Stabilized Discrete Convection--Diffusion. Yin-Tzer Shih. Howard C. Elman. October 1998.
In this paper, we study the computational cost of solving the convection-diffusion equation using various discretization strategies and iterative solution algorithms. The choice of discretization influences the properties of the discrete solution and also the choice of solution algorithm. The discretizations considered here are stabilized low order finite element schemes using streamline diffusion, crosswind diffusion and shock--capturing. The latter, shock--capturing discretizations lead to nonlinear algebraic systems and require nonlinear algorithms. We compare various preconditioned Krylov subspace methods including Newton--Krylov methods for nonlinear problems, as well as several preconditioners based on relaxation and incomplete factorization. We find that although enhanced stabilization based on shock--capturing requires fewer degrees of freedom than linear stabilizations to achieve comparable accuracy, the nonlinear algebraic systems are more costly to solve than those derived from a judicious combination of streamline diffusion and crosswind diffusion. Solution algorithms based on GMRES with incomplete block--matrix factorization preconditioning are robust and efficient. (Also cross-referenced as UMIACS-TR-98-58) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Automated Techniques for Designing Embedded Signal Processors on. Dong-In Kang. Richard Gerber. Leana Golubchik. October 1998.
In this paper, we present a performance-based technique to help synthesize high-bandwidth radar processors on commodity platforms. This problem is innately complex, for a number of reasons. Contemporary radars are very compute-intensive: they have high pulse rates, and they sample a large number of range readings at each pulse. Indeed, modern radar processors can require CPU loads in the high-gigaflop to teraflop range, performance which is only achieved by exploiting the radar's inherent data parallelism. Next-generation radars are slated to operate on scalable clusters of commodity systems. Throughput is only one problem. Since radars are usually embedded within larger real-time applications, they also must adhere to latency (or deadline) constraints. Building an embedded radar processor on a network of workstations (or a NOW) involves partitioning load in a balanced fashion, accounting for stochastic effects injected on all software-based systems, synthesizing runtime parameters for the on-line schedulers and drivers, and meeting the latency and throughput constraints. In this paper, we show how performance analysis can be used as an effective tool in the design loop; specifically, our method uses analytic approximation techniques to help synthesize efficient designs for radar processing systems. In our method, the signal-processor's topology is represented via a simple flow-graph abstraction, and the per-unit load requirements are modeled stochastically, to account for second-order effects like cache memory behavior, DMA interference, pipeline stalls, etc. Our design algorithm accepts the following inputs: (a) the system topology, including the thread-to-CPU mapping, where multi-threading is assumed to be used; (b) the per-task load models; and (c) the required pulse rate and latency constraints. 
As output, it produces the proportion of load to allocate to each task, set at manageable time resolutions for the local schedulers; an optimal service interval over which all load proportions should be guaranteed; an optimal sampling frequency; and some reconfiguration schemes to accommodate single-node failures. Internally, the design algorithms use analytic approximations to quickly estimate output rates and propagation delays for candidate solutions. When the system is synthesized, its results are checked via a simulation model, which removes many of the analytic approximations. We show how our system synthesizes a real-time synthetic aperture radar, under a variety of loading conditions. Also cross-referenced as UMIACS TR # 98-57 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
LifeLines: Using Visualization to Enhance Navigation and Analysis of. Catherine Plaisant. Richard Mushlin. Aaron Snyder. Jia Li. Dan Heller. Ben Shneiderman. October 1998.
LifeLines provide a general visualization environment for personal histories. We explore its use for clinical patient records. A Java user interface is described, which presents a one-screen overview of a computerized patient record using timelines. Problems, diagnoses, test results or medications can be represented as dots or horizontal lines. Zooming provides more details; line color and thickness illustrate relationships or significance. The visual display acts as a giant menu, giving direct access to the data. (Also cross-referenced as UMIACS-TR-98-56) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Decentralized Replication Mechanisms in Deno. Peter J. Keleher. October 1998.
We are currently finalizing the design of Deno, a new shared-object system intended for use with replicated mobile and wide-area data. The broad aim of our research is to develop a framework for highly-available, decentralized shared-object protocols. The key idea is that our protocols will support high availability through a distributed voting scheme. Specifically, we will investigate (a) peer-to-peer updates, which will allow incremental progress to be made in the absence of full connectivity between component servers, (b) voting rather than centralized schemes for committing updates, ensuring that no single point of failure can prevent updates from being committed, and (c) application-specific consistency control, allowing applications to relax coherency constraints in ways that do not break the application's notion of consistency. Distribution and multiple connectivity modes are becoming the norm rather than the exception in current computing environments. Thus, we expect the impact of our research to be felt in areas as disparate as mobile computing and collaborative data warehousing on the Internet. (Also cross-referenced as UMIACS-TR-98-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Minimalist Theory of Human Sentence Processing. Amy Weinberg. October 1998.
Research in the theory of human sentence processing can be characterized by 3 styles of explanation. Researchers taking the first track have tried to motivate principles of structural preference from extralinguistic considerations like storage capacity in working memory, or bounds on complexity of incremental analysis. Frazier and Rayner's (1982) Minimal Attachment and Right Association principles, and Gorrell's simplicity metric, are examples of this type of theory. The second track eschews "parsing strategies", replacing them with a fairly complex tuning by speaker/hearers to frequency in the hearer's linguistic environment. The difficulty of recovering an analysis of a construction in a particular case is a function of how often similar structures or thematic role arrays appear in the language as a whole. The work of Trueswell et al (1994), Jurafsky (1996) and MacDonald et al (1994) are examples of frequency or probability based constraint satisfaction theories. The third track takes a more representational view and ties processing principles to independently needed restrictions derived from competence and language learning. This approach claims that the natural language faculty is extremely well designed in the sense that the same set of principles that govern language learning also contribute to a theory of sentence processing. This track is represented by the work of Gibson (1981), Gorrell (1995), Pritchett (1992), Philips (1995, 1996) and Weinberg (1992), who argue that processing can be seen as the rapid incremental satisfaction of grammatical constraints such as the Theta Criterion, which are needed independently to explain language learning or language variation. A variant of this approach, represented by Crain and Steedman (1985) among others, constrains the grammatical source for parsing principles but locates these principles within a discourse or semantic, rather than a syntactic component. This paper proposes a model of the last type. 
We argue that a particular version of the Minimalist Program (Chomsky (1993), Uriagereka (1998)) provides the principles needed to explain initial human preferences for ambiguous structures and also provides a theory of reanalysis, explaining when initial preferences can be revised given subsequent disconfirming data, and when they lead to unrevisable garden paths. We compare our model to other linguistically motivated theories such as Philips (1995, 1996), arguing that Minimalist principles subsume the generalizations captured by Philips's theory in a more empirically adequate way. Finally, we argue that the data presented argue for this theory over those motivated by extralinguistic principles. Also cross-referenced as UMIACS-TR-98-53 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Linguist and the Laundromat. Amy Weinberg. October 1998.
This paper resulted from a roundtable discussion at the 1998 CUNY Sentence Processing Conference held at Rutgers University. Jerry Fodor (Philosophy, Rutgers University) argued there that an adequate lexical semantics had to invoke a criterion of Reverse Compositionality. Fodor gives the following definition of 'Reverse Compositionality' (RC): "Nothing belongs to the lexical entry for a lexical item except what that item contributes to the grammatical representation of its hosts" where 'host' is defined as "any expression E ...of which E is a constituent." Moreover, Fodor claims that invoking this criterion has broad consequences for theories of language processing and acquisition, particularly with respect to theories that attribute processing behavior to "lexical effects." Fodor claims that "...most of what cognitive science blithely refers to as lexical effects in parsing and language learning aren't in fact mediated by information of the kind that lexical entries contain...." and "... that language acquisition delivers shallow lexical entries consonant with reverse compositionality, and that parsing delivers correspondingly shallow lexical entries consonant with assigning tokens to their types, and that everything else will turn out to be 'performance theory' ..." In this paper, I argue that frequency and other standard lexical processing effects can form a legitimate part of a theory of sentence processing even if it adopts the criterion of "reverse compositionality". Cases drawn from the literature are used to sketch what a theory adopting Fodor's criterion and using frequency and/or probabilistic information would look like. This commentary will appear in Proceedings of CUNY Conference on Sentence Processing, 1998, S. Stevenson and P. Merlo, eds, J. Benjamins. Also cross-referenced as UMIACS-TR-98-52 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Choosing Regularization Parameters in Iterative Methods for Ill-Posed. Misha E. Kilmer. Dianne P. O'Leary. October 1998.
Numerical solution of ill-posed problems is often accomplished by discretization (projection onto a finite dimensional subspace) followed by regularization. If the discrete problem has high dimension, though, typically we compute an approximate solution by projection onto an even smaller dimensional space, via iterative methods based on Krylov subspaces. In this work we present efficient algorithms that regularize after this second projection rather than before it. We prove some results on the approximate equivalence of this approach to other forms of regularization and we present numerical examples. (Also cross-referenced as UMIACS-TR-98-48) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Document Image Understanding - 1997. David Doermann. October 1998.
This report contains nearly 500 references which are directly related to the field of document image understanding and appeared in major journals and conferences during 1997. Each reference is classified by major topic. Areas covered include, but are not limited to, preprocessing, models and representations, on-line recognition, off-line recognition, graphics recognition and interpretation, page processing, post-processing and special applications. Department of Computer Science, University of Maryland,
Translating English and Mandarin Verbs with Argument Structure. Mari Broman Olsen. October 1998.
This paper applies and evaluates a semi-automatically acquired Mandarin Chinese lexicon (Olsen, Dorr, and Thomas 1998) with respect to translation of English and Chinese verbs in a UNESCO text (Otero 1997). I demonstrate how Lexical Conceptual Structure templates allow the same semantic structure to apply both to verbs with thematic roles incorporated in the verb itself, and those requiring external thematic complements. Using as examples the English verb _provide_, the Chinese counterpart ti2 gong2 (STC 2251 0180) and its English counterparts in the text, I show how potential translations are included or eliminated automatically based on their thematic role structure. The example illustrates (i) how an interlingual thematic representation based in large part on English argument structure may be adapted felicitously to a historically unrelated language, and (ii) how an interlingual (IL) resource developed for analysis may also be used in generation. (Also cross-referenced as UMIACS-TR-98-51) (Also cross-referenced as LAMP-TR-023) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Thematic Hierarchy for Efficient Generation from Lexical-Conceptual. Bonnie J. Dorr. Nizar Habash. David Traum. October 1998.
This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). There are two contributions of this work: (1) the development of a thematic hierarchy that provides ordering information for realization of arguments in their surface positions; (2) the provision of a diagnostic tool for detecting inconsistencies in an existing online LCS-based lexicon that allows us to enhance principles for thematic-role assignment. (Also cross-referenced as UMIACS-TR-98-50) (Also cross-referenced as LAMP-TR-022) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Lexical Selection for Cross-Language Applications: Combining LCS with. Bonnie J. Dorr. Maria Katsova. October 1998.
This paper describes experiments for testing the power of large-scale resources for lexical selection in machine translation (MT) and cross-language information retrieval (CLIR). We adopt the view that verbs with similar argument structure share certain meaning components, but that those meaning components are more relevant to argument realization than to idiosyncratic verb meaning. We verify this by demonstrating that verbs with similar argument structure as encoded in Lexical Conceptual Structure (LCS) are rarely synonymous in WordNet. We then use the results of this work to guide our implementation of an algorithm for cross-language selection of lexical items, exploiting the strengths of each resource: LCS for semantic structure and WordNet for semantic content. We use the Parka Knowledge-Based System to encode LCS representations and WordNet synonym sets and we implement our lexical-selection algorithm as Parka-based queries into a knowledge base containing both information types. (Also cross-referenced as UMIACS-TR-98-49) (Also cross-referenced as LAMP-TR-021) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Full Degree Spanning Tree Problem. Randeep Bhatia. Samir Khuller. Robert Pless. Yoram Sussmann. October 1998.
The full degree spanning tree problem is defined as follows: given a connected graph $G=(V,E)$ find a spanning tree $T$ so as to maximize the number of vertices whose degree in $T$ is the same as in $G$ (these are called vertices of ``full'' degree). We show that this problem is NP-hard. We also present almost {\em optimal} approximation algorithms for it assuming $coR \neq NP$. For the case of general graphs our approximation factor is $\Theta(\sqrt{n})$. Using H{\aa}stad's result on the hardness of approximating clique, we can show that if there is a polynomial time approximation algorithm for our problem with a factor of $O(n^{\frac{1}{2}-\epsilon})$ then $coR=NP$. For the case of planar graphs, we present a polynomial time approximation scheme. Additionally, we present some experimental results comparing our algorithm to the previous heuristic used for this problem. (Also cross-referenced as UMIACS 98-47) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Deferred Data-Flow Analysis: Algorithms, Proofs and Applications. Shamik D. Sharma. Anurag Acharya. Joel Saltz. September 1998.
Loss of precision due to the conservative nature of compile-time dataflow analysis is a general problem and impacts a wide variety of optimizations. We propose a limited form of runtime dataflow analysis, called deferred dataflow analysis (DDFA), which attempts to sharpen dataflow results by using control-flow information that is available at runtime. The overheads of runtime analysis are minimized by performing the bulk of the analysis at compile-time and deferring only a summarized version of the dataflow problem to runtime. Caching and reusing of dataflow results reduces these overheads further. DDFA is an interprocedural framework and can handle arbitrary control structures including multi-way forks, recursion, separately compiled functions and higher-order functions. It is primarily targeted towards optimization of heavy-weight operations such as communication calls, where one can expect significant benefits from sharper dataflow analysis. We outline how DDFA can be used to optimize different kinds of heavy-weight operations such as bulk-prefetching on distributed systems and dynamic linking in mobile programs. We prove that DDFA is safe and that it yields better dataflow information than strictly compile-time dataflow analysis. (Also cross-referenced as UMIACS-TR-98-46) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Comparison of the Memory Management sub-systems in FreeBSD and Linux. Rohit Dube. September 1998.
In this article we seek to compare the memory management sub-systems of two popular and freely available operating systems - FreeBSD and Linux. First a framework is developed, spelling out the components of a generic and modern memory management system. The framework is then used in a design level comparison of memory management in the two operating systems. (Also cross-referenced as UMIACS-TR-98-45) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Designing Practical Efficient Algorithms for Symmetric Multiprocessors. David R. Helman. Joseph JaJa. October 1998.
Symmetric multiprocessors (SMPs) dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model to create efficient solutions to two widely different types of problems - linked list prefix computations and generalized sorting. Our novel algorithm for prefix computations builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Our algorithm for generalized sorting is a modification of our algorithm for sorting by regular sampling on distributed memory architectures. The algorithm is a stable sort which appears to be asymptotically faster than any of the published algorithms for SMPs. Both of our algorithms were implemented in C using POSIX threads and run on three symmetric multiprocessors - the DEC AlphaServer, the Silicon Graphics Power Challenge, and the HP-Convex Exemplar. We ran our code for each algorithm using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. In spite of the fact that the processors must compete for access to main memory, both algorithms still yielded scalable performance up to 16 processors, which was the largest platform available to us. For some problems, our prefix computation algorithm actually matched or exceeded the performance of the best sequential solution using only a single thread. Similarly, our generalized sorting algorithm always beat the performance of sequential merge sort by at least an order of magnitude, even with a single thread. 
(Also cross-referenced as UMIACS-TR-98-44) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Meta Agent Programs. Juergen Dix. V.S.Subrahmanian. George Pick. September 1998.
There are numerous applications where one agent a needs to reason about the beliefs of another agent, as well as about the actions that other agents may take. Eiter et al. introduced the concept of an agent program, and provided a language within which the operating principles of an agent could be declaratively encoded on top of imperative data structures. We first introduce certain belief data structures that an agent needs to maintain. Then we introduce the concept of a "Meta Agent Program" (MAP), that extends the Eiter et al. framework, so as to allow agents to perform metareasoning. We build a formal semantics for MAPs, and show how this semantics supports not just beliefs agent a may have about agent b's state, but also beliefs about agent b's beliefs about agent c's actions, beliefs about b's beliefs about agent c's state, and so on. Finally, we provide a translation that takes any MAP as input and converts it into an agent program such that there is a one-one correspondence between the semantics of the MAP and the semantics of the resulting agent program. This correspondence allows an implementation of MAPs to be built on top of an implementation of agent programs. (Also cross-referenced as UMIACS-TR-98-43) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Analysis of a Packet-Pair Scheme for Estimating Bottleneck Bandwidth in a Network. Shikha Bahl. September 1998.
In order to assess the performance of a connection it is important to determine the bandwidth offered by the slowest node along the path, also known as bottleneck bandwidth. In the past, many researchers have used a packet-pair technique in order to estimate the bottleneck bandwidth. In real networks, however, the measurements made using the packet-pair technique do not always reflect the correct estimate of the bottleneck service time due to the presence of cross traffic. While several reasons for the observed variability have been reported, the exact nature of the impact of cross traffic on the observations has not been studied. In this paper we present a model to explain how the measured difference of the reception time for a packet-pair can be related to the characteristic of the service time and cross traffic that the pair found along the path. (Also cross-referenced as UMIACS-TR-98-42) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Parallel Strands: A Preliminary Investigation into Mining the Web for. Philip Resnik. August 1998.
Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. (Also cross-referenced as UMIACS-TR-98-41) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Looking to Parallel Algorithms for ILP and Decentralization. Efraim Berkovich. Bruce L. Jacob. Joseph Nuzman. Uzi Vishkin. July 20, 1998.
We introduce explicit multi-threading (XMT), a decentralized architecture that exploits fine-grained SPMD-style programming; a SPMD program can translate directly to MIPS assembly language using three additional instruction primitives. The motivation for XMT is: (i) to define an inherently decentralizable architecture, taking into account that the performance of future integrated circuits will be dominated by wire costs, (ii) to increase available instruction-level parallelism (ILP) by leveraging expertise in the world of parallel algorithms, and (iii) to reduce hardware complexity by alleviating the need to detect ILP at run-time: if parallel algorithms can give us an overabundance of work to do in the form of thread-level parallelism, one can extract instruction-level parallelism with greatly simplified dependence-checking. We show that implementations of such an architecture tend towards decentralization and that, when global communication is necessary, overall performance is relatively insensitive to large on-chip delays. We compare the performance of the design to more traditional parallel architectures and to a high-performance superscalar implementation, but the intent is merely to illustrate the performance behavior of the organization and to stimulate debate on the viability of introducing SPMD to the single-chip processor domain. We cannot offer at this stage hard comparisons with well-researched models of execution. When programming for the SPMD model, the total number of operations that the processor has to perform is often slightly higher. To counter this, we have observed that the length of the critical path through the dynamic execution graph is smaller than in the serial domain, and the amount of ILP is correspondingly larger. 
Fine-grained SPMD programming connects with a broad knowledge base in parallel algorithms and scales down to provide good performance relative to high-performance superscalar designs even with small input sizes and small numbers of functional units. Keywords: Fine-grained SPMD, parallel algorithms, spawn-join, prefix-sum, instruction-level parallelism, decentralized architecture. (Also cross-referenced as UMIACS-TR-98-40) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Performance Prediction Framework for Data Intensive Applications on. Mustafa Uysal. Tahsin M. Kurc. Alan Sussman. Joel Saltz. July 1998.
This paper presents a simulation-based performance prediction framework for large scale data-intensive applications on large scale machines. Our framework consists of two components: application emulators and a suite of simulators. Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing of critical application components (input data partitioning, data declustering, processing structure, etc.) easily and flexibly. Our suite of simulators models the I/O and communication subsystems with good accuracy and executes quickly on a high-performance workstation to allow performance prediction of large scale parallel machine configurations. The key to efficient simulation of very large scale configurations is a technique called loosely-coupled simulation where the processing structure of the application is embedded in the simulator, while preserving data dependencies and data distributions. We evaluate our performance prediction tool using a set of three data-intensive applications. (Also cross-referenced as UMIACS TR # 98-39) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Prefix Computations on Symmetric Multiprocessors. David R. Helman. Joseph JaJa. July 1998.
We introduce a new optimal prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Moreover, whereas Reid-Miller and Blelloch targeted their algorithm for implementation on a vector multiprocessor architecture, we develop our algorithm for implementation on the symmetric multiprocessor architecture (SMP). These symmetric multiprocessors dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Our prefix computation algorithm was implemented in C using POSIX threads and run on three symmetric multiprocessors - the DEC AlphaServer, the SGI Power Challenge, and the HP-Convex Exemplar. We ran our code using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. For some problems, our algorithm actually matched or exceeded the optimal sequential solution using only a single thread. Moreover, in spite of the fact that the processors must compete for access to main memory, our algorithm still resulted in scalable performance up to 16 processors, which was the largest platform available to us. (Also cross-referenced as UMIACS-98-38) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Characterization of the General Protocol Conformance Test Sequence. Raymond Miller. Junehwa Song. June 1998.
No abstract submitted. Department of Computer Science, University of Maryland,
Mobile Streams. M. Ranganathan. Anurag Acharya. Laurent Andrey. Virginie Schaal. Joel Saltz. June 1998.
A large class of distributed testing, control and collaborative applications are reactive or event driven in nature. Such applications can be structured as a set of handlers that react to events and that in turn can trigger other events. We have developed an application building toolkit that facilitates development of such applications. Our system is based on the concept of Mobile Streams. Applications developed in our system are dynamically extensible and re-configurable and our system provides the application designer a means to control how the system can be extended and reconfigured. We describe our system model and implementation and compare our design to the design of other systems. (Also cross-referenced as UMIACS-TR-98-36) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Enhancing Automatic Acquisition of Thematic Structure in a Large-Scale. Mari Broman Olsen. Bonnie Dorr. Scott Thomas. June 1998.
This paper describes a refinement to our procedure for porting lexical conceptual structure into new languages. Specifically we describe a two-step process for creating candidate thematic grids for Mandarin Chinese verbs, using the English verb heading the VP in the subdefinitions to separate senses, and roughly parsing the verb complement structure to match to our thematic structure templates. The procedure is part of a larger process of creating a usable lexicon for interlingual machine translation from a large on-line resource with both too much and too little information necessary for our system. (Also cross-referenced as UMIACS-TR-98-35) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Graphical Multiscale Web Histories: A Study of PadPrints. Ron R. Hightower. Laura T. Ring. Jonathan I. Helfman. Benjamin B. Bederson. James D. Hollan. May 1998.
We have implemented a browser companion called PadPrints that dynamically builds a graphical history-map of visited web pages. PadPrints relies on Pad++, a zooming user interface (ZUI) development substrate, to display the history-map using minimal screen space. PadPrints functions in conjunction with a traditional web browser but without requiring any browser modifications. We performed two usability studies of PadPrints. The first addressed general navigation effectiveness. The second focused on history-related aspects of navigation. In tasks requiring returns to prior pages, users of PadPrints completed tasks in 61.2% of the time required by users of the same browser without PadPrints. We also observed significant decreases in the number of pages accessed when using PadPrints. Users found browsing with PadPrints more satisfying than using Netscape alone. (Also cross-referenced as UMIACS-TR-98-33) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Application Framework for Creating Simulation-Based Learning. Anne Rose. David Eckard. Gary W. Rubloff. May 1998.
While there are numerous types of electronic learning environments including collaboratories, construction toolkits, systems with "scaffolding" and simulations, it is difficult to find authoring tools to build these systems. We have developed an application framework for constructing simulation-based learning environments called SimPLE (Simulated Processes in a Learning Environment). Environments developed with SimPLE use dynamic simulations and visualizations to represent realistic time-dependent behavior and are coupled with guidance material and other software aids that facilitate learning. The software architecture enables independent contributions from developers representing educational content (e.g., simulation models, guidance materials) and software development (e.g., user interface). We provide a user interface template and accompanying software aids to reduce the software development effort. (Also cross-referenced as UMIACS-TR-98-32) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn. Gayathri Chittiappa. A. Udaya Shankar. May 1998.
We develop a numerical evaluation method for adaptive hierarchical QoS routing, and demonstrate its viability by application to two networks. Our approach models aggregation and delayed feedback in a straightforward way, and is scalable to the large networks needed to evaluate hierarchical routing. Department of Computer Science, University of Maryland,
Evaluation of Tradeoffs in Resource Management Techniques for. Leana Golubchik. John C. S. Liu. Edmundo de Silva e Souza. H. Richard Gail. May 1998.
Many modern applications can benefit from sharing of resources such as network bandwidth, disk bandwidth, and so on. In addition, many information systems store (or would like to store) data that can be of use to many different classes of applications, e.g., digital libraries type systems. Part of the difficulty in efficient resource management of such systems can then occur when these applications have vastly different performance and quality-of-service (QoS) requirements as well as resource demand characteristics. In this work we present a performance study of a multimedia storage system which serves multiple types of workloads, specifically a mixture of real-time and non-real-time workloads, by allowing sharing of resources among these different workloads while satisfying their performance requirements and QoS constraints. The broad aim of this work is to examine the issues and tradeoffs associated with mixing multiple workloads on the same server to explore the possibility of maintaining reasonable performance and QoS requirements without having to partition the resources. The main contribution of this work is the exposition of the tradeoffs involved in resource management in such systems. Although many different resources can be considered, here we concentrate mostly on the I/O bandwidth resource. The performance metrics of interest are the mean and variance of the response time for the non-real-time applications and the probability of missing a deadline for the real-time applications. The increased use of buffer space resources is also considered as a tradeoff for improvements in the above stated performance metrics, i.e., response time and probability of missing deadlines. (Also cross-referenced as UMIACS-TR-98-30) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Fast Evaluation of Ensemble Transients of Large IP Networks. Catalin T. Popescu. A. Udaya Shankar. May 11, 1998.
We extend a numerical approximate solution method (the Z-iteration) to time-dependent open networks of M(t)/M(t)/1/$\infty$ and M(t)/M(t)/1/K queues, and apply the method to obtain transient performance metrics of large IP networks. The method generates a set of coupled differential equations, one for each queue in the network. The equations are numerically unstable under certain conditions (e.g., large bandwidths and buffers), and we present techniques to overcome this problem. The resulting numerical procedure is accurate and very fast. For example, a 20-second evolution for a 1000-node network with high-speed links ($\approx 10^4$ packets/sec) and large buffers ($\approx 10^4$ packets) was obtained in 18 minutes on an Ultra Sparc, whereas simulation would take days. Department of Computer Science, University of Maryland,
Data Object and Label Placement For Information Abundant Visualizations. Jia Li. Catherine Plaisant. Ben Shneiderman. August 1998.
Placing numerous data objects and their corresponding labels in limited screen space is a challenging problem in information visualization systems. Extending map-oriented techniques, this paper describes static placement algorithms and develops metrics (such as compactness and labeling rate) as a basis for comparison among these algorithms. A control panel facilitates user customization by showing the metrics for alternative algorithms. Dynamic placement techniques that go beyond map-oriented techniques demonstrate additional possibilities. User actions can lead to selective display of data objects and their labels. Department of Computer Science, University of Maryland,
A Comparative Study of Knowledge-Based Approaches for Cross-Language. Douglas W. Oard. Bonnie J. Dorr. Paul G. Hackett. Maria Katsova. April 1998.
Cross-language retrieval systems seek to use queries in one natural language to guide the retrieval of documents that might be written in another. Acquisition and representation of translation knowledge plays a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge, bilingual dictionaries and translation lexicons, for cross-language retrieval. We have implemented six query translation techniques that use bilingual dictionaries, one based on lexical-semantic analysis, and one based on direct use of the translation output from an existing machine translation system; these are compared with a document translation technique that uses output from the same existing translation system. Average precision measures on portions of the TREC collection suggest that arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation in the dictionary, that query translation using an existing machine translation system can achieve somewhat better effectiveness than simple dictionary-based techniques, and that performing document translation rather than query translation may result in further improvements in retrieval effectiveness under some conditions. (Also cross-referenced as UMIACS-TR-98-27) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Investigating Reading Techniques for Framework Learning. Forrest Shull. Filippo Lanubile. Victor R. Basili. April 1998.
The empirical study described in this paper addresses software reading for construction: how application developers obtain an understanding of a software artifact for use in new system development. This study focuses on the processes developers would engage in when learning and using object-oriented frameworks. We analyzed 15 student software development projects using both qualitative and quantitative methods to gain insight into what processes occurred during framework usage. The contribution of the study is not to test predefined hypotheses but to generate well-supported hypotheses for further investigation. The main hypotheses we produce are that example-based techniques are well suited to use by beginning learners while hierarchy-based techniques are not because of a larger learning curve. Other more specific hypotheses are proposed and discussed. (Also cross-referenced as UMIACS-TR-98-26) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Infrastructure for Building Parallel Database Systems for. Chialin Chang. Alan Sussman. Joel Saltz. April 1998.
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. We also present some preliminary performance results comparing the implementation of a remote-sensing image database using the T2 services with a custom-built integrated implementation. (Also cross-referenced as UMIACS-TR-98-24) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Digital Dynamic Telepathology -- the Virtual Microscope. Asmara Afework. Michael D. Beynon. Fabian Bustamante. Angelo Demarzo, M.D. Renato Ferreira. Robert Miller, M.D. Mark Silberman, M.D. Joel Saltz, M.D., Ph.D. Alan Sussman, Ph.D. Hubert Tsang. March 1998.
The Virtual Microscope is being designed as an integrated computer hardware and software system that generates a highly realistic digital simulation of analog, mechanical light microscopy. We present our work over the past year in meeting the challenges in building such a system. The enhancements we made are discussed, as well as the planned future improvements. Performance results are provided that show that the system scales well, so that many clients can be adequately serviced by an appropriately configured data server. (Also cross-referenced as UMIACS-TR-98-23) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Eigenanalysis of Some Preconditioned Helmholtz Problems. Howard C. Elman. Dianne P. O'Leary. March 1998.
In this work we calculate the eigenvalues obtained by preconditioning the discrete Helmholtz operator with Sommerfeld-like boundary conditions on a rectilinear domain, by a related operator with boundary conditions that permit the use of fast solvers. The main innovation is that the eigenvalues for two and three-dimensional domains can be calculated exactly by solving a set of one-dimensional eigenvalue problems. This permits analysis of quite large problems. For grids fine enough to resolve the solution for a given wave number, preconditioning using Neumann boundary conditions yields eigenvalues that are uniformly bounded, located in the first quadrant, and outside the unit circle. In contrast, Dirichlet boundary conditions yield eigenvalues that approach zero as the product of wave number with the mesh size is decreased. These eigenvalue properties yield the first insight into the behavior of iterative methods such as GMRES applied to these preconditioned problems. (Also cross-referenced as UMIACS-TR-98-22) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Emergent Patterns of Teaching/Learning in Electronic Classrooms. Ben Shneiderman. Ellen Yu Borkowski. Maryam Alavi. Kent Norman. July 1998.
Novel patterns of teaching/learning have emerged from faculty and students who use our three Teaching/Learning Theaters at the University of Maryland, College Park. These fully-equipped electronic classrooms have been used by 74 faculty in 264 semester-long courses since the Fall of 1991 with largely enthusiastic reception by both faculty and students. The designers of the Teaching/Learning Theaters sought to provide a technologically rich environment and a support staff so that faculty could concentrate on changing the traditional lecture from its unidirectional information flow to a more collaborative activity. As faculty evolved their personal styles in using the electronic classrooms, novel patterns of teaching/learning have emerged. In addition to enhanced lectures, we identified three common patterns: active individual learning, small-group collaborative learning, and entire-class collaborative learning. Department of Computer Science, University of Maryland,
Chapter 3: Children as Our Technology Design Partners+. Allison Druin. Ben Bederson. Angela Boltman. Adrian Miura. Debby Knotts-Callahan. Mark Platt. March 1998.
"That's silly!" "I'm bored!" "I like that!" "Why do I have to do this?" "What is this for?" These are all important responses and questions that come from children. As our design partners in developing new technologies, children can offer bluntly honest views of their world. They have their own likes, dislikes, and needs that are not the same as adults' (Druin, Stewart, Proft, Bederson, & Hollan, 1997). As the development of new technologies for children becomes commonplace in industry and university research labs, children's input into the design and development process is critical. We need to establish new development methodologies that enable us to stop and listen, and learn to collaborate with children of all ages. In the chapter that follows, a discussion of new research methodologies will be presented. (Also cross-referenced as UMIACS-TR-98-20) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Building Self-Reconfiguring Distributed Virtual Environments. Donald J. Welch. March 1998.
A distributed virtual environment may be required to reconfigure itself to compensate for various conditions that can occur during execution. An example is the reentry of a virtual environment that was previously reconfigured out of the distributed virtual environment due to failure. If there is a human user of this virtual environment, care must be taken to insure that he is brought back into the distributed virtual environment in a way that makes sense. He cannot regain control of a tank that is out of ammunition while a computer-based simulation controls actively participating tanks. The compensating reconfiguration function of a distributed virtual environment must detect conditions that dictate reconfiguration. It must determine the proper course of action and act on it, bringing the distributed virtual environment to a stable state as quickly as possible. Proper reconfiguration of a distributed virtual environment requires that the compensating reconfiguration software know the system configuration, the virtual state, and the mapping between them. Building compensating reconfiguration software using traditional means is laborious and error prone. A rule-based tool that uses abstract views of the distributed virtual environment is a better way to produce compensating reconfiguration software. To show the viability of this approach I have developed a rule-based tool called Bullpen. This research compares Bullpen against manual coding in a case study that ranges over a wide array of requirements changes. The results of this case study show that using Bullpen to build compensating reconfiguration components is superior to manually building the software in the kind of environments most commonly found in the military DVE domain. Using Bullpen takes less effort and is less complex than using manual programming techniques. The resulting component is less error prone and has acceptable reaction time. 
(Also cross-referenced as UMIACS-TR-98-18) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Parametric Design Synthesis of Distributed Embedded Systems. Dong-In Kang. Richard Gerber. Manas Saksena. March 1998.
This paper presents a design synthesis method for distributed embedded systems. In such systems, computations can flow through long pipelines of interacting software components, hosted on a variety of resources, each of which is managed by a local scheduler. Our method automatically calibrates the local resource schedulers to achieve the system's global end-to-end performance requirements. A system is modeled as a set of distributed task chains (or pipelines), where each task represents an activity requiring nonzero load from some CPU or network resource. Task load requirements can vary stochastically, due to second-order effects like cache memory behavior, DMA interference, pipeline stalls, bus arbitration delays, transient head-of-line blocking, etc. We aggregate these effects -- along with a task's per-service load demand -- and model them via a single random variable, ranging over an arbitrary discrete probability distribution. Load models can be obtained via profiling tasks in isolation, or simply by using an engineer's hypothesis about the system's projected behavior. The end-to-end performance requirements are posited in terms of throughput and delay constraints. Specifically, a pipeline's delay constraint is an upper bound on the total latency a computation can accumulate, from input to output. The corresponding throughput constraint mandates the pipeline's minimum acceptable output rate -- counting only outputs which meet their delay constraints. Since per-component loads can be generally distributed, and since resources host stages from multiple pipelines, meeting all of the system's end-to-end constraints is a nontrivial problem. Our approach involves solving two sub-problems in tandem: (A) finding an optimal proportion of load to allocate each task and channel; and (B) deriving the best combination of service intervals over which all load proportions can be guaranteed.
The design algorithms use analytic approximations to quickly estimate output rates and propagation delays for candidate solutions. When all parameters are synthesized, the estimated end-to-end performance metrics are re-checked by simulation. The per-component load reservations can then be increased, with the synthesis algorithms re-run to improve performance. At that point the system can be configured according to the synthesized scheduling parameters -- and then re-validated via on-line profiling. In this paper we demonstrate our technique on an example system, and compare the estimated performance to its simulated on-line behavior. (Also cross-referenced as UMIACS-TR-98-18) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hybrid Probabilistic Programs. Alex Dekhtyar. V. S. Subrahmanian. March 1998.
The precise probability of a compound event (e.g. e1 v e2, e1 ^ e2) depends upon the known relationships (e.g. independence, mutual exclusion, ignorance of any relationship, etc.) between the primitive events that constitute the compound event. To date, most research on probabilistic logic programming [20, 19, 22, 23, 24] has assumed that we are ignorant of the relationship between primitive events. Likewise, most research in AI (e.g. Bayesian approaches) has assumed that primitive events are independent. In this paper, we propose a hybrid probabilistic logic programming language in which the user can explicitly associate, with any given probabilistic strategy, a conjunction and disjunction operator, and then write programs using these operators. We describe the syntax of hybrid probabilistic programs, and develop a model theory and fixpoint theory for such programs. Last, but not least, we develop three alternative procedures to answer queries, each of which is guaranteed to be sound and complete. (Also cross-referenced as UMIACS-TR-98-16) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
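To illustrate why the assumed relationship between primitive events matters, the sketch below computes the probability of a conjunction and a disjunction under the three relationships named in this abstract. It uses point probabilities and the standard bounds from probability theory; the function names are hypothetical, and nothing here is the operator set or syntax defined in the report.

```python
def conj_independent(p1, p2):
    """P(e1 ^ e2) when e1 and e2 are independent."""
    return p1 * p2


def conj_ignorance(p1, p2):
    """Tightest (lower, upper) bounds on P(e1 ^ e2) when nothing is known."""
    return (max(0.0, p1 + p2 - 1.0), min(p1, p2))


def conj_mutually_exclusive(p1, p2):
    """P(e1 ^ e2) when e1 and e2 cannot both occur."""
    return 0.0


def disj_independent(p1, p2):
    """P(e1 v e2) when e1 and e2 are independent."""
    return p1 + p2 - p1 * p2


def disj_ignorance(p1, p2):
    """Tightest (lower, upper) bounds on P(e1 v e2) when nothing is known."""
    return (max(p1, p2), min(1.0, p1 + p2))


def disj_mutually_exclusive(p1, p2):
    """P(e1 v e2) when e1 and e2 cannot both occur (requires p1 + p2 <= 1)."""
    return p1 + p2
```

For the same inputs the answers differ: with p1 = 0.75 and p2 = 0.5, independence gives a single value for the conjunction, while ignorance only pins it down to an interval.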
Heterogeneous Active Agents. V.S. Subrahmanian. Thomas Eiter. George Pick. March 1998.
Over the years, many different agent programming languages have been proposed. In this paper, we propose a concept called Agent Programs, with which the creator of an agent can declaratively specify how that agent should act in various situations. Agent Programs may be built on top of arbitrary pieces of software code and may be used to specify what an agent is obliged to do, what an agent may do, and what an agent may not do. In this paper, we define several successively more sophisticated and epistemically satisfying declarative semantics for agent programs, and study the computation price to be paid (in terms of complexity) for such epistemic desiderata. We further show that agent programs cleanly extend well understood semantics for logic programs, and thus are clearly linked to existing results on logic programming and nonmonotonic reasoning. Last, but not least, we have built a simulation of a Supply Chain application in terms of our theory, building on top of commercial software systems such as Microsoft Access and ESRI's Map Object. (Also cross-referenced as UMIACS-TR-98-15) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Facilitating Network Data Exploration with Query Previews: A Study of. Egemen Tanin. Amnon Lotem. Ihab Haddadin. Ben Shneiderman. Catherine Plaisant. Laura Slaughter. February 1998.
Current network data exploration systems which use command languages (e.g. SQL) or form fill-in interfaces fail to give users an indication of the distribution of data items. This leads many users to waste time posing queries which have zero-hit or mega-hit result sets. Query previewing is a novel visual approach for browsing huge networked information warehouses. Query previews supply data distribution information about the database that is being searched and give continuous feedback about the size of the result set for the query as it is being formed. Our within-subjects empirical comparison studied 12 subjects using a form fill-in interface with and without query previews. We found statistically significant differences showing that query previews sped up performance 1.6 to 2.1 times and led to higher subjective satisfaction. (Also cross-referenced as UMIACS-TR-98-14) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Reduction of Materialized View Staleness Using Online Updates. Alexandros Labrinidis. Nick Roussopoulos. February 1998.
Updating the materialized views stored in data warehouses usually implies making the warehouse unavailable to users. We propose MAUVE, a new algorithm for online incremental view updates that uses timestamps and allows consistent read-only access to the warehouse while it is being updated. The algorithm propagates the updates to the views more often than the typical once a day in order to reduce view staleness. We have implemented MAUVE on top of the Informix Universal Server and used a synthetic workload generator to experiment with various update workloads and different view update frequencies. Our results show that all kinds of update streams benefit from more frequent view updates, instead of just once a day. However, there is a clear maximum for the view update frequency, for which view staleness is minimal. (Also cross-referenced as UMIACS-TR-98-13) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Two Algorithms for the Efficient Computation of Truncated Pivoted. G. W. Stewart. February 1998.
In this note we propose two algorithms to compute truncated pivoted QR approximations to a sparse matrix. One is based on the Gram--Schmidt algorithm, and the other on Householder triangularization. Both algorithms leave the original matrix unchanged, and the only additional storage requirements are arrays to contain the factorization itself. Thus, the algorithms are particularly suited to determining low-rank approximations to a sparse matrix. (Also cross-referenced as UMIACS-TR-98-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
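As a rough illustration of what a truncated pivoted QR approximation is, the following sketch runs column-pivoted Gram-Schmidt for at most k steps. It is a simplified textbook variant, not either of the report's algorithms: in particular it keeps a dense working copy of the columns, which the report's algorithms are designed to avoid. The function name and return layout are assumptions made for this example.

```python
import math


def truncated_pivoted_qr(a, k):
    """Column-pivoted Gram-Schmidt, stopped after at most k steps.

    a is a list of rows (m x n) and is left unchanged. Returns (q, r, pivots):
    q[s] is the s-th orthonormal vector, r[s] the s-th row of R in the original
    column order, and pivots[s] the column chosen at step s, so that
    a[i][j] is approximated by sum over s of q[s][i] * r[s][j].
    """
    m, n = len(a), len(a[0])
    # Dense working copy of the columns (a simplification for this sketch).
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q, r, pivots = [], [], []
    for _ in range(min(k, m, n)):
        norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
        p = max(range(n), key=lambda j: norms[j])   # pivot: largest residual
        if norms[p] == 0.0:
            break                                   # nothing left to approximate
        qk = [x / norms[p] for x in cols[p]]
        rk = [sum(qi * ci for qi, ci in zip(qk, c)) for c in cols]
        # Remove the new direction from every residual column.
        cols = [[ci - rk[j] * qi for ci, qi in zip(cols[j], qk)]
                for j in range(n)]
        q.append(qk)
        r.append(rk)
        pivots.append(p)
    return q, r, pivots
```

For a matrix of rank one, a single step already reproduces the matrix up to rounding error, which is the sense in which the truncated factorization gives a low-rank approximation.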
Analysis and Applications of Receptive Safety Properties in Concurrent. Gilberto Matos. February 1998.
Formal verification for complex concurrent systems is a computationally intensive and, in some cases, intractable process. The complexity is an inherent part of the verification process due to the system complexity that is an exponential function of the sizes of its components. However, some properties can be enforced by automatically synchronizing the components, thus eliminating the need for verification. Moreover, the complexity of the analysis required to enforce the properties grows incrementally with addition of new components and properties that make the system complexity grow exponentially. The properties in question are the receptive safety properties, a subset of safety properties that can only be violated by component actions. The receptive safety properties represent the realizable subset of the general safety properties because a system that satisfies any non-receptive safety properties must satisfy related receptive safety properties. This implies that any system with realizable safety requirements can be described as a set of components and receptive safety properties that specify the component interaction that satisfies the requirements. We have developed a method that automatically synchronizes complex concurrent systems to enforce their receptive safety properties. Many non-safety properties, and automated synchronization can be used to enforce them. (Also cross-referenced as UMIACS-TR-98-11) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Cyclone Technology: An Overview. Sung Lee. February 1998.
The current network, which is based on managing resources on demand and accepting uncontrolled communication requests, often leads to problems such as congestion and other queueing bottlenecks. The extent of congestion and queues depends on the variability in customer arrival times, services needed, and the resource allocation mechanism used by system components. The queue sizes, which result in congestion, can be reduced only by controlling the variability in customer arrival times, and this is best done by making explicit use of time. Cyclone technology uses the information based on times of events explicitly, including the design of systems. Cyclone provides the coordination of resources through dynamic, time-based resource management leading to a network that is capable of providing end-to-end low latency communications free of losses, jitter, (Also cross-referenced as UMIACS-TR-98-10) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Interfaces and Tools for the Library of Congress National Digital. Gary Marchionini. Catherine Plaisant. Anita Komlodi. February 1998.
This paper describes a collaborative effort to explore user needs in a digital library, develop interface prototypes for a digital library, and suggest and prototype tools for digital librarians and users at the Library of Congress (LC). Interfaces were guided by an assessment of user needs and aimed to maximize interaction with primary resources and support both browsing and analytical search strategies. Tools to aid users and librarians in overviewing collections, previewing objects, and gathering results were created and serve as the beginnings of a digital librarian toolkit. The design process and results are described and suggestions for future work are offered. (Also cross-referenced as UMIACS-TR-98-09) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Composite Model Checking with Type Specific Symbolic Encodings. Tevfik Bultan. Richard Gerber. February 1998.
We present a new symbolic model checking technique, which analyzes temporal properties in multi-typed transition systems. Specifically, the method uses multiple type-specific data encodings to represent system states, and it carries out fixpoint computations via the corresponding type-specific symbolic operations. In essence, different symbolic encodings are unified into one composite model checker. Any type-specific language can be included in this framework -- provided that the language is closed under Boolean connectives, propositions can be checked for satisfiability, and relational images can be computed. Our technique relies on conjunctive partitioning of transition relations of atomic events based on variable types involved, which allows independent computation of one-step pre- and post-conditions for each variable type. In this paper we demonstrate the effectiveness of our method on a nontrivial data-transfer protocol, which contains a mixture of integer and Boolean-valued variables. The protocol operates over an unreliable channel that can lose, duplicate or reorder messages. Moreover, the protocol's send and receive window sizes are not specified in advance; rather, they are represented as symbolic constants. The resulting system was automatically verified using our composite model checking approach, in concert with a conservative approximation technique. (Also cross-referenced as UMIACS-TR-98-07) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Model Checking Concurrent Systems with Unbounded Integer Variables:. Tevfik Bultan. Richard Gerber. William Pugh. February 1998.
Model checking is a powerful technique for analyzing large, finite-state systems. In an infinite-state system, however, many basic properties are undecidable. In this paper, we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically encode a program's transition system, as well as its model-checking computations. All fixpoint calculations are executed symbolically, and their convergence is guaranteed by using approximation techniques. We demonstrate the promise of this technology on some well-known infinite-state concurrency problems. (Also cross-referenced as UMIACS-TR-98-07) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
T2: A Customizable Parallel Database For Multi-dimensional Data. Chialin Chang. Anurag Acharya. Alan Sussman. Joel Saltz. January 1998.
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point.
In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. (Also cross-referenced as UMIACS-TR-98-04) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On the Adjoint Matrix. G. W. Stewart. January 1998.
The adjoint $A\adj$ of a matrix $A$ is the transpose of the matrix of the cofactors of the elements of $A$. The computation of the adjoint from its definition involves the computation of $n^{2}$ determinants of order $(n-1)$\,---\,a prohibitively expensive $O(n^{4})$ process. On the other hand the computation from the formula $A\adj = \det(A)A\inv$ breaks down when $A$ is singular and is potentially unstable when $A$ is ill-conditioned. In this paper we first show that the adjoint can be perfectly conditioned, even when $A$ is ill-conditioned. We then show that if due care is taken the adjoint can be accurately computed from the inverse, even when the latter has been inaccurately computed. In an appendix to this paper we establish a folk result on the accuracy of computed inverses. Also cross-referenced as UMIACS-TR-98-02 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
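The definition at the start of this abstract can be made concrete with a small sketch that builds the adjoint directly from cofactors. It is meant only to illustrate the definition and the identity $A \, A\adj = \det(A) I$, which holds even for singular $A$; it uses the expensive cofactor expansion and is not the method studied in the report. The helper names `minor`, `det` and `adjoint` are introduced here for illustration.

```python
def minor(a, i, j):
    """Matrix a with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(a)
    if n == 0:
        return 1
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(n))


def adjoint(a):
    """Transpose of the cofactor matrix: entry (i, j) is the (j, i) cofactor."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]


# Usage: the definition still works when A is singular, where the formula
# det(A) * inverse(A) cannot be applied because the inverse does not exist.
singular = [[1, 2], [2, 4]]
adj_singular = adjoint(singular)
```

With integer entries the arithmetic is exact, so the identity can be checked without rounding concerns.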
Codex, Memex, Genex: The pursuit of transformational technologies. Ben Shneiderman. December 1997.
Handwritten codexes or printed books transformed society by allowing users to preserve and transmit information. Today, leather-bound volumes and illuminated manuscripts are giving way to animated image maps and hot links. Vannevar Bush's memex has inspired the World Wide Web, which provides users with vast information resources and convenient communications. In looking to the future, we might again transform society by building genexes -- generators of excellence. Such inspirational environments would empower personal and collaborative creativity by enabling users to: - collect information from an existing domain of knowledge, - create innovations using advanced tools, - consult with peers or mentors in the field, and then - disseminate the results widely. This paper describes how a framework for an integrated set of software tools might support this four-phase model of creativity in science, medicine, the arts, and beyond. Current initiatives are positive and encouraging, but they do not work in an integrated fashion, often miss vital components, and are frequently poorly designed. A well-conceived and clearly-stated framework could guide design efforts, coordinate planning, and speed development. (Also cross-referenced as UMIACS-TR-97-89) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On-Demand Broadcast Scheduling. Demet Aksoy. Michael Franklin. December 1998.
Broadcast is becoming an increasingly attractive data dissemination method for large client populations. In order to effectively utilize a broadcast medium for such a service, it is necessary to have efficient, on-line scheduling algorithms that can balance individual and overall performance, and can scale in terms of data set sizes, client populations, and broadcast bandwidth. We propose an algorithm, called RxW, that provides good performance across all of these criteria and that can be tuned to trade off average and worst case waiting time. Unlike previous work on low overhead scheduling, the algorithm does not use estimates of the access probabilities of items, but rather, it makes scheduling decisions based on the current queue state, allowing it to easily adapt to changes in the intensity and distribution of the workload. We demonstrate the performance advantages of the algorithm under a range of scenarios using a simulation model and present analytical results that describe the intrinsic behavior of the algorithm. (Also cross-referenced as UMIACS-TR-98-88) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
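The abstract says only that RxW decides from the current queue state rather than from access-probability estimates. A minimal sketch of one such rule follows, under the assumption that the name stands for the number of outstanding requests for an item (R) multiplied by the waiting time of its oldest outstanding request (W). The selection rule, the data layout and the function name are assumptions for illustration, not details confirmed by this abstract.

```python
def pick_next_item(pending, now):
    """Choose the item to broadcast next from the current queue state.

    pending maps each requested item to a non-empty list of request arrival
    times. The score R * W is an assumed reading of the algorithm's name.
    """
    def score(item):
        arrivals = pending[item]
        requests = len(arrivals)              # R: outstanding requests
        longest_wait = now - min(arrivals)    # W: wait of the oldest request
        return requests * longest_wait

    return max(pending, key=score)


# Usage: one old request can outweigh several recent ones, and vice versa.
first = pick_next_item({"a": [0], "b": [8, 9, 9]}, now=10)
second = pick_next_item({"a": [5], "b": [8, 9, 9]}, now=10)
```

In the first call item "a" scores 1 x 10 against 3 x 2 for "b"; in the second, "a" scores 1 x 5 and "b" wins. This is the kind of balance between popular items and long-waiting requests that the abstract attributes to the algorithm.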
Toward Compact Monotonically Compositional Interlingua Using Lexical Aspect. Bonnie J. Dorr. Mari Broman Olsen. Scott C. Thomas. December 1997.
We describe a theoretical investigation into the semantic space described by our interlingua (IL), which currently has 191 main verb classes divided into 434 subclasses, represented by 237 distinct Lexical Conceptual Structures (LCSs). Using the model of aspect in Olsen (1994b, 1997a)---monotonic aspectual composition---we have identified 71 aspectually basic subclasses that are associated with one or more of 68 aspectually non-basic classes via some lexical (``type-shifting'') rule (Bresnan 1982, Pinker 1984, Levin and Rappaport Hovav 1995). This allows us to refine the IL and address certain computational and theoretical issues at the same time. (1) From a linguistic viewpoint, the expected benefits include a refinement of the aspectual model in (Olsen 1994b, 1997a) (which provides necessary but not sufficient conditions for aspectual composition), and a refinement of the verb classifications in (Levin 1993); we also expect our approach to eventually produce a systematic definition (in terms of LCSs and compositional operations) of the precise meaning components responsible for Levin's classification. (2) Computationally, the lexicon is made more compact. Also cross-referenced as UMIACS-TR-97-86 Also cross-referenced as LAMP-TR-012 University of Maryland Institute for Advanced Computer Studies, University of Maryland Laboratory for Language and Media Processing, Department of Computer Science, University of Maryland,
Using WordNet to Posit Hierarchical Structure in Levin's Verb Classes. Mari Broman Olsen. Bonnie J. Dorr. David J. Clark. December 1997.
In this paper we report on experiments using WordNet synset tags to evaluate the semantic properties of the verb classes cataloged by Levin 1993. This paper represents ongoing research begun at the University of Pennsylvania (Rosenzweig et al. 1997, Palmer et al. 1997) and the University of Maryland (Dorr and Jones 1996b, 1996d, 1996e). Using WordNet sense tags to constrain the intersection of Levin classes, we avoid spurious class intersections introduced by homonymy and polysemy (_run a bath, run a mile_). By adding class intersections based on a single shared sense-tagged word, we minimize the impact of the non-exhaustiveness of Levin's database (Dorr and Olsen 1996, Dorr to appear). By examining the syntactic properties of the intersective classes, we provide a clearer picture of the relationship between WordNet/EuroWordNet and the LCS interlingua for machine translation and other NLP applications. Also cross-referenced as UMIACS-TR-97-85 Also cross-referenced as LAMP-TR-011 University of Maryland Institute for Advanced Computer Studies, University of Maryland Laboratory for Language and Media Processing, Department of Computer Science, University of Maryland,
The End of Zero-Hit Queries: Query Previews for NASA's Global Change. Stephan Greene. Egemen Tanin. Catherine Plaisant. Ben Shneiderman. Lola Olsen. Gene Major. Steve Johns. December 1997.
The Human-Computer Interaction Laboratory (HCIL) of the University of Maryland and NASA have collaborated over the last three years to refine and apply user interface research concepts developed at HCIL in order to improve the usability of NASA data services. The research focused on dynamic query user interfaces, visualization, and overview +preview designs. An operational prototype, using query previews, was implemented with NASA's Global Change Master Directory (GCMD), a directory service for earth science data sets. Users can see the histogram of the data distribution over several attributes and choose among attribute values. A result bar shows the cardinality of the result set, thereby preventing users from submitting queries that would have zero hits. Our experience confirmed the importance of metadata accuracy and completeness. The query preview interfaces make visible problems or holes in the metadata that are unnoticeable with classic form fill-in interfaces. This could be seen as a problem, but we think that it will have a long-term beneficial effect on the quality of the metadata as data providers will be compelled to produce more complete and accurate metadata. The adaptation of the research prototype to the NASA data required revised data structures and algorithms. (Also cross-referenced as UMIACS-TR-97-84) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
An Information Architecture to Support the Visualization of Personal. Catherine Plaisant. Ben Shneiderman. December 1997.
Also cross-referenced as UMIACS-TR-97-87 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Taxonomy of Multiple Window Coordinations. Chris North. Ben Shneiderman. December 1997.
Handwritten codexes or printed books transformed society by allowing users to preserve and transmit information. Today, leather-bound volumes and illuminated manuscripts are giving way to animated image maps and hot links. Vannevar Bush's memex has inspired the World Wide Web, which provides users with vast information resources and convenient communications. In looking to the future, we might again transform society by building genexes -- generators of excellence. Such inspirational environments would empower personal and collaborative creativity by enabling users to: collect information from an existing domain of knowledge, create innovations using advanced tools, consult with peers or mentors in the field, and then disseminate the results widely. This paper describes how a framework for an integrated set of software tools might support this four-phase model of creativity in science, medicine, the arts, and beyond. Current initiatives are positive and encouraging, but they do not work in an integrated fashion, often miss vital components, and are frequently poorly designed. A well-conceived and clearly-stated framework could guide design efforts, coordinate planning, and speed development. (Also cross-referenced as UMIACS-TR-97-83) University of Maryland Institute for Advanced Computer Studies, University of Maryland Institute for Systems Research, Department of Computer Science, University of Maryland,
An Approach to Improve Existing Measurement Frameworks in Software. Manoel Gomes Mendonca. December 1997.
Measurement is a key mechanism to characterize, evaluate, and improve software development, management, and maintenance processes. Nowadays, software organizations use metrics for very different purposes. Data is collected to describe, monitor, understand, assess, compare, validate, and appraise very diverse attributes related to software processes or products. Improving data collection and better using the existing data are important problems for software organizations. This dissertation proposes an approach for improving measurement and data use when a large number of diverse metrics are already being collected by a software organization. The approach combines two methods. One looks at an organization's measurement framework in a top-down fashion and the other looks at it in a bottom-up fashion. The top-down method, based on the Goal-Question-Metric (GQM) Paradigm, is used to identify the measurement goals of data users and map them to the metrics being used by the organization. This allows the measurement practitioners to: (1)~identify which metrics are and are not useful to the organization; and (2)~check if the goals of data user groups can be satisfied by the data that is being collected by the organization. The bottom-up method is based on a data mining technique called Attribute Focusing (AF). It is used to identify useful information in the existing data that the data users were not aware of. To validate the approach and to assess its usefulness, a case study was performed in a real industrial environment. The top-down and bottom-up methods were applied in the customer satisfaction measurement framework at the IBM Toronto Laboratory. The top-down method was applied to improve the customer satisfaction (CUSTSAT) measurement from the point of view of three data user groups. The bottom-up method was used to gain new insights into the existing CUSTSAT data. The top-down method identified several new metrics for the interviewed user groups. 
It also contributed to a better understanding of the data users' needs and led to modification of some of the data analyses and presentations done for those groups. The bottom-up method produced important insights on both the customer satisfaction domain and the measurement framework itself. Unexpected associations between key variables prompted new insights on their importance for the organization. Some of these associations have also revealed problems with the metrics being used to collect the data. (Also cross-referenced as UMIACS-TR-97-82) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Symmetric Cauchy-like Preconditioners for the Regularized Solution of. Misha E. Kilmer. December 1997.
The discretization of integral equations can lead to systems involving symmetric Toeplitz matrices. We describe a preconditioning technique for the regularized solution of the related discrete ill-posed problem. We use discrete sine transforms to transform the system to one involving a Cauchy-like matrix. Based on the approach of Kilmer and O'Leary, the preconditioner is a symmetric, rank $m^{*}$ approximation to the Cauchy-like matrix augmented by the identity. We shall show that if the kernel of the integral equation is smooth then the preconditioned matrix has two desirable properties; namely, the largest $m^{*}$ magnitude eigenvalues are clustered around and bounded below by one, and that small magnitude eigenvalues remain small. We also show that the initialization cost is less than the initialization cost for the preconditioner introduced by Kilmer and O'Leary. Further, we describe a method for applying the preconditioner in $O((n+1) \lg (n+1))$ operations when $n+1$ is a power of 2, and describe a variant of the MINRES algorithm to solve the symmetrically preconditioned problem. The preconditioned method is tested on two examples. Department of Computer Science, University of Maryland, Applied Mathematics Program, University of Maryland,
Dynamic Time-Based Scheduling for Hard Real-Time Systems. Seonho Choi. December 1997.
In traditional time-based scheduling schemes for real-time systems, the time line is explicitly managed to obtain a feasible schedule that satisfies all timing constraints. In the schedule, the task attributes, such as task start time, are statically decided off-line and used without modification throughout system operation time. However, for dynamic real-time systems, in which new tasks may arrive during the operation, or tasks may have relative constraints based on information only known at run-time, such static schemes may lack the ability to accommodate dynamic changes. Clearly a solution of dynamic real-time scheduling has to reflect the knowledge about tasks and their execution characteristics. In this dissertation we present a {\em dynamic time-based scheduling scheme} and show its applicability for three problem domains. In the dynamic time-based scheduling scheme, attributes of task instances in the schedule may be represented as functions parameterized with information available at task dispatching time. These functions are called {\em attribute functions} and may denote any attribute of a task instance, such as lower and upper bound of its start time, its execution mode, etc. Flexible resource management becomes possible in this scheme by utilizing the freedom provided by the scheme. First, we study the problem of dynamic dispatching of tasks, reflecting relative timing constraints among tasks. The relative constraints may be defined across the boundary of two consecutive scheduling windows as well as within one scheduling window. We present the solution approach with which we are not only able to test the schedulability of a task set, but also able to obtain maximum slack time by postponing static task executions at run-time. Second, a new framework is formulated for designing real-time control systems in which the assumption of fixed sampling period is relaxed. 
That is, sampling time instants are found adaptively based on physical system state such that a new cost function value, which incorporates computational costs, is minimized. We show, for linear time-invariant control systems, that the computation requirement can be reduced while maintaining the quality of control. Third, acceptance tests are found for dynamically arriving aperiodic tasks, and for dynamically arriving sporadic tasks, respectively, under the assumption that an Earliest Deadline First scheduling policy is used for resolving resource contention between dynamic and static (dynamic) tasks. The dynamic time-based scheduling scheme can be applied as a solution approach for these problems, as will be shown in this dissertation, and its effectiveness will be demonstrated. Also cross-referenced as UMIACS-TR-97-81 University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Improved Methods for Approximating Node Weighted Steiner Trees and. Sudipto Guha. Samir Khuller. December 1997.
A greedy approximation algorithm based on ``spider decompositions'' was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio of $2 \ln k$, where $k$ is the number of terminals. However, the best known lower bound on the approximation ratio is $\ln k$, assuming that $NP \not\subseteq DTIME[n^{O(\log \log n)}]$, by a reduction from set cover. We show that for the unweighted case we can obtain an approximation factor of $\ln k$. For the weighted case we develop a new decomposition theorem, and generalize the notion of ``spiders'' to ``branch-spiders'', which are used to design a new algorithm with a worst case approximation factor of $1.5 \ln k$. This algorithm, although polynomial, is not very practical due to its high running time, since we need to repeatedly find many minimum weight matchings in each iteration. We are able to generalize the method to yield an approximation factor approaching $1.35 \ln k$. We also develop a simple greedy algorithm that is practical and has a worst case approximation factor of $1.6103 \ln k$. The techniques developed for the second algorithm imply a method of approximating node weighted network design problems defined by 0-1 proper functions. These new ideas also lead to improved approximation guarantees for the problem of finding a minimum node weighted connected dominating set. The previous best approximation guarantee for this problem was $3 \ln n$. By a direct application of the methods developed in this paper we are able to develop an algorithm with an approximation factor approaching $1.35 \ln n$. (Also cross-referenced as UMIACS-TR-97-80) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Applying Traversal-Pattern-Sensitive Pointer Analysis to Dependence Analysis. Yuan-Shin Hwang. Joel Saltz. November 1997.
This paper presents a technique for dependence analysis on programs with pointers or dynamic recursive data structures. It differs from previously proposed approaches in analyzing structure access conflicts between traversal patterns before gathering alias and connection information. Conflict analysis is conducted under the assumption that each unique path leads to a distinct storage location, and hence traversal patterns can be analytically compared to identify possible conflicts. The rationale of this assumption is that if statements are deemed to be dependent by this approach, they are inherently sequential regardless of the shapes of the data structures they traverse. Consequently, there is no need to perform alias/connection analysis on the statements that construct such data structures. Furthermore, the information of traversal patterns gathered in the conflict analysis phase can direct the alias/connection analysis algorithm to focus on statements that are crucial to optimizations or parallelization. Such a {\em traversal-pattern-sensitive} pointer analysis algorithm will also be presented. Department of Computer Science, University of Maryland,
Algorithms for Capacitated Vehicle Routing. Moses Charikar. Samir Khuller. Balaji Raghavachari. November 1997.
Given $n$ identical objects (pegs), placed at arbitrary initial locations, we consider the problem of transporting them efficiently to $n$ target locations (slots) with a vehicle that can carry at most $k$ pegs at a time. This problem is referred to as $k$-delivery TSP, and it is a generalization of the Traveling Salesman Problem. We give a 5-approximation algorithm for the problem of minimizing the total distance traveled by the vehicle. There are two kinds of transportations possible --- one that could drop pegs at intermediate locations and pick them up later in the route for delivery (preemptive) and one that transports pegs to their targets directly (non-preemptive). In the former case, by exploiting the freedom to drop, one may be able to find a shorter delivery route. We construct a non-preemptive tour that is within a factor 5 of the optimal preemptive tour. In addition we show that the ratio of the distances traveled by an optimal non-preemptive tour versus a preemptive tour is bounded by 4. (Also cross-referenced as UMIACS-TR-97-79) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Neural Learning of Chaotic Dynamics: The Error Propagation Algorithm. Rembrandt Bakker. Jaap C. Schouten. Cor M. van den Bleek. C. Lee Giles. October 1997.
An algorithm is introduced that trains a neural network to identify chaotic dynamics from a single measured time-series. The algorithm has four special features: 1. The state of the system is extracted from the time-series using delays, followed by weighted Principal Component Analysis (PCA) data reduction. 2. The prediction model consists of both a linear model and a Multi-Layer-Perceptron (MLP). 3. The effective prediction horizon during training is user-adjustable due to error propagation: prediction errors are partially propagated to the next time step. 4. A criterion is monitored during training to select the model that as a chaotic attractor is most similar to the real system attractor. The algorithm is applied to laser data from the Santa Fe time-series competition (set A). The resulting model is not only useful for short-term predictions but it also generates time-series with similar chaotic characteristics as the measured data. (Also cross-referenced as UMIACS-TR-97-77) University of Maryland Institute for Advanced Computer Studies, Delft University of Technology, Department of Chemical Process, NEC Research Institute,
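The first feature listed in the abstract above, extracting the state from a time-series using delays, can be illustrated with a minimal sketch. The function name `delay_embed` and the parameters `dim` and `lag` are hypothetical choices made for illustration; the abstract does not specify them, and the weighted PCA reduction step is not shown.

```python
def delay_embed(series, dim, lag=1):
    """Build delay vectors [x[t], x[t-lag], ..., x[t-(dim-1)*lag]].

    `dim` (number of delays) and `lag` (spacing between delays) are
    illustrative parameters, not values taken from the report.
    """
    start = (dim - 1) * lag  # first index with a full history available
    return [
        [series[t - k * lag] for k in range(dim)]
        for t in range(start, len(series))
    ]


# Each row is one reconstructed state vector of the measured series.
vectors = delay_embed([1, 2, 3, 4, 5], dim=3, lag=1)
```

Each delay vector would then be a candidate input to a data reduction step such as PCA.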
A Generalized Framework for Indexing OLAP Aggregates. Yannis Kotidis. October 1997.
Decision support applications often require fast response time to a wide variety of aggregate queries extracted from huge amounts of data. In this paper we propose the use of well organized packed R-trees for storing and maintaining multidimensional aggregates. Moreover, we present a general framework for mapping OLAP data to a collection of R-trees that achieve a high degree of data clustering with very low space overhead. We then propose four different allocation strategies designed to optimize different application needs. In the second part of the paper we present experimental results on high dimensionality OLAP data (up to 10 dimensions) of realistic size. Finally we characterize the performance of the proposed allocation strategies with respect to both incremental updates and response time for a variety of different queries. (Also cross-referenced as UMIACS-TR-97-76) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On an Inexpensive Triangular Approximation to the Singular Value. G. W. Stewart. October 1997.
In this paper we introduce a new decomposition called the pivoted QLP~decomposition. It is computed by applying pivoted orthogonal triangularization to the columns of the matrix $X$ in question to get an upper triangular factor $R$ and then applying the same procedure to the rows of $R$ to get a lower triangular matrix $L$. The diagonal elements of $R$ are called the R-values of $X$; those of $L$ are called the L-values. Numerical examples show that the L-values track the singular values of $X$ with considerable fidelity\,---\,far better than the R-values. At a gap in the L-values the decomposition provides orthonormal bases of analogues of row, column, and null spaces of $X$. The decomposition requires no more than twice the work required for a pivoted QR~decomposition. The computation of $R$ and $L$ can be interleaved, so that the computation can be terminated at any suitable point, which makes the decomposition especially suitable for low-rank determination problems. The interleaved algorithm also suggests a new, efficient 2-norm estimator. (Also cross-referenced as UMIACS-TR-97-75) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
On the Convergence of a New Rayleigh Quotient Method with Applications. D. P. O'Leary. G. W. Stewart. October 1997.
In this paper we propose a variant of the Rayleigh quotient method to compute an eigenvalue and corresponding eigenvectors of a matrix. It is based on the observation that eigenvectors of a matrix with eigenvalue zero are also singular vectors corresponding to zero singular values. Instead of computing eigenvector approximations by the inverse power method, we take them to be the singular vectors corresponding to the smallest singular value of the shifted matrix. If these singular vectors are computed exactly the method is quadratically convergent. However, exact singular vectors are not required for convergence, and the resulting method combined with Golub--Kahan--Krylov bidiagonalization looks promising for enhancement/refinement methods for large eigenvalue problems. (Also cross-referenced as UMIACS-97-74) Institute for Advanced Computer Studies, University of Maryland, Department of Computer Science, University of Maryland,
Previews and Overviews in Digital Libraries: Designing Surrogates to. Stephan Greene. Gary Marchionini. Catherine Plaisant. Ben Shneiderman. September 1997.
To aid designers of digital library interfaces and web sites in creating comprehensible, predictable and controllable environments for their users, we define and discuss the benefits of previews and overviews as visual information representations. Previews and overviews are graphic or textual representations of information abstracted from primary information objects. They serve as surrogates for those objects. When utilized properly, previews and overviews allow users to rapidly discriminate objects of interest from those not of interest, and to more fully understand the scope and nature of large collections of information resources. We provide a more complete definition of previews and overviews, and discuss system parameters and aspects of primary information objects relevant to designing effective previews and overviews. Finally, we present examples that illustrate the use of previews and overviews and offer suggestions for designers. (Also cross-referenced as UMIACS-TR-97-73) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Modified Streamline Diffusion Schemes for Convection-Diffusion. H. C. Elman. Y.-T. Shih. October 1997.
We consider the design of robust and accurate finite element approximation methods for solving convection--diffusion problems. We develop some two--parameter streamline diffusion schemes with piecewise bilinear (or linear) trial functions and show that these schemes satisfy the necessary conditions for $L^{2}$-uniform convergence of order greater than $1/2$ introduced by Stynes and Tobiska. For smooth problems, the schemes satisfy error bounds of the form $O(h)|u|_{2}$ in an energy norm. In addition, extensive numerical experiments show that they effectively reproduce boundary layers and internal layers caused by discontinuities on relatively coarse grids, without any requirements on alignment of flow and grid. (Also cross-referenced as UMIACS-TR-97-71) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Facility Location with Dynamic Distance Functions. Randeep Bhatia. Sudipto Guha. Samir Khuller. Yoram J. Sussmann. October 1997.
Facility location problems have always been studied with the assumption that the edge lengths in the network are {\em static} and do not change over time. The underlying network could be used to model a city street network for emergency facility location/hospitals, or an electronic network for locating information centers. In any case, it is clear that due to traffic congestion the traversal time on links {\em changes} with time. Very often, we have some estimates as to how the edge lengths change over time, and our objective is to choose a set of locations (vertices) as centers, such that at {\em every} time instant each vertex has a center close to it (clearly, the center close to a vertex may change over time). We also provide approximation algorithms as well as hardness results for the $K$-center problem under this model. This is the first comprehensive study regarding approximation algorithms for facility location for good time-invariant solutions. (Also cross-referenced as UMIACS-TR-97-70) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Sorting on Clusters of SMPs. David R. Helman. Joseph Ja'Ja'. November 1997.
Clusters of symmetric multiprocessors (SMPs) have emerged as the primary candidates for large scale multiprocessor systems. In this paper, we introduce an efficient sorting algorithm for clusters of SMPs. This algorithm relies on a novel scheme for stably sorting on a single SMP coupled with balanced regular communication on the cluster. Our SMP algorithm seems to be asymptotically faster than any of the published algorithms we are aware of. The algorithms were implemented in C using Posix Threads and the SIMPLE library of communication primitives and run on a cluster of DEC AlphaServer 2100A systems. Our experimental results verify the scalability and efficiency of our proposed solution and illustrate the importance of considering both memory hierarchy and the overhead of shifting to multiple nodes. (Also cross-referenced as UMIACS-TR-97-69) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Building an Electronic Learning Community: From Design to. Anne Rose. Wei Ding. Gary Marchionini. Josephus Beale, Jr.. Victor Nolet. September 1997.
The University of Maryland at College Park in cooperation with Baltimore City Public Schools and several partners is working to build an electronic learning community that provides teachers with multimedia resources that are linked to outcome-oriented curriculum guidelines. The initial resource library contains over 1000 videos, texts, images, web sites, and instructional modules. Using the current system, teachers can explore and search the resource library, create and present instructional modules in their classrooms, and communicate with other teachers in the community. This paper discusses the iterative design process and the results of informal usability testing. Lessons learned are also presented for developers. (Also cross-referenced as UMIACS-TR-97-67 and as CLIS-TR-97-12) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Applying DEF/USE Information of Pointer Statements to Traversal-Pattern-Aware Pointer Analysis. Yuan-Shin Hwang. Joel Saltz. July 1997.
Pointer analysis is essential for optimizing and parallelizing compilers. It examines pointer assignment statements and estimates pointer-induced aliases among pointer variables or possible shapes of dynamic recursive data structures. However, previously proposed techniques are not able to gather useful information or have to give up further optimizations when overall recursive data structures appear to be cyclic even though patterns of traversal are linear. The reason is that these proposed techniques perform pointer analysis without the knowledge of traversal patterns of dynamic recursive data structures to be constructed. This paper proposes an approach, {\em traversal-pattern-aware pointer analysis}, that has the ability to first identify the structures specified by traversal patterns of programs from cyclic data structures and then perform analysis on the specified structures. This paper presents an algorithm to perform shape analysis on the structures specified by traversal patterns. The advantage of this approach is that if the specified structures are recognized to be acyclic, parallelization or optimizations can be applied even when overall data structures might be cyclic. The DEF/USE information of pointer statements is used to relate the identified traversal patterns to the pointer statements which build recursive data structures. (Also cross-referenced as UMIACS-TR-97-66) Institute for Advanced Computing, University of Maryland, Department of Computer Science, University of Maryland,
TIKHONOV REGULARIZATION AND TOTAL LEAST SQUARES. Gene H. GOLUB. Per Christian HANSEN. Dianne P. O'LEARY. August 1997.
Discretizations of inverse problems lead to systems of linear equations with a highly ill-conditioned coefficient matrix, and in order to compute stable solutions to these systems it is necessary to apply regularization methods. We show how Tikhonov's regularization method, which in its original formulation involves a least squares problem, can be recast in a total least squares formulation, suited for problems in which both the coefficient matrix and the right-hand side are known only approximately. We analyze the regularizing properties of this method and demonstrate by a numerical example that in certain cases with large perturbations, the new method is superior to standard regularization methods. (Also cross-referenced as UMIACS-TR-97-65) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Efficient Iterative Solution of the Three-Dimensional Helmholtz. Howard C. Elman. Dianne P. O'Leary. August 1997.
We examine preconditioners for the discrete indefinite Helmholtz equation on a three-dimensional box-shaped domain with Sommerfeld-like boundary conditions. The preconditioners are of two types. The first is derived by discretization of a related continuous operator that differs from the original only in its boundary conditions. The second is derived by a block Toeplitz approximation to the discretized problem. The resulting preconditioning matrices allow the use of fast transform methods and differ from the discrete Helmholtz operator by an operator of low rank. We present experimental results demonstrating that when these methods are combined with Krylov subspace iteration, convergence rates depend only mildly on both the wave number and discretization mesh size. In addition, the methods display high efficiencies in an implementation on an IBM SP-2 parallel computer. (Also cross-referenced as UMIACS-TR-97-63) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Verifying Systems with Integer Constraints and Boolean Predicates: A. Tevfik Bultan. Richard Gerber. Christopher League. August 1997.
Symbolic model checking has proved highly successful for large finite-state systems, in which states can be compactly encoded using binary decision diagrams (BDDs) or their variants. The inherent limitation of this approach is that it cannot be applied to systems with an infinite number of states -- even those with a single unbounded integer. Alternatively, we recently proposed a model checker for integer-based systems that uses Presburger constraints as the underlying state representation. While this approach easily verified some subtle, infinite-state concurrency problems, it proved inefficient in its treatment of Boolean and (unordered) enumerated types -- which possess no natural mapping to the Euclidean coordinate space. In this paper we describe a model checker which combines the strengths of both approaches. We use a composite model, in which a formula's valuations are encoded in a mixed BDD-Presburger form, depending on the variables used. We demonstrate our technique's effectiveness on a nontrivial requirements specification, which includes a mixture of Booleans, integers and enumerated types. (Also cross-referenced as UMIACS-TR-97-62) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Temporal accuracy and modern high performance processors: A case study. Krishnan K. Kailas. Bao Trinh. Ashok K. Agrawala. August 1997.
Real-time systems must be able to ensure temporally determinate execution of real-time tasks at run-time. By temporal accuracy, we refer to the timing accuracy with which the execution of a task can be started at a predetermined time. Temporally determinate execution of tasks on modern high performance processors is becoming more and more difficult because of the techniques used by these processors to boost their average performance. This report describes the experiments we have conducted to measure the temporal accuracy that can be achieved with the Pentium Pro processor. We present the results of these experiments and analyze these results to highlight the limitations of temporally determinate execution of programs on modern high performance processor architectures. (Also cross-referenced as UMIACS-TR-97-60) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiler Optimizations for Eliminating Cache Conflict Misses. Gabriel Rivera. Chau-Wen Tseng. July 1997.
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor cache performance in scientific programs, particularly within loop nests. We present two compiler transformations to eliminate conflict misses: 1) modifying variable base addresses, 2) padding inner array dimensions. Unlike compiler transformations that restructure the computation performed by the program, these two techniques modify its data layout. Using cache simulations of a selection of kernels and benchmark programs, we show these compiler transformations can eliminate conflict misses for applications with regular memory access patterns. Cache miss rates for a 16K, direct-mapped cache are reduced by 35% on average for each program. For some programs, execution times on a DEC Alpha can be improved up to 60%. (Also cross-referenced as UMIACS-TR-97-59) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
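The first transformation named in the abstract above, modifying variable base addresses, can be illustrated with a small simulation of how addresses map to lines in a direct-mapped cache. This is only a sketch under stated assumptions: the 16K cache size comes from the abstract, while the 32-byte line size, the 8-byte element size, the array layout, and all function names are hypothetical and are not taken from the report.

```python
CACHE_SIZE = 16 * 1024  # 16K direct-mapped cache, as in the abstract
LINE_SIZE = 32          # assumed cache line size in bytes (hypothetical)
ELEM_SIZE = 8           # assumed element size, e.g. a double (hypothetical)


def cache_line(addr):
    """Index of the cache line a byte address maps to (direct-mapped)."""
    return (addr // LINE_SIZE) % (CACHE_SIZE // LINE_SIZE)


def conflicting_iterations(base_a, base_b, n):
    """Count iterations i where a[i] and b[i] map to the same cache line."""
    return sum(
        cache_line(base_a + i * ELEM_SIZE) == cache_line(base_b + i * ELEM_SIZE)
        for i in range(n)
    )


N = 2048                           # 2048 * 8 bytes = exactly one cache size
base_a = 0
base_b = base_a + N * ELEM_SIZE    # b laid out immediately after a
padded_b = base_b + LINE_SIZE      # shift b's base address by one cache line

unpadded = conflicting_iterations(base_a, base_b, N)
padded = conflicting_iterations(base_a, padded_b, N)
```

In this toy layout every access pair `a[i]`, `b[i]` collides before the base address is shifted, and none collide afterwards. Real programs have more arrays and access patterns, so the choice of padding amounts is the part a compiler analysis has to work out.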
Towards a Theory of Interestingness. Wiktor Marek. V.S. Subrahmanian. August 1997.
There are a wide variety of applications that either require or assume the existence of some underlying definition of ``interestingness.'' However, interests vary from user to user, from situation to situation, and from one time to another. This diversity of interests cannot be captured through a single definition. In this paper, we propose a framework called {\em Full Interestingness Programs} (FIPs) that form a subclass of the Hybrid Knowledge Base Paradigm of Lu, Nerode and Subrahmanian. FIPs may be built ``on top'' of any query language whatsoever. Using FIPs, interests may be easily expressed and captured, and used on an application-specific basis using an application-independent FIP-evaluator. In this paper, we provide a formal semantics for FIPs, as well as techniques for processing requests (queries) to FIPs. (Also cross-referenced as UMIACS-TR-97-57) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Motor Control Model Based on Self-organizing Feature Maps. Yinong Chen. August 1997.
Self-organizing feature maps have become important neural modeling methods over the last several years. These methods have not only shown great potential in application fields such as motor control, pattern recognition, optimization, etc., but have also provided insights into how mammalian brains are organized. Most past work developing self-organizing feature maps has focused on systems with a single map that is solely sensory in nature. This research develops and studies a model which has multiple self-organizing feature maps in a closed-loop control system, and that involves motor output as well as proprioceptive and/or visual sensory input. The model is driven by a simulated arm that moves in 3D space. By applying initial activations at randomly selected motor cortex regions, the neural network model spontaneously self-organizes, and demonstrates the appearance of multiple, reasonably stable motor and proprioceptive sensory maps and their interrelationships. These cortical feature maps capture the mechanical constraints imposed by the model arm. They are aligned in a way consistent with a {\em temporal correlation hypothesis}: temporally correlated features usually cause their corresponding cortical map representations to be spatially correlated. Simulations of variations of the motor control model with visual inputs indicate the formation of visual input maps. These maps are also partially aligned with motor output maps, reflecting the degree of temporal correlations during training. The simultaneous presence of proprioceptive input causes the visual input maps to distinguish pairs of antagonist muscles and to be correlated with only one muscle in each pair. Moreover, some theoretical analysis with a simplified model gives insights into the nature of cortical feature maps and sheds light on the driving force behind map correlations. 
All of these results provide a better understanding of the organization of cortical feature maps, and of how these maps might be used to achieve consistent motor commands based on sensory feedback. (Also cross-referenced as UMIACS-TR-97-56) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
High Performance Algorithms for Global BRDF Retrieval. Zengyan Zhang. Satya Kalluri. Joseph Ja'Ja'. Shunlin Liang. Townshend. July 1997.
Most land cover types are ``anisotropic'', that is, the solar radiation reflected by the surface is not uniform in all directions. Characterizing the Bidirectional Reflectance Distribution Function (BRDF) of the earth's surface is critical in understanding surface anisotropy. Though there are several methods to retrieve the BRDF of various land cover types, most of them have been applied over small data sets collected either on the ground or from aircraft at limited spatial and temporal scales. In this paper, we use the multi-angular, multi-temporal and multi-band Pathfinder AVHRR Land (PAL) data set to retrieve the global BRDF in the red and near infrared wavelengths. The PAL data set used in our study has a spatial resolution of 8 km and consists of 10-day composite data for four years (1983 to 1986). In particular, we develop high performance algorithms to retrieve global BRDF using three widely different models. Given the volume of data involved (about 27 GBytes), we attempt to optimize the I/O time as well as minimize the overall computational complexity. Our algorithms access the global data once, followed by a redistribution of land pixel data to balance the computational loads among the different nodes of a multiprocessor system. This strategy results in an optimized I/O access time with efficiently balanced computations across the nodes. Experimental data on a 16-node IBM SP2 is used to support these claims and to illustrate the scalability of our algorithms. (Also cross-referenced as UMIACS-TR-97-55) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Dynamic Query Operator Scheduling for Wide-Area Remote Access. Laurent Amsaleg. Michael J. Franklin. Anthony Tomasic. October 1997.
Distributed databases operating over wide-area networks such as the Internet must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based query execution model performs poorly. We have developed a class of methods, called query scrambling, for dealing explicitly with the problem of unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. In this paper we focus on the dynamic scheduling of query operators in the context of query scrambling. We explore various choices for dynamic scheduling and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and delayed or possibly bursty arrivals of data. Our performance results show that scrambling rescheduling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-97-54) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Fast Iterative Image Restoration with a Spatially-Varying PSF. James G. Nagy. Dianne P. O'Leary. June 1997.
We describe how to efficiently apply a spatially-variant blurring operator using linear interpolation of measured point spread functions. Numerical experiments illustrate that substantially better resolution can be obtained at very little additional cost compared to piecewise constant interpolation. (Also cross-referenced as UMIACS-TR-97-53) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Limited-Memory Matrix Methods with Applications. Tamara G. Kolda. April 1997.
The focus of this dissertation is on matrix decompositions that use a limited amount of computer memory, thereby allowing problems with a very large number of variables to be solved. Specifically, we will focus on two application areas: optimization and information retrieval. We introduce a general algebraic form for the matrix update in limited-memory quasi-Newton methods. Many well-known methods such as limited-memory Broyden Family methods satisfy the general form. We are able to prove several results about methods which satisfy the general form. In particular, we show that the only limited-memory Broyden Family method (using exact line searches) that is guaranteed to terminate within n iterations on an n-dimensional strictly convex quadratic is the limited-memory BFGS method. Furthermore, we are able to introduce several new variations on the limited-memory BFGS method that retain the quadratic termination property. We also have a new result that shows that full-memory Broyden Family methods (using exact line searches) that skip p updates to the quasi-Newton matrix will terminate in no more than n+p steps on an n-dimensional strictly convex quadratic. We propose several new variations on the limited-memory BFGS method and test these on standard test problems. We also introduce and test a new method for a process known as Latent Semantic Indexing (LSI) for information retrieval. The new method replaces the singular value decomposition (SVD) at the heart of LSI with a semi-discrete matrix decomposition (SDD). We show several convergence results for the SDD and compare some strategies for computing it on general matrices. We also compare the SVD-based LSI to the SDD-based LSI and show that the SDD-based method has a faster query computation time and requires significantly less storage. We also propose and test several SDD-updating strategies for adding new documents to the collection. Dept. of Computer Science, Univ. of Maryland,
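The quadratic termination property mentioned in this abstract can be observed numerically. The Python sketch below is a minimal illustration, not the dissertation's code: it runs limited-memory BFGS (standard two-loop recursion, identity initial matrix) with exact line searches on a strictly convex quadratic 0.5*x'Ax - b'x. The function name `lbfgs_quadratic`, the memory size of 2, and the tolerance are assumptions made for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def lbfgs_quadratic(A, b, x0, memory=2, tol=1e-12):
    """Minimize 0.5*x'Ax - b'x with L-BFGS and exact line searches.

    Returns (x, iterations, final gradient norm). At most n iterations are run.
    """
    n = len(b)
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
    pairs = []                                         # stored (s, y, rho), oldest first
    iterations = 0
    while iterations < n and math.sqrt(dot(g, g)) > tol:
        # Two-loop recursion: r = H g, with the identity as initial matrix.
        q = list(g)
        alphas = []
        for s, y, rho in reversed(pairs):              # newest pair first
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        r = q
        for (s, y, rho), a in zip(pairs, reversed(alphas)):   # oldest pair first
            beta = rho * dot(y, r)
            r = [ri + (a - beta) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]                          # search direction
        # Exact line search along d for a quadratic objective.
        step = -dot(g, d) / dot(d, matvec(A, d))
        s = [step * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        y = [gn - gi for gn, gi in zip(g_new, g)]
        pairs.append((s, y, 1.0 / dot(y, s)))
        pairs = pairs[-memory:]                        # keep only the newest pairs
        g = g_new
        iterations += 1
    return x, iterations, math.sqrt(dot(g, g))
```

For example, with the 4-by-4 symmetric, strictly diagonally dominant matrix `A = [[4, 1, 0, 0], [1, 5, 1, 0], [0, 1, 6, 1], [0, 0, 1, 7]]` and `b = [1, 2, 3, 4]`, calling `lbfgs_quadratic(A, b, [0, 0, 0, 0])` should stop after at most four iterations with a gradient norm at rounding-error level, even though only two update pairs are kept.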
Designing Dynamic Temporal Controls for Critical Systems. Seonho Choi. Ashok K. Agrawala. Leyuan Shi. May 1997.
Traditional control systems have been designed to exercise control at regularly spaced time instants. When a discrete version of the system dynamics is used, a constant sampling interval is assumed and a new control value is calculated and exercised at each time instant. In this paper, we propose a new control scheme, {\it dynamic temporal control}, in which we not only calculate the control value but also dynamically decide the time instants when the new control computations have to be calculated. Taking a discrete, linear, time-invariant system, and a cost function which reflects a cost for computation of the control values, as an example, we show the feasibility of using this scheme. We implement the dynamic temporal control scheme in a rigid body satellite control example and demonstrate the significant reduction in cost. The scheme proposed here can be implemented using a real-time operating system, such as {\em Maruti}, which schedules activities along the time axis. The reduced computations for control permit the use of the same processor for higher level functions, resulting in a significant improvement in the performance of the overall system. (Also cross-referenced as UMIACS-TR-97-51) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Requirements of I/O Systems for Parallel Machines: An. Mustafa Uysal. Anurag Acharya. Joel Saltz. May 1997.
I/O-intensive parallel programs have emerged as one of the leading consumers of cycles on parallel machines. This change has been driven by two trends. First, parallel scientific applications are being used to process larger datasets that do not fit in memory. Second, a large number of parallel machines are being used for non-scientific applications. Efficient execution of these applications requires high-performance I/O systems which have been designed to meet their I/O requirements. In this paper, we examine the I/O requirements for data-intensive parallel applications and the implications of these requirements for the design of I/O systems for parallel machines. We attempt to answer the following questions. First, what is the steady-state as well as peak I/O rate required? Second, what spatial patterns, if any, occur in the sequence of I/O requests for individual applications? Third, what is the degree of intra-processor and inter-processor locality in I/O accesses? Fourth, does the application structure allow programmers to disclose future I/O requests to the I/O system? Fifth, what patterns, if any, exist in the sequence of inter-arrival times of I/O requests? To address these questions, we have analyzed I/O request traces for a diverse set of I/O-intensive parallel applications. This set includes seven scientific applications and four non-scientific applications. (Also cross-referenced as UMIACS-TR-97-49) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
SIMPLE: A Methodology for Programming High Performance Algorithms on. David A. Bader. Joseph Ja'Ja'. May 1997.
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch. (Also cross-referenced as UMIACS-TR-97-48.) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
"Handling Updates and Crashes in VoD Systems". Eenjun Hwang. Kemal Kilic. V.S. Subrahmanian. May 1997.
Though there have been several recent efforts to develop disk based video servers, these approaches have all ignored the topic of updates and disk server crashes. In this paper, we present a priority based model for building video servers that handle two classes of events: user events that could include enter, play, pause, rewind, fast-forward, exit, as well as system events such as insert, delete, server-down, server-up that correspond to uploading new movie blocks onto the disk(s), eliminating existing blocks from the disk(s), and/or experiencing a disk server crash. We will present algorithms to handle such events. Our algorithms are provably correct, and computable in polynomial time. Furthermore, we guarantee that under certain reasonable conditions, continuing clients experience jitter free presentations. We further justify the efficiency of our techniques with a prototype implementation and experimental results. (Also cross-referenced as UMIACS-TR-97-47) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Design and Evaluation of Incremental Data Structures and Algorithms for. Egemen Tanin. Richard Beigel. Ben Shneiderman. May 1996.
Dynamic query interfaces (DQIs) are a recently developed database access mechanism that provides continuous real-time feedback to the user during query formulation. Previous work shows that DQIs are an elegant and powerful interface to small databases. Unfortunately, when applied to large databases, previous DQI algorithms slow to a crawl. We present a new incremental approach to DQI algorithms and display updates that works well with large databases, both in theory and in practice. (Also cross-referenced as UMIACS-TR-97-46) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Viewing personal history records: A comparison of Tabular format and. Diane Lindwarm Alonso. Anne Rose. Catherine Plaisant. Kent L. Norman. May 1997.
Thirty-six participants used a static version of either LifeLines, a graphical interface, or a Tabular representation to answer questions about a database of temporal personal history information. Results suggest that overall the LifeLines representation led to much faster response times, primarily for questions which involved interval comparisons and making intercategorical connections. In addition, on a follow-up questionnaire, nine out of eleven questions rated LifeLines preferable in terms of user satisfaction. A "first impression" test showed that LifeLines can reduce some of the biases of the tabular record summary. A post-experimental memory test led to significantly (p<.004) higher recall for LifeLines. Finally, simple interaction techniques are proposed to augment LifeLines' ability to deal with precise dates, attribute coding and overlaps. Department of Computer Science, University of Maryland,
Scheduling Aperiodic and Sporadic Tasks in Hard Real-Time Systems. Seonho Choi. Ashok K. Agrawala. May 1997.
Stringent timing constraints as well as functional correctness are essential requirements of hard real-time systems. In such systems, scheduling plays a very important role in satisfying these constraints. Priority based scheduling schemes have been used commonly due to the simplicity of the scheduling algorithm. However, in the presence of task interdependencies and complex timing constraints, such scheduling schemes may not be appropriate due to the lack of an efficient mechanism to schedule them and to carry out the schedulability analysis. In contrast, the time based scheduling scheme may be used to schedule a set of tasks with a greater degree of schedulability, achieved at the cost of higher complexity of off-line scheduling. One of the drawbacks of currently available scheduling schemes, however, is known to be their inflexibility in dynamic environments where dynamic processes exist, such as aperiodic and sporadic processes. We develop and analyze scheduling schemes which efficiently provide the flexibility required in real-time systems for scheduling processes arriving dynamically. This enables static hard periodic processes and dynamic processes (aperiodic or sporadic) to be jointly scheduled. (Also cross-referenced as UMIACS-TR-97-44) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Apparency of Contingencies in Pull Down Menus. D. L. Alonso. K. L. Norman. May 1997.
In many computer interfaces the underlying structures and contingencies are often hidden from the user's view. Users high in Spatial Visualization Ability (SVA) are able to quickly determine and manage the contingencies of these relationships and are not severely affected by this problem. Low SVA users, however, have difficulty visualizing these contingencies and often get lost. We examined the performance of 160 undergraduate students to determine whether revealing hidden contingencies through visual cues would help low SVA users, enabling them to approach the level of performance of high SVA users on a computerized path-finding task. It was found that using color and displaying paths improved performance; however, there is no indication that it is more beneficial to low than to high SVA users. (Also cross-referenced as UMIACS-TR-97-43) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Interface and Data Architecture for Query Preview in Networked. Khoa Doan. Catherine Plaisant. Ben Shneiderman. Tom Bruns. October 1997.
There are numerous problems associated with formulating queries on networked information systems. These include data diversity, data complexity, network growth, varied user base, and slow network access. This paper proposes a new approach to a network query user interface which consists of two phases: query preview and query refinement. This new approach is based on the concepts of dynamic queries and query previews, which guide users in rapidly and dynamically eliminating undesired datasets, reducing the data volume to a manageable size, and refining queries locally before submission over a network. Examples of two applications are given: a Restaurant Finder and a prototype with NASA's Earth Observing Systems--Data Information Systems (EOSDIS). Data architecture is discussed and user feedback is presented. Dynamic queries and query previews provide solutions to many existing problems in querying networked information systems. Department of Computer Science, University of Maryland,
Visualizing websites using a hierarchical table of contents browser:. David A. Nation. Catherine Plaisant. Gary Marchionini. Anita Komlodi. May 1997.
A method is described for visualizing the contents of a Web site with a hierarchical table of contents using a Java program and applet called WebTOC. The automatically generated expand/contract table of contents provides graphical information indicating the number of elements in branches of the hierarchy as well as individual and cumulative sizes. Color can be used to represent another attribute such as file type and provide a rich overview of the site for users and managers of the site. Early results from user studies suggest that WebTOC is easily learned and can assist users in navigating websites. (Also cross-referenced as UMIACS-TR-97-41) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
A Study on Video Browsing Strategies. Wei Ding. Gary Marchionini. August 1997.
Due to the unique characteristics of video, traditional surrogates and control/browsing mechanisms that facilitate text-based information retrieval may not work sufficiently for video. In this paper, a video browsing interface prototype with key frames and fast play-back mechanisms was built and tested. Subjects performed two kinds of browsing-related tasks: object identification and video comprehension under different display speeds (1 fps, 4 fps, 8 fps, 12 fps and 16 fps). It was found that browsing the key frames between 8 and 12 fps could potentially define a functional limit in object identification accuracy. There was no significant performance difference found across display speeds tested. The results also showed that lower speeds were required for object identification than for video comprehension. How user performance was affected by individual characteristics such as age, gender, academic background and TV- or movie-watching habits was investigated, but no significant difference was found due to the limit of sample size and other constraints. (Also cross-referenced as UMIACS-TR-97-40) (Also cross-referenced as CLIS-TR-97-06) University of Maryland Institute for Advanced Computer Studies, Univ. of Maryland Human-Computer Interaction Laboratory, Univ. of Maryland College of Library and Information Services,
Elastic Windows: A Hierarchical Multi-Window World-Wide Web Browser. Eser Kandogan. Ben Shneiderman. May 1997.
The World-Wide Web (WWW) is becoming an invaluable source for the information needs of many users. However, current browsers are still primitive, in that they do not support many of the navigation needs of users, as indicated by user studies. They do not provide an overview and a sense of location in the information structure being browsed. Nor do they facilitate the organization and filtering of information or help users access already visited pages without much cognitive demand. In this paper, a new browsing interface is proposed with multiple hierarchical windows and efficient multiple window operations. It provides a flexible organization where users can quickly organize, filter, and restructure the information on the screen as they reformulate their goals. Overviews can give the user a sense of location in the browsing history as well as provide fast access to a hierarchy of pages. Department of Computer Science, University of Maryland,
Content + Connectivity = Community: Digital Resources for a Learning. Gary Marchionini. Victor Nolet. Hunter Williams. Wei Ding. Josephus Beale Jr.. Anne Rose. Allison Gordon. Ernestine Enomoto. Lynn Harbinson. January 1997.
Digital libraries offer new opportunities to provide access to diverse resources beyond those held in school buildings and to allow teachers and learners to reach beyond classroom walls to other people to build distributed learning communities. Creating learning communities requires that teachers change their behaviors, and the Baltimore Learning Community Project described here is based on the premise that access to resources should be tied to the assessment outcomes that increasingly drive curricula and classroom activity. Based on examination of curriculum guides and discussions with project teachers, an interface for the BLC digital library was prototyped. Three components (explore, construct, and present) of this user interface that allow teachers to find text, video, images, web sites, and instructional modules and create their own modules are described. Although the technological challenges of building learning communities are significant, the greater challenges are mainly social and political. Department of Computer Science, University of Maryland,
User Interfaces for a Complex Robotic Task: A Comparison of Tiled vs.. J. Corde Lane. Steven P. Kuester. Ben Shneiderman. January 1997.
High complexity tasks, such as remote teleoperation of robotic vehicles, often require multiple windows. For these complex tasks, the windows necessary for task completion may occupy more area than is available on a single visual display unit (VDU). Since the focus of the robotic task constantly changes, modular control panels that can be opened, closed, and moved on the screen are invaluable to the operator. This study describes a specific robotic task and the need for a multi-window interface that can be easily manipulated. This paper examines two multi-window management strategies: tiled (fixed size) and arbitrary overlap. Multi-window searches were performed using the two management styles and they were compared on the basis of search completion time and error rates. Results with 35 novice users showed faster completion times for the tiled management strategy than for the arbitrary overlap strategy. Other factors such as the number of windows available, the number of displayed windows, workload of opening or closing windows, and effect of learning are discussed. Department of Computer Science, University of Maryland,
Evaluating Multilingual Gisting of Web Pages. Philip Resnik. March 1997.
We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form. (Also cross-referenced as UMIACS-TR-97-39) University of Maryland Institute for Advanced Computer Studies, Dept. of Linguistics, University of Maryland,
Development of an Object Oriented Parser/Generator, Ontologies, and. Bonnie J. Dorr. February 1996.
This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. Our primary objective was to develop an interlingual representation based on lexical conceptual structure (LCS) and to examine the relation between this representation and a set of linguistically motivated semantic classes. We have focused on several areas in support of our objectives: (1) updating a Korean message-passing parser to handle more Korean linguistic phenomena and porting this to Windows on the PC so that it runs with LCS composition; (2) scaling up the Korean lexicon to include thousands of new words converted by the Yale-romanization program, to be integrated with the Korean message-passing parser; (3) investigation of the syntax-semantics relation and use of this relation in automatic classification of verbs; (4) investigation of the aspectual dimensions as they impact lexical semantics and the lexical choice process in multilingual generation; and (5) automatic construction of LCS's from lexical-semantic templates and thematic grids. (Also cross-referenced as UMIACS-TR-97-37) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Speech-Based Information Retrieval for Digital Libraries. Douglas W. Oard. March 1997.
Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries. Presently, access to most of this material is provided using a combination of manually annotated metadata and linear search. Recent advances in speech processing technology have produced a number of techniques for extracting features from recorded speech that could provide a useful basis for the retrieval of speech or multimedia objects in large digital library collections. Among these features are the semantic content of the speech, the identity of the speaker, and the language in which the speech was spoken. We propose to develop a graphical and auditory user interface for speech-based information retrieval that exploits these features to facilitate selection of recorded speech and multimedia information objects that include recorded speech. We plan to use that interface to evaluate the effectiveness and usability of alternative ways of exploiting those features and as a testbed for the evaluation of advanced retrieval techniques such as cross-language speech retrieval. (Also cross-referenced as UMIACS-TR-97-36) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
The Virtual Microscope. Renato Ferreira. Bongki Moon. Jim Humphries. Alan Sussman. Joel Saltz. Robert Miller. Angelo Demarzo. April 1997.
We present the design of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high power light microscope. We discuss several technical challenges related to providing the performance necessary to achieve rapid response time, mainly in dealing with the enormous amounts of data (tens to hundreds of gigabytes per slide) that must be retrieved from secondary storage and processed. To effectively implement the data server, the system design relies on the computational power and high I/O throughput available from an appropriately configured parallel computer. (Also cross-referenced as UMIACS-TR-97-35) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
CAUCHY-LIKE PRECONDITIONERS FOR 2-DIMENSIONAL ILL-POSED PROBLEMS. Misha E. Kilmer. March 1997.
Ill-conditioned matrices with block Toeplitz, Toeplitz block (BTTB) structure arise from the discretization of certain ill-posed problems in signal and image processing. We use a preconditioned conjugate gradient algorithm to compute a regularized solution to this linear system given noisy data. Our preconditioner is a Cauchy-like block diagonal approximation to an orthogonal transformation of the BTTB matrix. We show the preconditioner has desirable properties when the kernel of the ill-posed problem is smooth: the largest singular values of the preconditioned matrix are clustered around one, the smallest singular values remain small, and the subspaces corresponding to the largest and smallest singular values, respectively, remain unmixed. For a system involving $np$ variables, the preconditioned algorithm costs only $O(np (\lg n + \lg p))$ operations per iteration. We demonstrate the effectiveness of the preconditioner on three examples. University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
On Hyperbolic Triangularization. M. Stewart. G.W. Stewart. May 1997.
This paper treats the problem of triangularizing a matrix by hyperbolic Householder transformations. The stability of this method, which finds application in block updating and fast algorithms for Toeplitz-like matrices, has been analyzed only in special cases. Here we give a general analysis which shows that two distinct implementations of the individual transformations are relationally stable. The analysis also shows that pivoting is required for the entire triangularization algorithm to be stable. (Also cross-referenced as UMIACS-TR-97-34) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Domain-Driven Reconfiguration in Collaborative Virtual Environments. Donald J. Welch. James M. Purtilo. March 1997.
When virtual environments (VE) collaborate to create a shared virtual world, events occur that can have catastrophic effects on that virtual world. These events can be system events, such as the loss of a host or a network link to that host. They can also be events that happen only in the virtual world, for example, a virtual activity that migrates, bringing increased activity to a different VE. Maintaining acceptable or realistic behavior can require the restructuring of the collaborative virtual environment (CVE) during execution. The restructuring must take place in accordance with a set of rules mandated by the domain and specific application. The reconfiguration must occur quickly, to maintain realism for the users. Automatic restructuring brings the added benefit of fewer support staff. We call the automatic restructuring of a distributed application with respect to these rules Domain-Driven Reconfiguration and we have developed a software engineering environment to support its inclusion in CVEs. (Also cross-referenced as UMIACS-TR-97-32) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Dynamic Dispatching of Cyclic Real-Time Tasks with Relative Constraints. Seonho Choi. Ashok K. Agrawala. March 1997.
In some hard real-time systems, relative timing constraints may be imposed on task executions, in addition to the release time and deadline constraints. A periodic task may have jitter constraints between the start or finish times of any two consecutive executions. Relative constraints such as separation or relative deadline constraints may be given between start or finish times of tasks (4). One approach is to find a total order on a set of n jobs in a scheduling window, and cyclically use this order at run time to execute the jobs. However, in the presence of the relative constraints, if the job execution times are nondeterministic with defined lower and upper bounds, it is not always possible to statically assign start times at pre-runtime without sacrificing the schedulability (4). We develop a technique called dynamic cyclic dispatching to enforce relative constraints along with release time and deadline constraints. An ordered set of N jobs is assumed to be given within a scheduling window and this schedule (ordering) is cyclically repeated at runtime. An off-line algorithm is presented to check the schedulability of the job set and to obtain parametric lower and upper bounds on the start times of jobs, if the job set is schedulable. Then, these parametric bounds are evaluated at runtime to obtain valid time intervals during which jobs can be started. The complexity of this off-line component is shown to be O(n^2 N^3) where n is the number of jobs in a scheduling window that have relative constraints with jobs in the next scheduling window. An online algorithm can evaluate these bounds in O(N^3 + n^5) computation time.
Unlike static approaches which assign fixed start times to jobs in the scheduling window, our approach not only allows us to flexibly manage the slack times without affecting the schedulability of a task set, but also yields a guaranteed schedulability in the sense that, if any other dispatching mechanism can schedule the job sequences satisfying all given constraints, then our mechanism can also schedule them. (Also cross-referenced as UMIACS-TR-97-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
An Accurate Time-Management Unit for Real-Time Processors. Krishnan K. Kailas. Ashok K. Agrawala. March 1997.
Time management is an important aspect of real-time computation. Traditional high performance processors provide little or no support for management of time. In this report, we propose a time-management unit which can greatly help improve the performance of a real-time system. The proposed unit can be added to any processor architecture without affecting its performance. We also explain how the unit helps to solve the clock synchronization problems in a real-time network. (Also cross-referenced as UMIACS-TR-97-28) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Estimating End-to-End Cell Delay Variation in ATM Networks. Ibrahim Korpeoglu. Satish K. Tripathi. Xiaoqiang Chen. March 1997.
Cell delay variation (CDV) is one of the quality of service parameters that can be negotiated between applications and an ATM network. The network should check during connection setup, as part of call admission control, whether it can satisfy the requested CDV of an application. For this comparison, the network should estimate the end-to-end CDV that it can support, by using local information about cell delays and delay variations in switches. An accurate estimation of the end-to-end CDV is important for decreasing call-blocking probability and increasing network utilization. In this article, we will first describe, evaluate, and identify the shortcomings of three proposed methods for end-to-end CDV estimation. Then we will present a new method based on the Chernoff bound and compare it to the other methods. The Chernoff method is promising since it has good accuracy and applicability under current signalling support for ATM networks. (Also cross-referenced as UMIACS-TR-97-27) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
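As an illustration of the general idea only (not the report's actual estimator), the sketch below applies the textbook Chernoff bound to the sum of independent per-switch delays and then picks the smallest end-to-end delay whose tail bound falls below a target probability. The per-hop delay distributions, the function names, and the search range `s_max` are all assumptions made for this example.

```python
import math

def log_mgf(pmf, s):
    """Log moment generating function of a discrete delay distribution.

    pmf maps a delay value to its probability (hypothetical per-switch data).
    Uses log-sum-exp so large s does not overflow.
    """
    terms = [math.log(p) + s * d for d, p in pmf.items() if p > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def chernoff_tail_bound(hop_pmfs, d, s_max=50.0, steps=200):
    """Chernoff upper bound on P(total delay >= d) for independent hops.

    The bound is min over s >= 0 of exp(-s*d) * prod_i M_i(s). The exponent is
    convex in s, so a ternary search on [0, s_max] is enough for a sketch.
    """
    def exponent(s):
        return -s * d + sum(log_mgf(pmf, s) for pmf in hop_pmfs)

    lo, hi = 0.0, s_max
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if exponent(m1) < exponent(m2):
            hi = m2
        else:
            lo = m1
    return min(1.0, math.exp(exponent((lo + hi) / 2)))

def delay_quantile_estimate(hop_pmfs, eps, d_max):
    """Smallest integer delay d whose Chernoff tail bound is at most eps."""
    for d in range(d_max + 1):
        if chernoff_tail_bound(hop_pmfs, d) <= eps:
            return d
    return None

# Three identical switches with a made-up queueing-delay distribution (in cell times).
hops = [{0: 0.9, 1: 0.09, 2: 0.01}] * 3
estimate = delay_quantile_estimate(hops, eps=1e-3, d_max=6)
```

Because the Chernoff bound is an upper bound on the tail probability, an estimate obtained this way is conservative: the true quantile can only be smaller.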
Designing Access Methods for Bitemporal Databases. Anil Kumar. Vassilis J. Tsotras. Christos Faloutsos. March 1997.
By supporting the valid and transaction time dimensions, bitemporal databases represent reality more accurately than conventional databases. In this paper we examine the issues involved in designing efficient access methods for bitemporal databases and propose the partial-persistence and the double-tree methodologies. The partial-persistence methodology reduces bitemporal queries to partial persistence problems for which an efficient access method is then designed. The double-tree methodology "sees" each bitemporal data object as consisting of two intervals (a valid-time and a transaction-time interval), and divides objects into two categories according to whether the right endpoint of the transaction time interval is already known. A common characteristic of both methodologies is that they take into account the properties of each time dimension. Their performance is compared with a straightforward approach that "sees" the intervals associated with a bitemporal object as composing one rectangle which is stored in a single multidimensional access method. Given that some limited additional space is available, our experimental results show that the partial-persistence methodology provides the best overall performance, especially for transaction timeslice queries. For those applications that require ready, off-the-shelf, access methods the double-tree methodology is a good alternative. (Also cross-referenced as UMIACS-TR-97-24) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Aspectual Modifications to a LCS Database for NLP Applications. Bonnie J. Dorr. Mari Broman Olsen. May 1997.
Verbal and compositional lexical aspect provide the underlying temporal structure of events. Knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (Dowty, 1986; Moens and Steedman, 1988; Passonneau, 1988), interfacing to temporal databases (Androutsopoulos, 1996), processing temporal modifiers (Antonisse, 1994), describing allowable alternations and their semantic effects (Resnik, 1996; Tenny, 1994), and selecting tense and lexical items for natural language generation ((Dorr and Olsen, 1996; Klavans and Chodorow, 1992), cf. (Slobin and Bocaz, 1988)). We show that it is possible to represent lexical aspect---both verbal and compositional---on a large scale, using Lexical Conceptual Structure (LCS) representations of verbs in the classes cataloged by Levin (1993). We show how proper consideration of these universal pieces of verb meaning may be used to refine lexical representations and derive a range of meanings from combinations of LCS representations. A single algorithm may therefore be used to determine lexical aspect classes and features at both verbal and sentence levels. Finally, we illustrate how knowledge of lexical aspect facilitates the interpretation of events in NLP applications. (Also cross-referenced as UMIACS-TR-97-21) (Also cross-referenced as LAMP-TR-007) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Understanding the Sources of Variation in Software Inspections. Adam A. Porter. Harvey Siy. Audris Mockus. Lawrence G. Votta. January 1997.
In a previous experiment, we determined how various changes in three structural elements of the software inspection process (team size, and number and sequencing of sessions) altered effectiveness and interval. Our results showed that such changes did not significantly influence the defect detection rate, but that certain combinations of changes dramatically increased the inspection interval. We also observed a large amount of unexplained variance in the data, indicating that other factors must be affecting inspection performance. The nature and extent of these other factors now have to be determined to ensure that they had not biased our earlier results. Also, identifying these other factors might suggest additional ways to improve the efficiency of inspection. Acting on the hypothesis that the "inputs" into the inspection process (reviewers, authors, and code units) were significant sources of variation, we modeled their effects on inspection performance. We found that they were responsible for much more variation in defect detection than was process structure. This leads us to conclude that better defect detection techniques, not better process structures, are the key to improving inspection effectiveness. The combined effects of process inputs and process structure on the inspection interval accounted for only a small percentage of the variance in inspection interval. Therefore, there still remain other factors which need to be identified. (Also cross-referenced as UMIACS-TR-97-22) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Bell Laboratories, Naperville, IL,
Fundamental Laws and Assumptions of Software Maintenance. Adam A. Porter. March 1997.
Researchers must pay far more attention to discovering and validating the principles that underlie software maintenance and evolution. This was one of the major conclusions reached during the International Workshop on Empirical Studies of Software Maintenance. This workshop, held in November 1996 in Monterey, California, brought together an international group of researchers to discuss the successes, challenges and open issues in software maintenance and evolution. This article documents the discussion of the subgroup on fundamental laws and assumptions of software maintenance. The participants of this group included researchers in software engineering, the behavioral sciences, information systems and statistics. Their main conclusion was that insufficient effort has been devoted to synthesizing research conjectures into validated theories and this problem has slowed progress in software maintenance. To help remedy this situation they made the following recommendations: (1) when we use empirical methods, an explicit goal should be to develop theories, (2) we should look to other disciplines for help where it is appropriate, and (3) our studies should use a wider range of empirical methods. (Also cross-referenced as UMIACS-TR-97-21) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
An Experiment to Assess the Cost-Benefits of Code Inspections in Large. Adam A. Porter. C. A. Toman. Harvey Siy. Lawrence G. Votta. March 1997.
We conducted a long-term experiment to compare the costs and benefits of several different software inspection methods. These methods were applied by professional developers to a commercial software product they were creating. Because the laboratory for this experiment was a live development effort, we took special care to minimize cost and risk to the project, while maximizing our ability to gather useful data. This article has several goals: (1) to describe the experiment's design and show how we used simulation techniques to optimize it, (2) to present our results and discuss their implications for both software practitioners and researchers, and (3) to discuss several new questions raised by our findings. For each inspection we randomly assigned 3 independent variables: (1) the number of reviewers on each inspection team (1, 2, or 4), (2) the number of teams inspecting the code unit (1 or 2), and (3) the requirement that defects be repaired between the first and second team's inspections. The reviewers for each inspection were randomly selected without replacement from a pool of 11 experienced software developers. The dependent variables for each inspection included inspection interval (elapsed time), total effort, and the defect detection rate. Our results are based on the observation of 88 inspections and challenge certain long-held beliefs about the most cost-effective ways to conduct inspections and raise some questions about the benefits of recently proposed methods. (Also cross-referenced as UMIACS-TR-97-20) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, AT&T Bell Laboratories, Naperville IL,
Understanding the Effects of Developer Activities on Inspection. Adam A. Porter. Harvey Siy. Lawrence G. Votta. March 1997.
We have conducted an industrial experiment to assess the cost-benefit tradeoffs of several software inspection processes. Our results to date explain the variation in observed effectiveness very well, but are unable to satisfactorily explain variation in inspection interval. In this article we examine the effect of a new factor - process environment - on inspection interval (calendar time needed to complete the inspection). Our analysis suggests that process environment does indeed influence inspection interval. In particular, we found that non-uniform work priorities, time-varying workloads, and deadlines have significant effects. Moreover, these experiences suggest that regression models are inherently inadequate for interval modeling, and that queueing models may be more effective. (Also cross-referenced as UMIACS-TR-97-19) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Specification-based Testing of Reactive Software: Tools and Experiments. Lalita Jategaonkar Jagadeesan. Adam A. Porter. Carlos Puchol. J. Christopher Ramming. Lawrence G. Votta. March 1997.
Testing commercial software is expensive and time consuming. Automated testing methods promise to save a great deal of time and money throughout the software industry. One approach that is well-suited for the reactive systems found in telephone switching systems is specification-based testing. We have built a set of tools to automatically test software applications for violations of safety properties expressed in temporal logic. Our testing system automatically constructs finite state machine oracles corresponding to safety properties, builds test harnesses, and integrates them with the application. The test harness then generates inputs automatically to test the application. We describe a study examining the feasibility of this approach for testing industrial applications. To conduct this study we formally modeled an Automatic Protection Switching system (APS), which is an application common to many telephony systems. We then asked a number of computer science graduate students to develop several versions of the APS and use our tools to test them. We found that the tools are very effective, save significant amounts of human effort (at the expense of machine resources), and are easy to use. We also discuss improvements that are needed before we can use the tools with professional developers building commercial products. (Also cross-referenced as UMIACS-TR-97-18) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Bell Laboratories, Naperville IL, Dept. of Computer Sciences, Univ. of Texas at Austin, AT&T Laboratories,
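To make the notion of a finite state machine oracle concrete, here is a minimal sketch of one that monitors an event trace for a safety violation. It is not the tool set described in the abstract: the class, the event names, and the example property ("a switch may only happen while a failure is outstanding") are hypothetical and chosen only for illustration.

```python
class SafetyOracle:
    """Finite state machine that watches an event trace for a safety violation."""

    def __init__(self, transitions, start, bad_states):
        self.transitions = transitions  # maps (state, event) to the next state
        self.state = start
        self.bad_states = bad_states
        self.violated = False

    def observe(self, event):
        """Consume one event; return False once the property has been violated."""
        # Events with no listed transition leave the state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)
        if self.state in self.bad_states:
            self.violated = True  # violations are sticky: safety cannot be regained
        return not self.violated


def make_switch_oracle():
    """Hypothetical property: 'switch' is only allowed while a failure is outstanding."""
    transitions = {
        ("normal", "fail"): "failed",
        ("failed", "clear"): "normal",
        ("normal", "switch"): "violation",
    }
    return SafetyOracle(transitions, start="normal", bad_states={"violation"})


def first_violation(trace):
    """Index of the first event that violates the property, or None if none does."""
    oracle = make_switch_oracle()
    for index, event in enumerate(trace):
        if not oracle.observe(event):
            return index
    return None
```

A test harness in this style would feed each generated input to both the application and the oracle, and report the trace prefix up to the index returned by `first_violation`.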
Anywhere, Anytime Code Inspections: Using the Web to Remove Inspection. James Perpich. Dewayne E. Perry. Adam A. Porter. Lawrence G. Votta. Michael W. Wade. March 1997.
The dissemination of critical information and the synchronization of coordinated activities are critical problems in geographically separated, large-scale, software development. While these problems are not insurmountable, their solutions have varying trade-offs in terms of time, cost and effectiveness. Our previous studies have shown that the inspection interval is typically lengthened because of schedule conflicts among inspectors which delay the (usually) required inspection collection meeting. We present and justify a solution using an intranet web that is both timely in its dissemination of information and effective in its coordination of distributed inspectors. First, exploiting a naturally occurring experiment (reported here), we conclude that the asynchronous collection of inspection results is at least as effective as the synchronous collection of those results. Second, exploiting the information dissemination qualities and the on-demand nature of information retrieval of the web, and the platform independence of browsers, we build an inexpensive tool that integrates seamlessly into the current development process. By seamless we mean an identical paper flow that results in an almost identical inspection process. The acceptance of the inspection tool has been excellent. The cost savings just from the reduction in paper work and the time savings from the reduction in distribution interval of the inspection package (sometimes involving international mailings) have been substantial. These savings together with the seamless integration into the existing environment are the major factors for this acceptance. From our viewpoint as experimentalists, the acceptance came too readily. Therefore we lost our opportunity to explore this tool using a series of controlled experiments to isolate the underlying factors of its effectiveness.
Nevertheless, by using historical data we can show that the new process is less expensive in terms of cost and at least as effective in terms of quality (defect detection effectiveness). (Also cross-referenced as UMIACS-TR-97-17) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Lucent Technologies Inc, Naperville IL and Murray Hill NJ, Bell Laboratories, Murray Hill NJ,
Specification-based Testing of Reactive Software: A Case Study in. Lalita Jagadeesan. Adam A. Porter. Carlos Puchol. J. Christopher Ramming. Lawrence G. Votta. February 1997.
We describe a case study in which we tried to transfer a specification-based testing system from research to practice. We did the case study in two steps: First we conducted a feasibility study in a laboratory setting to estimate the potential costs and benefits of using the system. Next we conducted a usability study, in an industrial setting, to determine whether it would be effective in practice. The case study illustrates that technology transfer efforts can benefit from a greater focus on practitioners' needs, and that this focus helps identify some of the open problems that limit formal methods technology transfer. We also found that there is often a tension between the scope of the problem to be solved and the specificity of the solution. The greater the scope of the problem, the more general the formal method solution and, thus, the more customization that must be done to use it in a particular environment. We suggest that researchers limit the scope of the problems they try to solve to minimize the risk of technology transfer failure. (Also cross-referenced as UMIACS-TR-97-16) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Assessing Software Review Meetings: Results of a Comparative Analysis. Adam A. Porter. Philip M. Johnson. February 1997.
Software review is a fundamental tool for software quality assurance. Nevertheless, there are significant controversies as to the most efficient and effective review method. One of the most important questions currently being debated is the utility of meetings. Although almost all industrial review methods are centered around the inspection meeting, recent findings call their value into question. To gain insight into these issues, the two authors of this paper separately and independently conducted controlled experimental studies. This paper discusses a joint effort to understand the broader implications of these two studies. To do this, we designed and carried out a process of "reconciliation" in which we established a common framework for the comparison of the two experimental studies, re-analyzed the experimental data with respect to this common framework, and compared the results. Through this process we found many striking similarities between the results of the two studies, strengthening their individual conclusions. It also revealed interesting differences between the two experiments, suggesting important avenues for future research. (Also cross-referenced as UMIACS-TR-97-15) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Quantifiable Data Mining Using Principal Component Analysis. Flip Korn. Alexandros Labrinidis. Yannis Kotidis. Christos Faloutsos. Alex Kaplunovich. Dejan Perkovic. February 1997.
Association Rule Mining algorithms operate on a data matrix (e.g., customers x products) to derive rules. We propose a single-pass algorithm for mining linear rules in such a matrix based on Principal Component Analysis. PCA detects correlated columns of the matrix, which correspond to, e.g., products that sell together. The first contribution of this work is that we propose to quantify the ``goodness'' of a set of discovered rules. We define the ``guessing error'': the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. The second contribution is a novel method to guess missing/hidden values from the linear rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can ``guess'' the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, `what-if' scenarios, outlier detection, and visualization. Moreover, we show that we can compute the principal components with a single pass over the dataset. Experiments on real datasets (e.g., NBA statistics) demonstrate that the proposed method consistently achieves a ``guessing error'' up to 5 times lower than that of the straightforward competitor. (Also cross-referenced as UMIACS-TR-97-13) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
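The idea of guessing a hidden cell from a linear rule can be sketched with a single principal component. This is a simplified illustration rather than the report's algorithm: it uses only the first component, computes it by power iteration over a two-pass scatter matrix instead of the single pass the abstract mentions, and all function names and the toy spending data are assumptions.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector of the first principal component)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    dim = len(means)
    # Scatter matrix X^T X of the centered data (proportional to the covariance).
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(dim)]
               for i in range(dim)]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):  # power iteration toward the dominant eigenvector
        w = [sum(scatter[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance at all; keep the starting direction
        v = [x / norm for x in w]
    return means, v

def guess_hidden(means, v, known, hidden_col):
    """Guess one hidden column of a row from its known columns.

    known maps column index to observed value. The row is placed on the line
    mean + t * v, with t chosen by least squares over the known columns.
    """
    num = sum(v[k] * (x - means[k]) for k, x in known.items())
    den = sum(v[k] ** 2 for k in known)
    t = num / den if den else 0.0
    return means[hidden_col] + t * v[hidden_col]

# Toy matrix: dollars spent on (milk, bread, butter) by four customers.
spending = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
means, component = first_principal_component(spending)
butter_guess = guess_hidden(means, component, {0: 5, 1: 10}, hidden_col=2)
```

Hiding each cell in turn, guessing it this way, and taking the root-mean-square of the differences gives a quantity in the spirit of the ``guessing error'' defined above.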
Temporally Determinate Disk Access: An Experimental Approach. Mohamed Aboutabl. Ashok K. Agrawala. Jean-Dominique Decotignie. February 1997.
Disk drives are the most commonly used secondary storage devices in computer systems. The way operating systems access these devices leads to a wide range of variability in access time. In this report we study the detailed temporal characteristics of disk drives. We describe a comprehensive set of experiments designed to build a model for the disk drive. Simulation is used to validate the model. This disk model will help design a device driver which can achieve a high degree of temporal determinacy. (Also cross-referenced as UMIACS-TR-97-14) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Département d'Informatique, Ecole Polytechnique Federale De Lausanne,
Representing and Integrating Multiple Calendars. Sarit Kraus. Yehoshua Sagiv. V. S. Subrahmanian. February 1997.
Whenever humans refer to time, they do so with respect to a specific underlying calendar. So do most software applications. However, most theoretical models of time refer to time with respect to the integers (or reals). Thus, there is a mismatch between the theory and the application of temporal reasoning. To lessen this gap, we propose a formal, theoretical definition of a calendar and show how one may specify dates, time points, time intervals, as well as sets of time points, in terms of constraints with respect to a given calendar. Furthermore, when multiple applications using different calendars wish to work together, there is a need to integrate those calendars together into a single, unified calendar. We show how this can be done. (Also cross-referenced as UMIACS-TR-97-12) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Mathematics and Computer Science, Bar-Ilan University, Israel, Dept. of Computer Science, Hebrew University, Israel,
ForMAT and Parka: A technology integration experiment and beyond. James Hendler. K. Stoffel. D. Rager. A. Mulvehill. February 1997.
This report describes a Technology Integration Experiment (TIE) between the University of Maryland and MITRE Corp. which was undertaken as part of the (D)Arpa/Rome Laboratory Planning Initiative (ARPI). This work led to an integration of the UM Parka-DB tool into the MITRE ForMAT transportation planning tool. This work also forms one of the cornerstones of the "Case-based Planning" cluster of the current phase of the ARPI. Dept. of Computer Science, Univ. of Maryland,
Multi-platform Simulation of Video Playout Performance. Ladan Gharai. Richard Gerber. February 1997.
We describe a video playout and simulation package, including (1) a multi-threaded player, which maximizes performance via asynchronous streaming and selective IO-prefetching; (2) a compositional simulator, which predicts playout performance for multiple platforms via eleven key deterministic and stochastic time-generating functions; and (3) a set of profiling tools, which allows one to extend the range of target platforms by benchmarking new components, and converting the results into distribution functions that the simulator can access. Using this system, a developer can quickly estimate a video's performance on a wide spectrum of target platforms - without ever having to actually assemble them. (Also cross-referenced as UMIACS-TR-97-11) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Automated Computation of Decomposable Synchronization Conditions. Gilberto Matos. James M. Purtilo. Elizabeth White. February 1997.
The most important aspect of concurrent and distributed computation is the interaction between system components. Integration of components into a system requires some synchronization that prevents the components from interacting in ways that may endanger the system users, its correctness or performance. The undesirable interactions are usually described using temporal logic, or safety and liveness assertions. Automated synthesis of synchronization conditions is a portable alternative to the manual design of system synchronization, and it is already widespread in the hardware CAD domain. The automated synchronization for concurrent software systems is hindered by their excessive complexity, because their state spaces can rarely be exhaustively analyzed to compute the synchronization conditions. The analysis of global state spaces is required for liveness and real-time properties, but simple safety rules depend only on the referenced components and not on the rest of the system or its environment. Synchronization conditions for delayable safety critical systems can be computed without the state space analysis, and decomposed into single component synchronization conditions. Automated synthesis of decomposable synchronization conditions provides a solid groundwork for the independent design of system components, and supports reuse and maintenance in concurrent software systems. This approach to integration of concurrent systems is embodied by GenEx, an analysis and synchronization tool that integrates system components to satisfy a given set of safety rules, and produces executable systems. (Also cross-referenced as UMIACS-TR-97-10) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Computer Science, George Mason University,
LEXICALL: Lexicon Construction for Foreign Language Tutoring. Bonnie J. Dorr. February 1997.
We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program---LEXICALL---takes, as input, the result of previous work on verb classification and thematic grid tagging, and outputs LCS representations for different languages. These representations have been ported into English, Arabic and Spanish lexicons, each containing approximately 9000 verbs. We are currently using these lexicons in an operational foreign language tutoring and machine translation. (Also cross-referenced as UMIACS-TR-97-09) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Resource Lock Commit Protocol (RLCP) for Multimedia Object Retrieval. K. Selcuk Candan. Eenjun Hwang. B. Prabhakaran. V.S. Subrahmanian. February 1997.
Many multimedia presentation applications involve retrieval of objects from more than one collaborating server. Presentations of objects from different collaborating servers might be inter-dependent. For instance, we can consider distributed video servers where blocks of movies are distributed over a set of servers. Here, blocks of a movie from different video servers have to be retrieved and presented continuously without any gaps in the presentation. Such applications first need an estimate of the available network resources to each of the collaborating servers in order to identify a schedule for retrieving the objects composing the presentation. A collaborating server can suggest modifications of the retrieval schedule depending on its load. These modifications can potentially affect the retrieval schedule for other collaborating applications. Hence, a sequence of negotiations has to be carried out with the collaborating servers in order to commit to a retrieval schedule of the objects composing the multimedia presentation. In this paper, we propose an application sub-layer protocol, Resource Lock Commit Protocol (RLCP), for handling the negotiation and commitment of the resources required for a collaborative multimedia presentation application. (Also cross-referenced as UMIACS-TR-97-08) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Flexible Meta-Wrapper Interface for Autonomous Distributed. Louiqa Raschid. Maria Esther Vidal. Jean-Robert Gruser. March 1997.
We support flexible query processing with autonomous networked information sources. Flexibility allows a query to be accepted in a dynamic environment with unavailable sources. Flexibility provides the ability to identify equivalent sources, based on their contents; these equivalences are used to eliminate redundancy and provide alternate query plans, when some source is unavailable. We determine the best plan, i.e., the least-cost non-redundant plan, based on a cost-model for autonomous sources. These features are supported by a meta-wrapper component within the mediator. The meta-wrapper interface is defined by a structure and supported operations. WHOQL is a query language for queries and plans; it can represent sequential execution to obtain safe plans, and plans with redundancy (alternatives). A language WHODL defines the mapping from the meta-wrapper interface to each source. WHODL also describes the contents of a source. This content definition is used to determine equivalences of autonomous sources. We obtain a least-cost non-redundant plan in a dynamic environment. A meta-wrapper cost model uses three underlying sources of information: a selectivity model; a cost model for operators in the meta-wrapper; and a cost estimator for the query response time. The estimator uses a parameterized feedback technique to learn from query feedback, and to determine the relevance of various factors that affect response time. The cost model also provides feedback to the plan generator on low-cost plans. (Also cross-referenced as UMIACS-TR-97-07) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compile-Time Analysis on Programs with Dynamic Pointer-Linked Data. Yuan-Shin Hwang. Joel Saltz. November 1996.
This paper studies static analysis of programs that create and traverse dynamic pointer-linked data structures. It introduces a new type of auxiliary structure, called {\em link graphs}, to depict the alias information of pointers and the connection relationships of dynamic pointer-linked data structures. The link graphs can be used by compilers to detect side effects, to identify the patterns of traversal, and to gather the DEF-USE information of dynamic pointer-linked data structures. The results of this compile-time analysis are essential for parallelization and for optimization of communication and synchronization overheads. Algorithms that perform compile-time analysis of side effects and DEF-USE information using link graphs are proposed. Dept. of Computer Science, Univ. of Maryland,
Real-time Communication. Ardas Cilingiroglu. Sung Lee. Ashok K. Agrawala. January 1997.
Recent advances in networking technology have enabled new multimedia and process control applications. These applications require real-time communication services with stringent performance guarantees expressed in terms of delay, delay jitter, throughput and loss rate. Current network architectures and protocols are designed to support best-effort services and they are inefficient in supporting real-time services. In this paper, we survey real-time communication architectures and protocols both in packet-switching networks and in multiple-access networks. For each network a service model is presented as a general framework. Specifically, the service model for a packet-switching network is composed of a specification for traffic characterization and performance requirements, a routing protocol, a resource reservation protocol and a packet service discipline at switching nodes. The model for a multiple-access network, on the other hand, includes a basic traffic characterization and a MAC-layer real-time scheduling algorithm. This paper surveys the recent developments in each component of the service models with comparisons of alternative techniques. (Also cross-referenced as UMIACS-TR-97-04) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Alternative Discrete-Time Operators and Their Application to Nonlinear. Andrew D. Back. Ah Chung Tsoi. Bill G. Horne. C. Lee Giles. January 1997.
The shift operator, defined as $q x(t) = x(t+1)$, is the basis for almost all discrete-time models. It has been shown, however, that linear models based on the shift operator suffer problems when used to model lightly-damped-low-frequency (LDLF) systems, with poles near $(1,0)$ on the unit circle in the complex plane. This problem occurs under fast sampling conditions. As the sampling rate increases, coefficient sensitivity and round-off noise become a problem as the difference between successive sampled inputs becomes smaller and smaller. The resulting coefficients of the model approach the coefficients obtained in a binomial expansion, regardless of the underlying continuous-time system. This implies that for a given finite wordlength, severe inaccuracies may result. Wordlengths for the coefficients may also need to be made longer to accommodate models which have low frequency characteristics, corresponding to poles in the neighbourhood of $(1,0)$. These problems also arise in neural network models which consist of linear parts and nonlinear neural activation functions. Various alternative discrete-time operators can be introduced which offer numerical computational advantages over the conventional shift operator. The alternative discrete-time operators have been proposed independently of each other in the fields of digital filtering, adaptive control and neural networks. These include the delta, rho, gamma and bilinear operators. In this paper we first review these operators and examine some of their properties. An analysis of the TDNN and FIR MLP network structures is given which shows their susceptibility to parameter sensitivity problems. Subsequently, it is shown that models may be formulated using alternative discrete-time operators which have low sensitivity properties. Consideration is given to the problem of finding parameters for stable alternative discrete-time operators. 
A learning algorithm which adapts the alternative discrete-time operator parameters on-line is presented for MLP neural network models based on alternative discrete-time operators. It is shown that neural network models which use these alternative discrete-time operators perform better than those using the shift operator alone. (Also cross-referenced as UMIACS-TR-97-03) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Laboratory for Artificial Brain Systems, Institute of Physical and, Faculty of Informatics, University of Wollongong, Australia, AADM Consulting, Califon, NJ, NEC Research Institute, Princeton, NJ,
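As a rough illustration of the coefficient-sensitivity problem described in this abstract, the sketch below compares a first-order model written with the shift operator, $x(t+1) = a\,x(t)$, against the same model written with the delta operator, which is commonly defined as $\delta = (q - 1)/\Delta$ for sampling interval $\Delta$. The continuous-time pole, the sampling intervals, and the fixed-point quantizer are assumptions chosen for illustration; they are not taken from the report.

```python
import math

def quantize(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits."""
    scale = 2 ** frac_bits
    return round(value * scale) / scale

def pole_errors(pole_ct, dt, frac_bits):
    """Return (shift_error, delta_error): the absolute error in the discrete
    pole after quantizing the shift-form and the delta-form coefficient."""
    a = math.exp(pole_ct * dt)                   # exact shift-operator coefficient
    d = (a - 1.0) / dt                           # exact delta-operator coefficient
    a_shift = quantize(a, frac_bits)             # pole realized by the quantized shift form
    a_delta = 1.0 + dt * quantize(d, frac_bits)  # pole implied by the quantized delta form
    return abs(a_shift - a), abs(a_delta - a)

if __name__ == "__main__":
    # Hypothetical setting: continuous-time pole at -1, 8 fractional bits.
    for dt in (1e-1, 1e-2, 1e-3):
        s_err, d_err = pole_errors(pole_ct=-1.0, dt=dt, frac_bits=8)
        print(f"dt={dt:g}  shift error={s_err:.2e}  delta error={d_err:.2e}")
```

In this toy setting, the quantized shift coefficient at the fastest sampling rate rounds to exactly 1, so the pole lands on the unit circle, while the delta-form coefficient stays near -1 and its quantization error is scaled down by the sampling interval.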
Iteration Space Slicing and Its Application to Communication. William Pugh. Evan Rosser. April 1997.
Program slicing is an analysis that answers questions such as ``Which statements might affect the computation of variable $v$ at statement $s$?'' or ``Which statements depend on the value of $v$ computed in statement $s$?''. The answers computed by program slicing are generally a set of statements. We introduce the idea of {\em iteration space slicing}: we refine program slicing to ask questions such as ``Which iterations of which statements might affect the computation in iterations $I$ of statement $s$?'' or ``Which iterations of which statements depend on the value computed by iterations $I$ of statement $s$?''. One application of this general-purpose technique is optimization of interprocessor communication in data-parallel compilers. For example, we can separate a code fragment into 1) those iterations that must be done before a send, 2) those iterations that don't need to be done before a send and don't depend on non-local data, and 3) those iterations that depend on non-local data. We examine applications of iteration space slicing to communication optimizations in parallel executions of programs such as stencil computations and block-cyclic Gaussian elimination with partial pivoting. (Also cross-referenced as UMIACS-TR-97-02) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Study of Internet Round-Trip Delay. Anurag Acharya. Joel Saltz. December 1996.
We present the results of a study of Internet round-trip delay. The links chosen include links to frequently accessed commercial hosts as well as well-known academic and foreign hosts. Each link was studied for a 48-hour period. We attempt to answer the following questions: (1) how rapidly and in what manner does the delay change -- in this study, we focus on medium-grain (seconds/minutes) and coarse-grain time-scales (tens of minutes/hours); (2) what does the frequency distribution of delay look like and how rapidly does it change; (3) what is a good metric to characterize the delay for the purpose of adaptation. Our conclusions are: (a) there is large temporal and spatial variation in round-trip time (RTT); (b) RTT distribution is usually unimodal and asymmetric and has a long tail on the right hand side; (c) RTT observations in most time periods are tightly clustered around the mode; (d) the mode is a good characteristic value for RTT distributions; (e) RTT distributions change slowly; (f) persistent changes in RTT occur slowly, sharp changes are undone very shortly; (g) jitter in RTT observations is small and (h) inherent RTT occurs frequently. (Also cross-referenced as UMIACS-TR-96-97) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
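Conclusions (b) through (d) of this abstract suggest the mode as a characteristic value for RTT. A minimal sketch of that idea follows: it estimates the mode from a histogram and contrasts it with the mean, which the long right tail pulls upward. The sample values, the bin width, and the function name `rtt_mode` are hypothetical and do not come from the study.

```python
from collections import Counter
from statistics import mean

def rtt_mode(samples_ms, bin_width_ms=5.0):
    """Estimate the mode of RTT samples as the center of the fullest histogram bin.
    Ties are broken in favor of the lowest bin."""
    bins = Counter(int(s // bin_width_ms) for s in samples_ms)
    fullest_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return (fullest_bin + 0.5) * bin_width_ms

if __name__ == "__main__":
    # Hypothetical RTT samples (ms): tightly clustered, with a few large outliers.
    samples = [81, 82, 83, 82, 84, 81, 83, 82, 95, 140, 310, 82, 83, 650]
    print("mode estimate (ms):", rtt_mode(samples))
    print("mean (ms):", round(mean(samples), 1))
```

With these made-up samples the mode estimate stays inside the tight cluster, whereas the mean is dominated by the three outliers.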
Reducing Router-Crossings in a Mobile Intranet. Rohit Dube. Ibrahim Korpeoglu. Satish K. Tripathi. January 1997.
Current general purpose mobility solutions like Mobile-IP involve multiple router-crossings even when the mobile host moves within an intranet from one subnet of a router to another. An environment consisting of a large number of mobile hosts would congest the router causing hosts to experience high latency and jitter. This paper presents a mechanism to eliminate multiple router-crossings in a mobile intranet, which reduces the load on the routers and the hand-off and data latency at the mobile hosts. (Also cross-referenced as UMIACS-TR-97-01) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Incremental Data Structures and Algorithms for Dynamic Query. Egemen Tanin. Richard Beigel. Ben Shneiderman. January 1997.
Dynamic query interfaces (DQIs) are a recently developed form of database access that provides continuous realtime feedback to the user during the query formulation process. Previous work shows that DQIs are an elegant and powerful interface to small databases. Unfortunately, when applied to large databases, previous DQI algorithms slow to a crawl. We present a new approach to DQI algorithms that works well with large databases. HCIL, Dept. of Computer Science, Univ. of Maryland,
A Prototype for a Distributed Space Physics Data System. Charles Falkenberg. Chuck Goodrich. James Gallagher. Peter Cornillon. Glenn Flierl. December 1996.
The collaborative analysis of data within the Space Physics community is hindered, in part, by the large number of data formats and the wide distribution of data archives. In an attempt to address these two problems we have implemented a prototype which retrieves datasets stored in different data formats at several remote locations. Our prototype uses the Key Parameter Visualization Tools (KPVT) and the Distributed Oceanographic Data System (DODS) to view data from the ISEE1, ISEE2, and ISTP programs. Our goal is to demonstrate the ability to access and use several types of remote data and existing analysis tools. The work described demonstrates the power of an expressive data model, like the one in DODS, for converting and transmitting space physics data. Furthermore, since the DODS system architecture (and associated data model) was developed to meet oceanographic needs, the fact that it works well for use within the space physics community suggests that the DODS approach will also work well as a data distribution mechanism for the other earth science sub-disciplines. Given the growing interest in interdisciplinary work in the earth sciences, the existence of a data model/system capable of spanning the various sub-disciplines is significant. Dept. of Computer Science, Univ. of Maryland, Advanced Visualization Laboratory, Univ. of Maryland, Graduate School of Oceanography, Univ. of Rhode Island, Massachusetts Institute of Technology,
Toward Optimizing Distributed Programs Directed by Configurations. Tae-Hyung Kim. December 1996.
Networks of workstations are now viable environments for running distributed and parallel applications. Recent advances in software interconnection technology enable programmers to prepare applications to run in dynamically changing environments, because module interconnection is regarded as an intellectual activity essentially distinct from, and thus isolated from, that of implementing individual modules. But there remains the question of how to optimize the performance of those applications for a given execution environment: how can developers realize performance gains without paying a high programming cost to specialize their application for the target environment? Interconnection technology has allowed programmers to tailor and tune their applications on distributed environments, but the traditional approach to this process has neglected performance in favor of gracefully seamless integration of various software components. Networks of workstations can be virtual parallel machines. For a distributed and parallel application on such environments, the ability to write performance-literate programs is as important as the ability to seamlessly integrate distributed modules. Our dissertation research is an effort to extend plain interconnection technology with a variety of performance attributes. The RPC (remote procedure call) paradigm is used at the module programming level because it adopts a widely used and understood procedure call abstraction as the sole mechanism of remote operations and thus helps to shape reusable components. Most performance-related decisions are pertinent to the interconnections among software components. Our effort toward performance tuning consists of two main thrusts. One is an automatic adaptation from a performance configuration, which is analogous to the process of software interconnection for traditional structure-oriented configurations. 
We present how a performance configuration can be represented as an extension to traditional module interconnections. The other is an optimal transformation for RPC statements in an individual module using various program analysis techniques. The conventional stub-generation-based approach to implementing the RPC paradigm cannot improve performance because of its synchronous nature. With the two systematic approaches toward optimizing distributed programs working in concert, programmers can have high performance and conceptual simplicity in writing distributed programs. Dept. of Computer Science, Univ. of Maryland,
Iterative Solution of the Helmholtz Equation By a Second-Order Method. Kurt Otto. Elisabeth Larsson. December 1996.
The numerical solution of the Helmholtz equation subject to nonlocal radiation boundary conditions is studied. The specific problem is discretized with a second-order accurate finite-difference method, resulting in a linear system of equations. To solve the system of equations, a preconditioned Krylov subspace method is employed. The preconditioner is based on fast transforms, and yields a direct fast Helmholtz solver for rectangular domains. Numerical experiments for curved ducts demonstrate that the rate of convergence is high. Compared with band Gaussian elimination, the preconditioned iterative method shows a significant gain in both storage requirement and arithmetic complexity. (Also cross-referenced as UMIACS-TR-96-95) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Scientific Computing, Uppsala Univ., Uppsala Sweden,
Organizational Issues in Software Development: An Empirical Study of. Carolyn B. Seaman. December 1996.
The subject of this dissertation is an empirical study whose goal is to characterize certain aspects of communication among members of a software development organization. The independent variables in this study are various attributes of organizational structure. The dependent variable is the effort spent on sharing information which is required by the code inspection process in use. The research questions upon which the study is based ask whether or not these attributes of organizational structure have an effect on the amount of communication effort expended. In addition, several other variables have been included, such as code size and complexity, which represent factors other than organizational structure which may have an effect on communication effort. The study uses both quantitative and qualitative methods for data collection and analysis. These methods include participant observation, structured interviews, graphical data presentation, and interpretation of statistical results with qualitative anecdotes. In addition, a pilot study was conducted to test this combination of methods. The findings, which are presented as a set of hypotheses, show that all of the organizational structure characteristics studied do have an effect on communication effort, at least in some circumstances. The work described in this dissertation helps to enable a whole new area of research, by illustrating one effective way of conducting such investigations, and by providing some hypotheses with which to begin. (Also cross-referenced as UMIACS-TR-96-94) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Optimization within a Unified Transformation Framework. Wayne Kelly. August 1996.
Programmers typically want to write scientific programs in a high level language with semantics based on a sequential execution model. To execute efficiently on a parallel machine, however, a program typically needs to contain explicit parallelism and possibly explicit communication and synchronization. So, we need compilers to convert programs from the first of these forms to the second. There are two basic choices to be made when parallelizing a program. First, the computations of the program need to be distributed amongst the set of available processors. Second, the computations on each processor need to be ordered. My contribution has been the development of simple mathematical abstractions for representing these choices and the development of new algorithms for making these choices. I have developed a new framework that achieves good performance by minimizing communication between processors, minimizing the time processors spend waiting for messages from other processors, and ordering data accesses so as to exploit the memory hierarchy. This framework can be used by optimizing compilers, as well as by interactive transformation tools. The state of the art for vectorizing compilers is already quite good, but much work remains to bring parallelizing compilers up to the same standard. The main contribution of my work can be summarized as improving this situation by replacing existing ad hoc parallelization techniques with a sound underlying foundation on which future work can be built. (Also cross-referenced as UMIACS-TR-96-93) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in. Tamara G. Kolda. Dianne P. O'Leary. December 1996.
The vast amount of textual information available today is useless unless it can be effectively and efficiently searched. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is usually accomplished using a low-rank singular value decomposition (SVD) approximation. In this paper, we use an alternate decomposition, the semi-discrete decomposition (SDD). In our tests, for equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage. Additionally, we show how to update the SDD for a dynamically changing document collection. (Also cross-referenced as UMIACS-TR-96-92) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Presentation Planning for Distributed Video Systems. Eenjun Hwang. B. Prabhakaran. V.S. Subrahmanian. December 1996.
A distributed video-on-demand (VoD) system is one where a collection of video data is located at dispersed sites across a computer network. In a single-site environment, a local video server retrieves video data from its local storage device (or devices). However, in the setting of a distributed VoD system, when a customer requests a movie from his/her local server, the server may need to interact with other servers located across the network. In this paper, we present three types of presentation plans that a local server must construct in order to satisfy the customer's request. Informally speaking, a presentation plan is a detailed (temporally synchronized) sequence of steps that the host server must perform at given points in time. This involves obtaining commitments from other video servers, obtaining commitments from the network service provider, as well as making commitments of local resources, within the limitations of available bandwidth, available buffer, and customer/client data consumption rates. The three types of plans described in this paper all work at different "levels of abstraction" in this planning process. Furthermore, we introduce two measures of how good a plan is: minimizing wait time for the customer, and minimizing a quantity called access bandwidth (which, informally speaking, specifies how much network/disk bandwidth is used). We develop algorithms to compute optimal (w.r.t. the above measures) plans for all three types, and show experimentally that in all three cases, one of the three types of plans (called a hybrid presentation plan) systematically outperforms the other two. In addition to these new concepts, our framework has the advantage that many results that had previously been verified experimentally in the literature can now be conclusively proved mathematically. (Also cross-referenced as UMIACS-TR-96-91) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Software Engineering of Virtual Environments:. Donald J. Welch. James M. Purtilo. July 1996.
Virtual Environments (VEs) are proving to be valuable resources in many fields, and they are even more useful when they involve multiple users in distributed environments. Many useful VEs were designed to be stand-alone applications, without consideration for integrating them into a distributed VE. Our approach to connecting VEs is to define an abstract model for the interconnection, use integration tools to do as much of the work automatically as possible, and use a run-time environment to support the interconnection. With our experiences to date, we are learning that certain classes of techniques are common to all solutions using this approach. We have summarized these in a set of requirements and are building a system that features these techniques as first class objects. In the future you will be able to solve these interconnection problems cheaply, plus engineers of future VEs will have some guidance on how they should organize their implementations so that interconnection with other VEs will be easier. In this paper we coin the phrase "software engineering of virtual environments" (SEVE) to describe the above activities. (Also cross-referenced as UMIACS-TR-96-89) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Interconnecting Distributed Legacy Systems: Virtual Environment. Donald J. Welch. James M. Purtilo. October 1996.
As the power and utility of virtual reality environments increase, so do the potential benefits found from combining several such environments. But doing so presents the developer with a host of difficult distributed systems issues. This paper explores what some of these issues are within the VE domain, relates our successes to date in overcoming these problems by means of various automated tools, and suggests ways to apply our results to other target domains. (Also cross-referenced as UMIACS-TR-96-88) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Zubin: A Software Engineering Environment for Interconnecting Legacy. Donald J. Welch. James M. Purtilo. November 1996.
As the power and utility of virtual reality environments increase, so do the potential benefits found from combining several such environments. But doing so presents the developer with a host of difficult software engineering issues. This paper explores what some of these issues are within the VE domain, relates our successes to date in overcoming these problems by means of various automated tools, and suggests ways to apply our results to other target domains. (Also cross-referenced as UMIACS-TR-96-87) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Previews and Overviews in Digital Libraries: Designing Surrogates to. Hanan Samet. October 1997.
To aid designers of digital library interfaces and web sites in creating comprehensible, predictable and controllable environments for their users, we define and discuss the benefits of previews and overviews as visual information representations. Previews and overviews are graphic or textual representations of information abstracted from primary information objects. They serve as surrogates for those objects. When utilized properly, previews and overviews allow users to rapidly discriminate objects of interest from those not of interest, and to more fully understand the scope and nature of large collections of information resources. We provide a more complete definition of previews and overviews, and discuss system parameters and aspects of primary information objects relevant to designing effective previews and overviews. Finally, we present examples that illustrate the use of previews and overviews and offer suggestions for designers. Department of Computer Science, University of Maryland,
Self-Replicating Structures in a Cellular Automata Space. Hui-Hsien Chou. July 1996.
Biological experience and intuition suggest that self-replication is an inherently complex phenomenon, and early cellular automata self-replication models developed by computer scientists and mathematicians supported that view. However, since von~Neumann's original work in the 1950's, the study of cellular automata models of self-replicating systems has progressively led to smaller and simpler systems. This thesis demonstrates for the first time that it is possible to create automatically self-replicating structures in cellular automata models rather than, as has been done in the past, to design them manually. These emergent self-replicating structures employ a General Purpose Self-Replicating cellular automata rule set which can support the replication of structures of different sizes and their growth from smaller to larger ones. This thesis also demonstrates that, by letting self-replicating structures carry additional information besides replication instructions, they can be used to solve computationally hard problems such as the Satisfiability (SAT) problem. It is shown that self-replicating structures can be made to carry characteristic codes and selection forces can be implemented in cellular automata space. This study opens the door to further studies that could lead to general, solution-evolvable structures and truly self-programming systems. (Also cross-referenced as UMIACS-TR-96-85) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Query Scrambling for Bursty Data Arrival. Laurent Amsaleg. Michael J. Franklin. A. Tomasic. November 1996.
Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources may vary widely due to network congestion, link failure, and other problems. In this paper we examine a new class of methods, called query scrambling, for dealing with unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. We explore various choices in the implementation of these methods and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and it focuses on bursty arrivals of data. We identify and study a number of the basic trade-offs that arise when designing scrambling policies for the bursty environment. Our performance results show that query scrambling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-96-84) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Large Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition. Tamara G. Kolda. Dianne P. O'Leary. November 1996.
With the electronic storage of documents comes the possibility of building search engines that can automatically choose documents relevant to a given set of topics. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. There are a number of information retrieval systems based on inexact matches. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is usually accomplished using a low-rank singular value decomposition (SVD) approximation. In this paper, we use an alternate decomposition, the semi-discrete decomposition (SDD). For equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage for the MEDLINE test set. (Also cross-referenced as UMIACS-TR-96-83) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Preconditioning for the Steady-State Navier-Stokes Equations with Low. Howard C. Elman. November 1996.
We introduce a preconditioner for the linearized Navier-Stokes equations that is effective when either the discretization mesh size or the viscosity approaches zero. For constant coefficient problems with periodic boundary conditions, we show that the preconditioning yields a system with a single eigenvalue equal to one, so that performance is independent of both viscosity and mesh size. For other boundary conditions, we demonstrate empirically that convergence depends only mildly on these parameters and we give a partial analysis of this phenomenon. We also show that some expensive subsidiary computations required by the new method can be replaced by inexpensive approximate versions of these tasks based on iteration, with virtually no degradation of performance. (Also cross-referenced as UMIACS-TR-96-82) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiler-directed Dynamic Linking for Mobile Programs. Anurag Acharya. Joel Saltz. November 1996.
In this paper, we present a compiler-directed technique for safe dynamic linking for mobile programs. Our technique guarantees that linking failures can occur only when a program arrives at a new execution site and that this failure can be delivered to the program as an error code or an exception. We use interprocedural analysis to identify the set of names that must be linked at the different sites the program executes on. We use a combination of runtime and compile-time techniques to identify the calling context and to link only the names needed in that context. Our technique is able to handle recursive programs as well as separately compiled code that may itself be able to move. We discuss language constructs for controlling the behavior of dynamic linking and the implication of some of these constructs for application structure. (Also cross-referenced as UMIACS-TR-96-81) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Utility of Exploiting Idle Workstations for Parallel Computation. Anurag Acharya. Guy Edjlali. Joel Saltz. November 1996.
In this paper, we examine the utility of exploiting idle workstations for parallel computation. We attempt to answer the following questions. First, given a workstation pool, for what fraction of time can we expect to find a cluster of $k$ workstations available? This provides an estimate of the opportunity for parallel computation. Second, how stable is a cluster of free machines and how does the stability vary with the size of the cluster? This indicates how frequently a parallel computation might have to stop for adapting to changes in processor availability. Third, what is the distribution of workstation idle-times? This information is useful for selecting workstations to place computation on. Fourth, how much benefit can a user expect? To state this in concrete terms, if I have a pool of size $S$, how big a parallel machine should I expect to get for free by harvesting idle machines? Finally, how much benefit can be achieved on a real machine and how hard does a parallel programmer have to work to make this happen? To answer the workstation-availability questions, we have analyzed 14-day traces from three workstation pools. To determine the equivalent parallel machine, we have simulated the execution of a group of well-known parallel programs on these workstation pools. To gain an understanding of the practical problems, we have developed the system support required for adaptive parallel programs as well as an adaptive parallel CFD application. (Also cross-referenced as UMIACS-TR-96-80) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
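The first question in this abstract (for what fraction of time is a cluster of $k$ workstations available?) can be sketched as a simple count over an availability trace. The trace format below, a list of snapshots with one idle flag per workstation, is an assumption made for illustration; the report's actual 14-day traces are not reproduced here, and the function name `fraction_available` is hypothetical.

```python
def fraction_available(trace, k):
    """Fraction of sampled intervals in which at least k workstations are idle.

    trace: list of snapshots; each snapshot is a list of booleans,
           one per workstation (True = idle). This format is an assumption.
    """
    if not trace:
        return 0.0
    hits = sum(1 for snapshot in trace if sum(snapshot) >= k)
    return hits / len(trace)

if __name__ == "__main__":
    # Hypothetical four-workstation pool sampled at five points in time.
    trace = [
        [True, True, False, True],
        [True, False, False, True],
        [True, True, True, True],
        [False, False, False, True],
        [True, True, False, False],
    ]
    for k in range(1, 5):
        print(f"k={k}: available {fraction_available(trace, k):.0%} of the time")
```

The fraction can only stay the same or fall as $k$ grows, which is why the cluster size that can be expected "for free" is smaller than the pool size.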
On the Weighting Method for Least Squares Problems with Linear. G. W. Stewart. November 1996.
The weighting method for solving a least squares problem with linear equality constraints multiplies the constraints by a large number and appends them to the top of the least squares problem, which is then solved by standard techniques. In this paper we give a new analysis of the method, based on the QR~decomposition, that exhibits many features of the algorithm. In particular it suggests a natural criterion for choosing the weighting factor. (Also cross-referenced as UMIACS-TR-96-79) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
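As an illustration of the weighting method described in the abstract above, consider minimizing $\|Ax - b\|_2$ subject to $Cx = d$. The symbols $A$, $b$, $C$, $d$, and the weight $\tau$ are notation assumed here, not taken from the report. The method replaces the constrained problem with the unconstrained one

```latex
\[
  \min_x \left\|
    \begin{pmatrix} \tau C \\ A \end{pmatrix} x
    - \begin{pmatrix} \tau d \\ b \end{pmatrix}
  \right\|_2 ,
  \qquad \tau \gg 1 .
\]
```

Under the usual rank assumptions, the solution of the weighted problem approaches the constrained solution as $\tau$ grows, which is why the choice of $\tau$ matters in finite precision.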
LARGE-SCALE OPTIMIZATION OF NEURON ARBORS. Christopher Cherniak. Mark Changizi. Du Won Kang. November 1996.
At the global as well as local scale, some of the geometry of types of neuron arbors--both dendrites and axons--appears to be self-organizing: their morphogenesis behaves like flowing water, that is, fluid-dynamically; waterflow in branching networks in turn acts like a tree composed of cords under tension, that is, vector-mechanically. The result is that such neuron trees globally minimize their total volume--rather than, for example, surface area or branch-length--to within about 5% of optimum for interconnecting their terminals. These kinds of arbors similarly perform well at generating the cheapest topology connecting their terminals: their large-scale layouts are among the top few of all such possible connecting-patterns. (Also cross-referenced as UMIACS-TR-96-78) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, Committee on History and Philosophy of Science, Department of,
A Delay Damage Model Selection Algorithm for NARX Neural Networks. Tsungnan Lin. C. Lee Giles. Bill G. Horne. Sun-Yang Kung. December 1996.
Recurrent neural networks have become popular models for system identification and time series prediction. NARX (Nonlinear AutoRegressive models with eXogenous inputs) neural network models are a popular subclass of recurrent networks and have been used in many applications. Though embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show that using intelligent memory order selection through pruning and good initial heuristics significantly improves the generalization and predictive performance of these nonlinear systems on problems as diverse as grammatical inference and time series prediction. (Also cross-referenced as UMIACS-TR-96-77) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, NEC Research Institute, Dept. of Electrical Engineering, Princeton University,
Grindstone: A Test Suite for Parallel Performance Tools. Jeffrey K. Hollingsworth. Michael Steele. October 1996.
We describe Grindstone, a suite of programs for testing and calibrating parallel performance measurement tools. The suite consists of nine simple SPMD style PVM programs that demonstrate common communication and computational bottlenecks that occur in parallel programs. In addition, we provide a short case study that demonstrates the use of the test suite on three performance tools for PVM. The results of the case study showed that we were able to uncover bugs or other anomalies in all three tools. The paper also describes how to acquire, compile, and use the test suite. (Also cross-referenced as UMIACS-TR-96-73) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Intensional Query Optimization. Parke Godfrey. Jarek Gryz. September 1996.
We have introduced a new query optimization framework called intensional query optimization (IQO), which enables existing optimization techniques to be applied to queries that use views. In particular, we consider that view definitions may employ unions. Advanced database technologies and applications--such as federation and mediation over heterogeneous database sources--lead to such complex view definitions, and to the need to handle complex, expensive queries. Query rewriting techniques have been proposed which exploit semantic query caches, materialized views, and semantic knowledge about the database domain to optimize query evaluation. These can augment syntactic optimization to reduce evaluation costs further. Such techniques include semantic query caching, query folding, and semantic query optimization. However, most proposed rewrite techniques ignore views in queries; that is, the views are considered as other tables. The IQO framework enables rewrites to be applied to various expansions of the query, even when no such rewrite is applicable directly to the query itself. With IQO, we optimize the query tree, not just the query. The IQO framework introduces the notion of a discounted query, which is a query with some of its expansions "separated out", so the query can be recast into pieces that can be optimized. For this approach to be effective, the sum of the costs of evaluating each piece must be less than the cost of evaluating the query itself. This includes the discounted query. We develop an evaluation plan for discounted queries that is generally more efficient than the evaluation of the queries themselves. (Also cross-referenced as UMIACS-TR-96-72) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Stabbing Orthogonal Objects in 3-Space. David M. Mount. Fan-Tao Pu. October 1996.
We consider a problem that arises in the design of data structures for answering {\em visibility range queries}, that is, given a $3$-dimensional scene defined by a set of polygonal patches, we wish to preprocess the scene to answer queries involving the set of patches of the scene that are visible from a given range of points over a given range of viewing directions. These data structures recursively subdivide space into cells until some criterion is satisfied. One of the important problems that arise in the construction of such data structures is that of determining whether a cell represents a nonempty region of space, and more generally computing the size of a cell. In this paper we introduce a measure of the {\em size} of the subset of lines in 3-space that stab a given set of $n$ polygonal patches, based on the maximum angle and distance between any two lines in the set. Although the best known algorithm for computing this size measure runs in $O(n^2)$ time, we show that if the polygonal patches are orthogonal rectangles, then this measure can be approximated to within a constant factor in $O(n)$ time. (Also cross-referenced as UMIACS-TR-96-71) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Enhancing Software DSM for Compiler-Parallelized Applications. Pete Keleher. Chau-Wen Tseng. September 1996.
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed-shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: 1) combine synchronization and parallelism information communication on parallel task invocation, 2) employ customized routines for evaluating reduction operations, and 3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good speedups on eight processors of an IBM SP-2 and a DEC Alpha cluster, usually matching or exceeding the speedup of equivalent HPF and message-passing versions of each program. Based on our experimental results, we point out areas where additional compiler analysis and software DSM improvements can be used to achieve good performance on a broader range of applications. (Also cross-referenced as UMIACS-TR-96-70) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
NetDyn Revisited: A Replicated Study of Network Dynamics. Julie Pointek. Forrest Shull. Roseanne Tesoriero. Ashok K. Agrawala. October 1996.
In 1992 and 1993, a series of experiments using the NetDyn tool was run at the University of Maryland to characterize network behavior. These studies identified multiple design and implementation faults in the Internet. Since that time, there has been a wide array of changes to the Internet. During the Spring of 1996, we conducted a replication of the NetDyn experiments in order to characterize end-to-end behavior in the current environment. In this paper, we present and discuss the latest results obtained during this study. Although the network seems to be stabilizing with respect to transit times, our current results are similar to the results from past experiments. That is, networks often exhibit unexpected behavior. The data suggest that while there has been improvement, there are still problem areas that need to be addressed. (Also cross-referenced as UMIACS-TR-96-69) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Elastic Windows: Evaluation of Multi-Window Operations. Eser Kandogan. Ben Shneiderman. October 1996.
Most windowing systems follow the independent overlapping windows approach, which emerged as an answer to the needs of the 1980s' technology. Due to advances in computers and display technology, and increased information needs, modern users demand more functionality from window management systems. We proposed Elastic Windows with improved spatial layout and rapid multi-window operations as an alternative to current window management strategies for efficient personal role management [kandogan]. In this approach, multi-window operations are achieved by issuing operations on window groups hierarchically organized in a space-filling tiled layout. This paper describes the Elastic Windows interface briefly and then presents a study comparing user performance with Elastic Windows and traditional window management techniques for 2, 6, and 12 window situations. Elastic Windows users had statistically significantly faster performance for all 6 and 12 window situations, for task environment setup, task environment switching, and task execution. These results suggest promising possibilities for multiple window operations and hierarchical nesting, which can be applied to the next generation of tiled as well as overlapped window managers. Human-Computer Interaction Laboratory, Univ. of Maryland, Institute for Systems Research, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland,
Bringing Treasures to the Surface - Iterative Design for the Library of. Catherine Plaisant. Gary Marchionini. Tom Bruns. Anita Komlodi. Laura Campbell. October 1996.
The Human-Computer Interaction Lab worked with a team from the Library of Congress (LC) to develop and test interface designs for LC's National Digital Library Program. Three iterations are described and illustrate the progression of the design toward a compact design that minimizes scrolling and jumping and anchors users in a screen space that tightly couples search and results. Issues and resolutions are discussed for each iteration and reflect the challenges of incomplete metadata, data visualization, and the rapidly changing web environment. Human-Computer Interaction Laboratory, Univ. of Maryland, Digital Library Research Group, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland, National Digital Library Program, Library of Congress, Washington DC,
Synthesizing Protocol Specifications from Service Specifications. Jun-Cheol Park. Raymond E. Miller. September 1996.
We propose a specification model and present a method to algorithmically derive a protocol specification from a service specification based on the model. Unlike the previous models based on finite state machines, the proposed model can explicitly express concurrency, synchronization, and timing requirements such as delays and timeouts. We assume that there exists a reliable communication channel between any two protocol entities and the maximum delay for each channel is bounded by a positive constant. Because of the variable nature of the communication delays along with the time constraints associated with events, no protocol specification can fully simulate the service specification. The proposed method derives a protocol specification that is optimal in the sense that it provides the largest possible subset of the service specification under the communication delay constraints. We also give a method to derive a sub specification from a service specification and a maximum communication delay of each channel such that the sub specification, but no superset of it, can be simulated by the derived protocol specification. Dept. of Computer Science, Univ. of Maryland,
Putting Visualization to Work -- ProgramFinder for Youth Placement. Jason Ellis. Anne Rose. Catherine Plaisant. September 1996.
The Human-Computer Interaction Laboratory (HCIL) and the Maryland Department of Juvenile Justice (DJJ) have been working together to develop the ProgramFinder, a tool for choosing programs for a troubled youth from drug rehabilitation centers to secure residential facilities. The seemingly straightforward journey of the ProgramFinder from an existing user interface technique to a product design required the development of five different prototypes which involved user interface design, prototype implementation, and selecting search criteria. While HCIL's effort focused primarily on design and implementation, DJJ's attribute selection process was the most time consuming and difficult task. We also found that a direct link to DJJ's workflow was needed in the prototypes to generate the necessary "buy-in". This paper analyzes the interaction between the efforts of HCIL and DJJ and the amount of "buy-in" by DJJ staff and management. Lessons learned are presented for developers. Human-Computer Interaction Laboratory, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Customizable Simulator for Workstation Networks. Mustafa Uysal. Anurag Acharya. Robert Bennett. Joel Saltz. September 1996.
We present a customizable simulator called netsim for high-performance point-to-point workstation networks that is accurate enough to be used for application-level performance analysis yet is easy enough to customize for multiple architectures and software configurations. Customization is accomplished without using any proprietary information, using only publicly available hardware specifications and information that can be readily determined using a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with a 2-6% error on the SP-2 and a 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8 fold slowdown with respect to the Alpha Farm. In addition, we show that the cross-traffic congestion for today's high-speed point-to-point networks has little, if any, effect on application-level performance and that modeling end-point congestion is sufficient for a reasonably accurate simulation. (Also cross-referenced as UMIACS-TR-96-68) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Titan: A High-Performance Remote-Sensing Database. Chialin Chang. Bongki Moon. Anurag Acharya. Carter Shock. Alan Sussman. Joel Saltz. August 1996.
There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a multi-dimensional dataset onto a large disk farm. Second, the order of magnitude reduction in data-size due to post-processing makes it imperative, from a performance perspective, that the postprocessing be done on the machine that holds the data. This requires careful coordination of computation and data retrieval. This paper describes the design, implementation and evaluation of {\em Titan}, a parallel shared-nothing database designed for handling remote-sensing data. The computational platform for Titan is a 16-processor IBM SP-2 with four fast disks attached to each processor. Titan is currently operational and contains about 24~GB of data from the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7 satellite. The experimental results show that Titan provides good performance for global queries, and interactive response times for local queries. (Also cross-referenced as UMIACS-TR-96-67) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Symbolic Model Checking of Infinite State Programs Using Presburger. Tevfik Bultan. Richard Gerber. William Pugh. September 1996.
Model checking is a powerful technique for analyzing large, finite-state systems. In an infinite transition system, however, many basic properties are undecidable. In this paper we present a new symbolic model checker which conservatively evaluates safety and liveness properties on infinite-state programs. We use Presburger formulas to symbolically encode a program's transition system, as well as its model-checking computations. All fixpoint calculations are executed symbolically, and their convergence is guaranteed by using approximation techniques. We demonstrate the promise of this technology on some well-known infinite-state concurrency problems. (Also cross-referenced as UMIACS-TR-96-66) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Regularization Algorithms Based on Total Least Squares. Per Christian Hansen. Dianne P. O'Leary. September 1996.
Discretizations of inverse problems lead to systems of linear equations with a highly ill-conditioned coefficient matrix, and in order to compute stable solutions to these systems it is necessary to apply regularization methods. Classical regularization methods, such as Tikhonov's method or truncated {\em SVD}, are not designed for problems in which both the coefficient matrix and the right-hand side are known only approximately. For this reason, we develop {\em TLS}\/-based regularization methods that take this situation into account. Here, we survey two different approaches to incorporation of regularization, or stabilization, into the {\em TLS} setting. The two methods are similar in spirit to Tikhonov regularization and truncated {\em SVD}, respectively. We analyze the regularizing properties of the methods and demonstrate by numerical examples that in certain cases with large perturbations, these new methods are able to yield more accurate regularized solutions than those produced by the standard methods. (Also cross-referenced as UMIACS-TR-96-65) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Department of Mathematical Modelling, Technical Univ. of Denmark,
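For orientation, the classical formulations named in the abstract above can be written as follows. These are the standard textbook forms, not the regularized {\em TLS} methods developed in the report, and the symbols ($A$, $b$, $\lambda$, $k$, the singular triplets $\sigma_i, u_i, v_i$ of $A$, and the perturbations $E$, $r$) are notation assumed here.

```latex
\begin{align*}
  \text{Tikhonov:} \quad
    & \min_x \; \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2 , \\
  \text{truncated SVD:} \quad
    & x_k = \sum_{i=1}^{k} \frac{u_i^T b}{\sigma_i} \, v_i , \\
  \text{TLS:} \quad
    & \min_{E,\,r} \; \bigl\| \, [\, E \;\; r \,] \, \bigr\|_F
      \quad \text{subject to} \quad (A + E)\,x = b + r .
\end{align*}
```

The first two treat $A$ as exact and perturb only the right-hand side, whereas TLS allows perturbations $E$ of the matrix as well, which is the situation the abstract targets.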
Pivoted Cauchy-like Preconditioners for Regularized Solution of. Misha E. Kilmer. Dianne P. O'Leary. September 1996.
Many ill-posed problems are solved using a discretization that results in a least squares problem or a linear system involving a Toeplitz matrix. The exact solution to such problems is often hopelessly contaminated by noise, since the discretized problem is quite ill-conditioned, and noise components in the approximate null-space dominate the solution vector. Therefore we seek an approximate solution that does not have large components in these directions. We use a preconditioned conjugate gradient algorithm to compute such a regularized solution. An orthogonal change of coordinates transforms the Toeplitz matrix to a Cauchy-like matrix, and we choose our preconditioner to be a low rank Cauchy-like matrix determined in the course of Gu's fast modified complete pivoting algorithm. We show that if the kernel of the ill-posed problem is smooth, then this preconditioner has desirable properties: the largest singular values of the preconditioned matrix are clustered around one, the smallest singular values, corresponding to the noise subspace, remain small, and the signal and noise spaces are relatively unmixed. The preconditioned algorithm costs only $O(n \lg n)$ operations per iteration for a problem with $n$ variables. The effectiveness of the preconditioner for filtering noise is demonstrated on three examples. (Also cross-referenced as UMIACS-TR-96-63) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Applied Mathematics Program, Univ. of Maryland,
Updating Discourse Context with Active Logic. John Gurney. Khemdut Purang. Don Perlis. June 1996.
In this paper we present our implementation of a system of active logic that processes natural language discourses. We focus on problems that involve presupposition and the associated well-known problems of the projection of presupposition. We discuss Heim's largely successful theory of presupposition and point out certain limitations. We then use these observations to build our discourse processor based on active logic. Our main contribution is the handling of problems that go beyond the scope of Heim's theory, especially discourses that involve cancellation of presupposition. Ongoing work suggests that conversational implicature and the cancellation of implicature can also be treated by our methods. Key words: presupposition, discourse, context, accommodation, active logic, implicature. (Also cross-referenced as UMIACS-TR-96-62) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Army Research Laboratory, Adelphi MD,
Defaults Denied. Michael Miller. Don Perlis. Khemdut Purang. June 1996.
We take a tour of various themes in default reasoning, examining new ideas as well as those of Brachman, Delgrande, Poole, and Schlechta. An underlying issue is that of stating that a potential default principle is not appropriate. We see this arise most dramatically as a problem in an attempt to formalize what are often loosely called "prototypes", although it also arises in other formal approaches to default reasoning. Some formalisms in the literature provide solutions but not without costs. We propose a formalism that appears to avoid these costs; it can be seen as a step toward a population-based set-theoretic modification of these approaches, that may ultimately provide a closer tie to recent work on statistical (quantitative) foundations of (qualitative) defaults([1]). Our analysis in particular indicates the need to resolve a conflation between use and mention in many default formalisms. Our treatment proposes such a resolution, and also explores the use of sets toward a more population-based notion of default. (Also cross-referenced as UMIACS-TR-96-61) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Intelligent Automation Inc., Rockville MD,
Automated Discovery of Self-Replicating Structures in Cellular Space. Jason D. Lohn. August 1996.
This thesis demonstrates for the first time that it is possible to automatically discover self-replicating structures in cellular space automata models rather than, as has been done in the past, to design them manually. Self-replication is defined as the process an entity undergoes in constructing a copy of itself. Von~Neumann was the first to investigate artificial self-replicating structures and did so in the context of cellular automata, a cellular space model consisting of numerous finite-state machines embedded in a regular tessellation. Interest in artificial self-replicating systems has increased in recent years due to potential applications in molecular-scale manufacturing, programming parallel computing systems, and digital hardware design, and also as part of the field of artificial life. In this dissertation, genetic algorithms are used with a cellular automata framework for the first time to automatically discover self-replicating structures. The discovered self-replicating structures compare favorably in terms of simplicity with those generated manually in the past but differ in unexpected ways. This dissertation presents representative samples of the self-replicating structures and analyzes them both quantitatively and qualitatively. In order to effectively search the underlying rule space of such automata models, a fitness function consisting of three independent criteria is designed and successfully applied. Also, a new cellular space automata model called effector automata is introduced. It is shown to be more computationally feasible and to promote the discovery of more self-replicating structures as compared to the cellular automata models used in previous studies. In addition, a new paradigm for cellular space models with weak rotational symmetry called component-sensitive input is introduced and shown to facilitate discovery of self-replicating structures. 
The results presented suggest that genetic algorithms can be powerful tools for exploring the space of possible self-replicating structures. Furthermore, this research sheds light on the nature of creating self-replicating structures and opens the door to further studies that could eventually lead to the discovery of new self-replicating molecular structures. (Also cross-referenced as UMIACS-TR-96-60) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Electrical Engineering, Univ. of Maryland,
Performance of On-Line Learning Methods in Predicting Multiprocessor. Majd F. Sakr. Steven P. Levitan. Donald M. Chiarulli. Bill G. Horne. C. Lee Giles. October 1996.
Shared memory multiprocessors require reconfigurable interconnection networks (INs) for scalability. These INs are reconfigured by an IN control unit. However, these INs are often plagued by undesirable reconfiguration time that is primarily due to control latency, the amount of time that the control unit takes to decide on a desired new IN configuration. To reduce control latency, a trainable prediction unit (PU) was devised and added to the IN controller. The PU's job is to anticipate and reduce control configuration time, the major component of the control latency. Three different on-line prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications: the 2-D relaxation algorithm, matrix multiply, and Fast Fourier Transform. The predictions were then used by a routing control algorithm to reduce control latency by configuring the IN to provide needed memory access paths before they were requested. The three prediction techniques were: 1) a Markov predictor, 2) a linear predictor, and 3) a time delay neural network (TDNN) predictor. As expected, different predictors performed best on different applications; however, the TDNN produced the best overall results. (Also cross-referenced as UMIACS-TR-96-59) University of Maryland Institute for Advanced Computer Studies, NEC Research Institute, Princeton NJ, Electrical Engineering Department, University of Pittsburgh, Computer Science Department, University of Pittsburgh, Department of Computer Science, University of Maryland,
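To make the idea of on-line prediction of repetitive access patterns concrete, here is a minimal sketch of a first-order Markov predictor. It is an illustrative assumption about what such a predictor could look like, not the design evaluated in the report; the class name, the integer encoding of memory modules, and the hit-rate measure are all hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """First-order predictor: guess the most frequent successor of the last state."""

    def __init__(self):
        self._counts = defaultdict(Counter)  # state -> Counter of next states
        self._last = None

    def predict(self):
        """Return the predicted next state, or None if there is no history yet."""
        followers = self._counts.get(self._last)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def observe(self, state):
        """Update transition counts on-line with the state that actually occurred."""
        if self._last is not None:
            self._counts[self._last][state] += 1
        self._last = state


def hit_rate(accesses):
    """Fraction of accesses that were predicted correctly before being observed."""
    predictor = MarkovPredictor()
    hits = 0
    for state in accesses:
        if predictor.predict() == state:
            hits += 1
        predictor.observe(state)
    return hits / len(accesses)


# Hypothetical repetitive pattern: memory modules 0..3 accessed in a cycle.
rate = hit_rate([0, 1, 2, 3] * 5)
```

On a strictly periodic pattern the predictor misses only while it is seeing each transition for the first time, which is the behavior a controller would exploit to set up paths before they are requested.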
Iterative Methods for Problems in Computational Fluid Dynamics. Howard C. Elman. David J. Silvester. Andrew J. Wathen. August 1996.
We discuss iterative methods for solving the algebraic systems of equations arising from linearization and discretization of primitive variable formulations of the incompressible Navier-Stokes equations. Implicit discretization in time leads to a coupled but linear system of partial differential equations at each time step, and discretization in space then produces a series of linear algebraic systems. We give an overview of commonly used time and space discretization techniques, and we discuss a variety of algorithmic strategies for solving the resulting systems of equations. The emphasis is on preconditioning techniques, which can be combined with Krylov subspace iterative methods. In many cases the solution of subsidiary problems such as the discrete convection-diffusion equation and the discrete Stokes equations plays a crucial role. We examine iterative techniques for these problems and show how they can be integrated into effective solution algorithms for the Navier-Stokes equations. (Also cross-referenced as UMIACS-TR-96-58) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Mathematics, University of Manchester Institute of Science, Oxford University Computing Laboratory,
Final Iterations in Interior Point Models -- Preconditioned Conjugate. Weichung Wang. August 1996.
In this article we consider modified search directions in the endgame of interior point methods for linear programming. In this stage, the normal equations determining the search directions become ill-conditioned. The modified search directions are computed by solving perturbed systems that can be solved efficiently by the preconditioned conjugate gradient solver. We prove the convergence of the interior point methods using the modified search directions and show that each barrier problem is solved with a superlinear convergence rate. A variation of Cholesky factorization is presented for computing a better preconditioner when the normal equations are ill-conditioned. These ideas have been implemented successfully and the numerical results show that the algorithms enhance the performance of the preconditioned conjugate gradients-based interior point methods. Dept. of Computer Science, Univ. of Maryland,
Simulation for Computer Science Majors: A Preliminary Report. Ruth Silverman. August 1996.
The author is revising and restructuring an existing simulation course designed primarily for senior computer science majors by: 1) developing an integrated set of laboratory exercises based on computer science topics using commercially available software (GPSS/H); 2) incorporating these materials into a formal laboratory manual along with related computer science reference materials and instructions in the use of the software; 3) implementing a pilot course using this manual together with a single text in the theory of simulation; 4) preparing a syllabus and a detailed annotated course outline for the instructor, keyed to the manual and the text. The materials developed will be flexible and highly modular allowing their adoption or adaptation at other institutions. (Also cross-referenced as UMIACS-TR-96-57) Center for Automation Research, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Computer and Information Science, Univ. of the District of,
Advances in High Performance Knowledge Representation. James Hendler. Kilian Stoffel. Merwyn Taylor. July 1996.
This report contains two papers describing important new results in the Parka High Performance Knowledge Representation language. We have ported the SIMD Parka knowledge representation system to generic MIMD machines. The system has been recoded in C and supported using runtime optimization packages developed in the High Performance Systems Software Laboratory at the University of Maryland. New ``scanning'' algorithms have been developed for inheritance and recognition inferences. These algorithms have been tested with both random networks and on a recoding of the ontology of the CYC knowledge base as well as on large planning case-bases. Tests show that the new version is significantly faster than the SIMD system, and that it promises to scale well to knowledge bases orders of magnitude larger than CYC. Real world applications are demanding that KR systems provide support for knowledge bases containing millions of assertions. We present Parka-DB, a high-performance reimplementation of the Parka KR language that uses a standard relational DBMS. The integration of a DBMS and the Parka KR language allows us to efficiently support complex queries on extremely large KBs using a single processor, as opposed to our earlier massively parallel system. In addition, the system can make good use of secondary memory, with the whole system needing less than 16MB of RAM to hold a KB of over 2,000,000 assertions. We demonstrate empirically that this reduction in primary storage requires only about 10% overhead in time, and decreases the load time of very large KBs by more than two orders of magnitude. (Also cross-referenced as UMIACS-TR-96-56) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Jeffrey K. Hollingsworth. Ethan L. Miller. Using Content-Derived Names for Caching and Software Distribution. August 1996.
Maintaining replicated data in wide area information services such as the World Wide Web is a difficult problem. Ensuring that the correct versions of libraries and images are installed for application programs presents similar challenges. In this paper, we present a simple scheme to facilitate both of these tasks using content-derived names (CDNs). Content-based naming uses digital signatures to compute a name for an object based only on its content. CDNs can be applied to several common problems of modern computer systems. Caching on the World Wide Web is simplified by allowing references to an object by its content rather than just its location. In a similar fashion, applications can request library objects by their content without having to rely on the presence of a file system hierarchy that the application recognizes. Further, applications that require different versions of an object can coexist peacefully on the same machine. While this idea is still in its early stages, we present experimental evidence from a study of World Wide Web objects that indicates that CDNs could reduce network traffic by allowing requests to be satisfied by differently-named duplicates with the same contents. (Also cross-referenced as UMIACS-TR-96-55) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Computer Science Dept., Univ. of Maryland Baltimore County,
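The naming idea in this abstract can be sketched in a few lines. This is a minimal illustration only: the abstract says names are computed with digital signatures but does not say which function is used, so the choice of SHA-256 and the names `content_derived_name` and `ContentCache` are assumptions made here for the example.

```python
import hashlib


def content_derived_name(data: bytes) -> str:
    """Return a name that depends only on the object's bytes.

    SHA-256 is an assumption for this sketch; the abstract does not
    name the signature function.
    """
    return hashlib.sha256(data).hexdigest()


class ContentCache:
    """Hypothetical cache keyed by content-derived name instead of location."""

    def __init__(self):
        self._store = {}

    def put(self, data: bytes) -> str:
        name = content_derived_name(data)
        self._store[name] = data
        return name

    def get(self, name: str):
        return self._store.get(name)


# Two objects fetched from different locations but with identical contents.
copy_from_site_a = b"same library or image bytes"
copy_from_site_b = b"same library or image bytes"

cache = ContentCache()
name_a = cache.put(copy_from_site_a)
name_b = content_derived_name(copy_from_site_b)

# The second request can be served from the cache although its
# location-based name would have been different.
hit = cache.get(name_b) is not None
```

Because the name is a function of the bytes alone, two versions of a library get two different names and can sit side by side in the same cache.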
A New Deterministic Parallel Sorting Algorithm With an Experimental. David R. Helman. Joseph Ja'Ja'. David A. Bader. August 1996.
We introduce a new deterministic parallel sorting algorithm based on the regular sampling approach. The algorithm uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. This algorithm was implemented in Split-C and run on the IBM SP-2-WN and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, the performance compares closely to that of our random sample sort algorithm, which seems to outperform all similar algorithms known to the authors on these platforms. Together, their performance is nearly invariant over the set of input distributions, unlike previous efficient algorithms. However, unlike our randomized sorting algorithm, the performance and memory requirements of our regular sorting algorithm can be deterministically guaranteed. (Also cross-referenced as UMIACS-TR-96-54) University of Maryland Institute for Advanced Computer Studies, Dept. of Electrical Engineering, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland,
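For readers unfamiliar with regular sampling, the general idea can be sketched sequentially. The code below is a simplified textbook-style simulation, not the algorithm of this report: the `p` "processors" are just list slices, there is no communication, and it does not include the report's handling of duplicate values (equal keys all land in one bucket here). The function name `regular_sample_sort` is hypothetical.

```python
import bisect


def regular_sample_sort(data, p):
    """Sequentially simulate sorting by regular sampling with p 'processors'."""
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(data)

    # Step 1: split the input into p nearly equal blocks and sort each locally.
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]

    # Step 2: take p evenly spaced samples from every non-empty sorted block.
    samples = []
    for block in blocks:
        if block:
            samples.extend(block[j * len(block) // p] for j in range(p))
    samples.sort()

    # Step 3: pick p - 1 splitters at regular intervals in the sorted samples.
    splitters = []
    if samples:
        splitters = [samples[k * len(samples) // p] for k in range(1, p)]

    # Step 4: route every element to the bucket its value falls into.
    buckets = [[] for _ in range(p)]
    for block in blocks:
        for x in block:
            buckets[bisect.bisect_right(splitters, x)].append(x)

    # Step 5: sort each bucket; concatenating them gives the sorted output.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


example = [27, 3, 14, 3, 99, 41, 8, 62, 15, 3, 70, 1]
sorted_example = regular_sample_sort(example, 4)
```

Because the splitters come from evenly spaced positions in locally sorted blocks, bucket sizes are bounded deterministically when keys are distinct, which is the property the abstract contrasts with randomized sampling.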
A Randomized Parallel Sorting Algorithm with an Experimental Study. David R. Helman. David A. Bader. Joseph Ja'Ja'. August 1996.
Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. The algorithm was implemented in SPLIT-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, it seems to outperform all similar algorithms known to the authors on these platforms, and its performance is invariant over the set of input distributions, unlike previous efficient algorithms. Our results also compare favorably with those reported for the simpler ranking problem posed by the NAS Integer Sorting (IS) Benchmark. (Also cross-referenced as UMIACS-TR-96-53) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Dept. of Electrical Engineering, Univ. of Maryland,
Measuring Organization and Asymmetry in Bihemispheric Topographic Maps. Sergio A. Alvarez. Svetlana Levitan. James A. Reggia. September 1996.
We address the problem of measuring the degree of hemispheric organization and asymmetry of organization in a computational model of a bihemispheric cerebral cortex. A theoretical framework for such measures is developed and used to produce algorithms for measuring the degree of organization, symmetry, and lateralization in topographic map formation. The performance of the resulting measures is tested for several topographic maps obtained by self-organization of an initially random network, and the results are compared with subjective assessments made by humans. It is found that the closest agreement with the human assessments is obtained by using organization measures based on sigmoid-type error averaging. Measures are developed which correct for large constant displacements as well as curving of the hemispheric topographic maps. (Also cross-referenced as UMIACS-TR-96-51) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Multiple Vehicle Detection and Tracking in Hard Real Time. Margrit Betke. Esin Haritaoglu. Larry S. Davis. July 1996.
A vision system has been developed that recognizes and tracks multiple vehicles from sequences of gray-scale images taken from a moving car in hard real-time. Recognition is accomplished by combining the analysis of single image frames with the analysis of the motion information provided by multiple consecutive image frames. In single image frames, cars are recognized by matching deformable gray-scale templates, by detecting image features, such as corners, and by evaluating how these features relate to each other. Cars are also recognized by differencing consecutive image frames and by tracking motion parameters that are typical for cars. The vision system utilizes the hard real-time operating system Maruti which guarantees that the timing constraints on the various processes of the vision system are satisfied. The dynamic creation and termination of tracking processes optimizes the amount of computational resources spent and allows fast detection and tracking of multiple cars. Experimental results demonstrate robust, real-time recognition and tracking over thousands of image frames. (Also cross-referenced as UMIACS-TR-96-52) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Bilingual Lexicon Construction Using Large Corpora. Wade Shen. Bonnie J. Dorr. October 1997.
This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between term and sentence alignment processes. An implementation of this algorithm, TAG-ALIGN, is evaluated against approaches similar to [Brown et al. 1993] that apply Bayesian techniques for term alignment, and [Gale and Church 1991] a dynamic programming method for aligning sentences. The ultimate goal is to produce large bilingual lexicons with a high degree of accuracy from potentially noisy corpora. (Also cross-referenced as UMIACS-TR-97-50) Institute for Advanced Computer Studies, Department of Computer Science,
Ben Shneiderman. July 1996.
The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. A useful starting point for designing advanced graphical user interfaces is the Visual Information-Seeking Mantra: Overview first, zoom and filter, then details-on-demand. But this is only a starting point in trying to understand the rich and varied set of information visualizations that have been proposed in recent years. This paper offers a task by data type taxonomy with seven data types (1-, 2-, 3-dimensional data, temporal and multi-dimensional data, and tree and network data) and seven tasks (overview, zoom, filter, details-on-demand, relate, history, and extract). Also cross-referenced as ISR-TR-96-66 Human Computer Interaction Laboratory, Institute for Systems Research, Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant. July 1996.
1996 HCIL Video Reports. Elastic Windows for Rapid Multiple Window Management; Life-Lines: Visualizing Personal Histories; Designing Interfaces for Youth Services Information Management; Query Previews in Networked Information Systems: the Case of EOSDIS; Baltimore Learning Communities; Table of Contents of the 1995 HCIL Video Reports; Table of Contents of the 1994 HCIL Video Reports; Visual Information Seeking using the FilmFinder (Extract from the HCIL 1994 Video Report). Human Computer Interaction Laboratory, Institute for Systems Research, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
BFGS with Update Skipping and Varying Memory. Tamara Gibson. Dianne P. O'Leary. Larry Nazareth. July 1996.
We give conditions under which limited-memory quasi-Newton methods with exact line searches will terminate in $n$ steps when minimizing $n$-dimensional quadratic functions. We show that although all Broyden family methods terminate in $n$ steps in their full-memory versions, only BFGS does so with limited-memory. Additionally, we show that full-memory Broyden family methods with exact line searches terminate in at most $n+p$ steps when $p$ matrix updates are skipped. We introduce new limited-memory BFGS variants and test them on nonquadratic minimization problems. (Also cross-referenced as UMIACS-TR-96-49) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
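The $n$-step termination property of full-memory BFGS mentioned above can be observed numerically on a small example. The sketch below is an illustration under assumptions, not code from the report: the function name `bfgs_quadratic`, the starting matrix $H_0 = I$, and the 2-by-2 test problem are all chosen here for the example, and only the full-memory method is shown (no limited-memory variant and no update skipping).

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs_quadratic(A, b, x0):
    """Minimize 0.5*x'Ax - b'x (A symmetric positive definite) with
    full-memory BFGS and exact line searches, for at most n iterations.

    Returns the final iterate and the gradient norm at every iterate.
    """
    n = len(b)
    x = list(x0)
    # Inverse-Hessian approximation, started at the identity (an assumption).
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    norms = [dot(g, g) ** 0.5]
    for _ in range(n):
        if norms[-1] < 1e-12:
            break
        d = [-v for v in matvec(H, g)]           # quasi-Newton direction
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)          # exact minimizer along d
        s = [alpha * di for di in d]
        x = [xi + si for xi, si in zip(x, s)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        y = [a - c for a, c in zip(g_new, g)]
        sy = dot(s, y)
        Hy = matvec(H, y)
        scale = (1.0 + dot(y, Hy) / sy) / sy
        # BFGS update of the inverse-Hessian approximation.
        H = [[H[i][j] + scale * s[i] * s[j]
              - (Hy[i] * s[j] + s[i] * Hy[j]) / sy
              for j in range(n)] for i in range(n)]
        g = g_new
        norms.append(dot(g, g) ** 0.5)
    return x, norms


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_min, grad_norms = bfgs_quadratic(A, b, [2.0, 1.0])
```

For this two-dimensional problem the gradient norm drops to rounding-error level after two iterations, in line with the $n$-step termination stated in the abstract.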
Approximation Algorithms for Connected Dominating Sets. Sudipto Guha. Samir Khuller. June 1996.
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in the dominating set. We focus on the question of finding a {\em connected dominating set} of minimum size, where the graph induced by vertices in the dominating set is required to be {\em connected} as well. This problem arises in network testing, as well as in wireless communication. Two polynomial time algorithms that achieve approximation factors of $O(H(\Delta))$ are presented, where $\Delta$ is the maximum degree, and $H$ is the harmonic function. This question also arises in relation to the traveling tourist problem, where one is looking for the shortest tour such that each vertex is either visited, or has at least one of its neighbors visited. We study a generalization of the problem when the vertices have weights, and give an algorithm which achieves a performance ratio of $3 \ln n$. We also consider the more general problem of finding a connected dominating set of a specified subset of vertices and provide an $O(H(\Delta))$ approximation factor. To prove the bound we also develop an optimal approximation algorithm for the unit node weighted Steiner tree problem. (Also cross-referenced as UMIACS-TR-96-47) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
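The two defining conditions in this abstract (domination and connectivity of the induced subgraph) can be checked directly. The following sketch only verifies a candidate set; it is not one of the approximation algorithms from the report, and the function name and the adjacency-dictionary representation are assumptions made for the example.

```python
from collections import deque


def is_connected_dominating_set(graph, candidate):
    """Check whether `candidate` is a connected dominating set of `graph`.

    `graph` maps each vertex to an iterable of its neighbours (undirected).
    """
    candidate = set(candidate)
    if not candidate or not candidate <= set(graph):
        return False

    # Domination: every vertex is in the set or adjacent to a member of it.
    for v, neighbours in graph.items():
        if v not in candidate and candidate.isdisjoint(neighbours):
            return False

    # Connectivity: breadth-first search restricted to the candidate set.
    start = next(iter(candidate))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w in candidate and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == candidate


# Path graph 1 - 2 - 3 - 4 - 5.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
dominating_and_connected = is_connected_dominating_set(path, {2, 3, 4})
dominating_not_connected = is_connected_dominating_set(path, {2, 4})
not_dominating = is_connected_dominating_set(path, {3})
```

On the path graph, `{2, 4}` dominates every vertex but induces a disconnected subgraph, which is exactly the case the connected variant rules out.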
Network-Aware Mobile Programs. M. Ranganathan. Anurag Acharya. Shamik D. Sharma. Joel Saltz. June 1996.
In this paper, we investigate network-aware mobile programs, programs that can use mobility as a tool to adapt to variations in network characteristics. We present infrastructural support for mobility and network monitoring and show how adaptalk, a Java-based mobile Internet chat application, can take advantage of this support to dynamically place the chat server so as to minimize response time. Our conclusion is that on-line network monitoring and adaptive placement of shared data-structures can significantly improve performance of distributed applications on the Internet. (Also cross-referenced as UMIACS-TR-96-46) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Carry-Over Round Robin: A Simple Cell Scheduling Mechanism for ATM. Debanjan Saha. Sarit Mukherjee. Satish K. Tripathi. June 1996.
We propose a simple cell scheduling mechanism for ATM networks. The proposed mechanism, named Carry-Over Round Robin (CORR), is an extension of weighted round robin scheduling. We show that despite its simplicity, CORR achieves tight bounds on end-to-end delay and near perfect fairness. Using a variety of video traffic traces we show that CORR often outperforms some of the more complex scheduling disciplines such as Packet-by-Packet Generalized Processor Sharing (PGPS). (Also cross-referenced as UMIACS-TR-96-45) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Multirate Scheduling of VBR Video Traffic in ATM Networks. Debanjan Saha. Sarit Mukherjee. Satish K. Tripathi. June 1996.
(Also cross-referenced as UMIACS-TR-96-44) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, IBM T.J. Watson Research Center, Yorktown Heights, NY, Dept. of Computer Science and Engineering, Univ. of Nebraska,
Active Logic and Heim's Rules for Updating Discourse Context. John Gurney. Don Perlis. Khemdut Purang. June 1996.
Discourse unfolds in time, giving rise to a cascade of belief changes in the listener. Yet this temporal evolution of discourse and belief is typically ignored in theoretical treatments of discourse. It has been claimed (see Soames~\cite{soames:presuppositions}) that Heim's~\cite{heim:projection_problem} theory of discourse context accounts for non-implicative discourse updating. We will present a new non-implicative discourse that cannot be accounted for with Heim's use of global or local accommodation and which appears to require attention to \emph{evolution} of discourse. We use this example to motivate remaking Heim's update function, aimed toward a unified approach to discourse---one in which Heim's rules for discourse updating can account for more of the problem cases for the theory of discourse context. These rules and the revised update function can then serve as principles that constrain the building of representations for discourse context (such as the Discourse Representation Structures, of Discourse Representation Theory, ~\cite{kamp:reyle}). We propose \emph{active logic} as a convenient tool for executing the required inferences (as called for by our revised version of Heim's update function) as the discourse evolves through time. (Also cross-referenced as UMIACS-TR-96-43) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Active Logic Applied to Cancellation of Gricean Implicature. Khemdut Purang. Don Perlis. John Gurney. June 1996.
Dialog proceeds over time, during which inferred beliefs come and go in the listener. Yet this temporal aspect of dialog and belief is typically ignored in theoretical treatments of dialog. Using a simple example of a dialog with an implicature that arises partway through and then is later retracted, we discuss how Gricean maxims and nonmonotonicity may relate to each other and to a computational treatment of implicature. In effect we seek to track reasoning along Gricean lines over time. We present our own computational approach to this, giving an implementation in the formalism of active logics. (Also cross-referenced as UMIACS-TR-96-42) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Conversational Adequacy: Mistakes are the Essence. Don Perlis. Khemdut Purang. June 1996.
We argue that meta-dialog and meta-reasoning, far from being of only occasional use, are the very essence of conversation and communication between agents. We give four paradigm examples of massive use of meta-dialog where only limited base dialog may be present, and use these to bolster our claim of centrality for meta-dialog. We further illustrate this with related work in active logics. We argue moreover that there may be a core set of meta-dialog principles that is in some sense complete. If we are right, then implementing such a set would be of considerable interest. We give examples of existing computer programs that converse inadequately according to our guidelines. (Also cross-referenced as UMIACS-TR-96-41) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Loading Time Scheduling Problem. Randeep Bhatia. Samir Khuller. Joseph (Seffi) Naor. June 1996.
In this paper we study precedence constrained scheduling problems, where the tasks can only be executed on a specified subset of the machines. Each machine has a loading time that is incurred only for the first task that is scheduled on the machine in a particular run. This basic scheduling problem arises in the context of machining on numerically controlled machines, query optimization in databases, and in other artificial intelligence applications. We give the first non-trivial approximation algorithm for this problem. We also prove non-trivial lower bounds on best possible approximation ratios for these problems. These improve on the non-approximability results that are implied by the non-approximability results for the shortest common supersequence problem. We use the same algorithmic technique to obtain approximation algorithms for a problem arising in the context of code generation for parallel machines, and for the weighted shortest common supersequence problem. Dept. of Computer Science, Univ. of Maryland,
Fault Tolerant K-Center Problems. Samir Khuller. Robert Pless. Yoram J. Sussmann. June 1996.
The basic $K$-center problem is a fundamental facility location problem, where we are asked to locate $K$ facilities in a graph, and to assign vertices to facilities, so as to minimize the maximum distance from a vertex to the facility to which it is assigned. This problem is known to be NP-hard, and several optimal approximation algorithms that achieve a factor of $2$ have been developed for it. We focus our attention on a generalization of this problem, where each vertex is required to have a set of $\alpha$ ($\alpha \le K$) centers close to it. In particular, we study two different versions of this problem. In the first version, each vertex is required to have at least $\alpha$ centers close to it. In the second version, each vertex that {\em does not have a center placed on it} is required to have at least $\alpha$ centers close to it. For both these versions we are able to provide polynomial time approximation algorithms that achieve constant approximation factors for {\em any} $\alpha$. For the first version we give an algorithm that achieves an approximation factor of $3$ for any $\alpha$, and achieves an approximation factor of $2$ for $\alpha < 4$. For the second version, we provide algorithms with approximation factors of $2$ for any $\alpha$. The best possible approximation factor for even the basic $K$-center problem is 2. In addition, we give a polynomial time approximation algorithm for a generalization of the $K$-supplier problem where a subset of at most $K$ supplier nodes must be selected as centers so that every demand node has at least $\alpha$ centers close to it. We also provide polynomial time approximation algorithms for all the above problems for generalizations when cost and weight functions are defined on the set of vertices. (Also cross-referenced as UMIACS-TR-96-40) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
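As background for the factor-of-2 algorithms mentioned for the basic problem, one standard technique is the farthest-first greedy heuristic, which is known to achieve a factor of 2 when distances satisfy the triangle inequality. The sketch below shows that heuristic for the basic $K$-center problem only; it is not the fault-tolerant algorithm of this report (it does not enforce $\alpha$ centers per vertex), and the function name and distance-matrix input are assumptions made for the example.

```python
def farthest_first_centers(dist, k):
    """Greedy farthest-first selection of up to k centers.

    `dist` is a symmetric matrix of pairwise distances that is assumed to
    satisfy the triangle inequality. Returns the chosen centers and the
    resulting radius (largest distance from a vertex to its nearest center).
    """
    n = len(dist)
    if n == 0 or k < 1:
        raise ValueError("need at least one vertex and one center")

    centers = [0]                  # arbitrary first center
    nearest = list(dist[0])        # distance from each vertex to its nearest center
    while len(centers) < min(k, n):
        farthest = max(range(n), key=lambda v: nearest[v])
        if nearest[farthest] == 0:
            break                  # every vertex already coincides with a center
        centers.append(farthest)
        nearest = [min(nearest[v], dist[farthest][v]) for v in range(n)]
    return centers, max(nearest)


# Six vertices on a line; distances are absolute differences of positions.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
centers, radius = farthest_first_centers(dist, 2)
```

Here the heuristic picks the two end points and obtains radius 2, while placing centers at positions 1 and 11 gives the optimal radius 1, so the result stays within the factor of 2.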
The Capacitated K-Center Problem. Samir Khuller. Yoram J. Sussmann. June 1996.
The capacitated $K$-center problem is a fundamental facility location problem, where we are asked to locate $K$ facilities in a graph, and to assign vertices to facilities, so as to minimize the maximum distance from a vertex to the facility to which it is assigned. Moreover, each facility may be assigned at most $L$ vertices. This problem is known to be NP-hard. We give polynomial time approximation algorithms for two different versions of this problem that achieve approximation factors of 5 and 6. We also study some generalizations of this problem. (Also cross-referenced as UMIACS-TR-96-39) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adaptive Cost Estimation for Client-Server based Heterogeneous Database. Zhaohui Yao. Chungmin Melvin Chen. Nick Roussopoulos. May 1996.
In this paper, we propose a new method for estimating query cost in client-server based heterogeneous database management systems. The cost estimation parameters are adjusted by an Adaptive Cost Estimation (ACE) module which uses query execution feedback, yielding more and more accurate cost estimates. The most important features of ACE are its detailed cost model which accounts for all costs incurred, its rapid convergence to the actual parameter values, and its low overhead which permits continuous adaptation during the run time of the system. ACE has been implemented and tested with Oracle 6, Oracle 7, Ingres, and ADMS. Extensive experiments performed on these systems show that ACE's time estimates are within 20% of the real wall-clock time for more than 92% of the queries. This percentage surpasses 98% for queries over 20 seconds. (Also cross-referenced as UMIACS-TR-96-37) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Sandeep Gupta. John S. Baras. Stephen Kelley. Nick Roussopoulos. A case for in-kernel data streaming over the file subsystem. June 1996.
(Also cross-referenced as UMIACS-TR-96-36) University of Maryland Institute for Advanced Computer Studies, Institute of Systems Research, Dept. of Computer Science, Univ. of Maryland,
Signal Stability based Adaptive Routing (SSA) for Ad-Hoc Mobile Networks. Rohit Dube. Cynthia D. Rais. Kuang-Yeh Wang. Satish K. Tripathi. August 1996.
Unlike static networks, ad-hoc networks have no spatial hierarchy and suffer from frequent link failures which prevent mobile hosts from using traditional routing schemes. Under these conditions, mobile hosts must find routes to destinations without the use of designated routers and also must dynamically adapt the routes to the current link conditions. This paper proposes a distributed adaptive routing protocol for finding and maintaining stable routes based on signal strength and location stability in an ad-hoc network and presents an architecture for its implementation. (Also cross-referenced as UMIACS-TR-96-34) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Scrambling Query Plans to Cope With Unexpected Delays. Laurent Amsaleg. Michael J. Franklin. A. Tomasic. T. Urhan. May 1996.
Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failure in the network introduces highly-variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query plan modification techniques that we call query plan scrambling. We present an algorithm which modifies execution plans on-the-fly in response to unexpected delays in data access. The algorithm both reschedules operators and introduces new operators into the plan. We present simulation results that show how our technique effectively hides delays in receiving the initial requested tuples from remote data sources. (Also cross-referenced as UMIACS-TR-96-35) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Diane L. Alonso. Kent L. Norman. May 1996.
Apparency of Contingencies in Single Panel Menus. What we see is not always what we get. This is the problem when the underlying structure of an interface is hidden from the user's view. Users high in Spatial Visualization Ability (SVA) are quick to learn the contingencies of these relationships and are not hindered by this problem. Low SVA users, however, have difficulty visualizing these contingencies and often get lost. We examined data for 97 undergraduate students to determine whether revealing hidden contingencies through visual cues would facilitate Low SVA users, enabling them to approach the level of performance of High SVA users on a computerized path finding task. It was found that increasing interface apparency does seem to benefit all users, but particularly those with Low SVA. (Also cross-referenced as CAR-TR-824) Human Computer Interaction Laboratory, Center for Automation Research, Department of Psychology, Department of Computer Science, Univ. of Maryland,
Douglas W. Oard. Gary Marchionini. May 1996.
A Conceptual Framework for Text Filtering Process. This report develops a conceptual framework for text filtering practice and research, and reviews present practice in the field. Text filtering is an information seeking process in which documents are selected from a dynamic text stream to satisfy a relatively stable and specific information need. A model of the information seeking process is introduced and specialized to define information filtering. The historical development of text filtering is then reviewed and case studies of recent work are used to highlight important design characteristics of modern text filtering systems. Specific techniques drawn from information retrieval, user modeling, machine learning and other related fields are described, and the report concludes with observations on the present state of the art and implications for future research on text filtering. (Also cross-referenced as CAR-TR-830) (Also cross-referenced as EE TR-96-25) (Also cross-referenced as CLIS TR-96-02) Electrical Engineering Department, Digital Library Research Group, Human Computer Interaction Laboratory, Center for Automation Research, Medical Informatics and Computational Intelligence Laboratory, College of Library and Information Services, Dept. of Computer Science, Univ. of Maryland,
Efficient Refreshment of Data Warehouse Views. Lars Baekgaard. Nick Roussopoulos. May 1996.
A data warehouse is a view on a set of distributed and possibly loosely coupled source databases. For efficiency reasons a warehouse should be maintained as a materialized view. Therefore, efficient incremental algorithms must be used to periodically refresh the data warehouse. It is possible and desirable to separate the process of warehouse refreshment from the process of warehouse use. In this paper we describe and compare view refreshment algorithms that are based on different combinations of materialized views, partially materialized views, and pointers. Our contribution is twofold. First, our algorithms and data structures are designed to minimize network communication and interactions between the warehouse and the source databases. The minimal set of data that is necessary for both warehouse refreshment and warehouse use is stored on the warehouse. Second, we describe the results of an experiment comparing these methods with respect to storage overhead and I/O. Briefly, the experiment shows that algorithms based on a combination of partially materialized views and pointers outperform algorithms based on materialized views. (Also cross-referenced as UMIACS-TR-96-33) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Gary Marchionini. Catherine Plaisant. Anita Komlodi. February 1996.
User's Needs Assessment for the Library of Congress' National Digital Library. Understanding and assessing user needs is the first step in interface design, and this report is one of the first milestones in the overall design effort. This assessment provides an informed basis for the interface design and evaluation to be done in the months to come. It was prepared under the Library's contract with the Human-Computer Interaction Laboratory (HCIL) at the University of Maryland to work together to design an interface for the Library's National Digital Library (NDL) Program. In order to determine user needs, HCIL conducted a survey of nine reading rooms with special emphasis on the Special Collections from which the content of the NDL will be drawn. HCIL also used questionnaires to reach remote audiences who may typify NDL users accessing the Library via the Internet. They also analyzed many of the documents available in the Reading Rooms, such as finding aids, other handouts, and user studies. Human Computer Interaction Laboratory, College of Library and Information Services, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Rohit Mahajan. Ben Shneiderman. May 1996.
Visual & Textual Consistency Checking Tools for Graphical User Interfaces. Designing a user interface with a consistent visual design and textual properties with current generation GUI development tools is cumbersome. SHERLOCK, a family of consistency checking tools, has been designed to evaluate visual design and textual properties of an interface, make the GUI evaluation process less arduous, and aid usability testing. SHERLOCK includes a dialog box summary table to provide a compact overview of visual properties of hundreds of dialog boxes of the interface. Terminology specific tools, like Interface Concordance, Terminology Baskets and Interface Speller have been developed. Button specific tools including Button Concordance and Button Layout Table have been created to detect variant capitalization, distinct typefaces, distinct colors, variant button sizes and inconsistent button placements. This paper describes the design, software architecture, and the use of SHERLOCK. An experiment with 60 subjects to study the effects of inconsistent interface terminology on user performance showed 10-25% speedup for consistent interfaces. SHERLOCK was tested with four commercial prototypes; the corresponding outputs, analysis and feedback from designers of these applications are presented. (Also cross-referenced as CAR-TR-828) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Stephan Greene. May 1996.
Process Change From User Requirements Elicitation: A Case Study. The Maryland Department of Juvenile Justice (DJJ) is seeking a new information system to replace its legacy system for youth case management. The major goal of the new information system is to improve the process of juvenile case management, and thus deliver more effective services to youths, by better facilitating the tracking of case information and the production and handling of case-related documents. The primary challenge in designing the new system is to integrate optimally the appropriate components of existing processes, information, and documents. Our approach has shown that fostering user discussion and review of existing documents is extremely valuable in defining existing processes and information requirements, and effectively highlights areas where valuable process changes can be made and what system features are needed to support them. Subsequently linking user requirements for documents with innovative graphic user interface techniques can integrate diverse information for users and can effect additional positive changes to organizational processes. (Also cross-referenced as CAR-TR-827) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Anne Rose. Jason Ellis. Catherine Plaisant. Stephan Greene. May 1996.
Life cycle of user interface techniques: The DJJ information system design. To take advantage of today's technology, many organizations are migrating from their legacy systems. With help from the Human-Computer Interaction Laboratory (HCIL) and Cognetics Corporation, the Maryland Department of Juvenile Justice (DJJ) is currently undergoing an effort to redesign their information system to take advantage of graphical user interfaces. As a research lab, HCIL identifies interesting research problems and then prototypes solutions. As a project matures, the exploratory prototypes are adapted to suit the end product requirements. This case study describes the life cycle of three DJJ prototypes: (1) LifeLines, which uses time lines to display an overview of a youth in one screen, (2) the DJJ Navigator, which helps manage individual workloads by displaying different user views, and (3) the ProgramFinder, a tool for selecting the best program for a youth. (Also cross-referenced as CAR-TR-826) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Exploiting Monotone Convergence Functions in Parallel Programs. William Pugh. Evan Rosser. Tatiana Shpeisman. October 1996.
Scientific codes which use iterative methods are often difficult to parallelize well. Such codes usually contain \code{while} loops which iterate until they converge upon the solution. Problems arise since the number of iterations cannot be determined at compile time, and tests for termination usually require a global reduction and an associated barrier. We present a method which allows us to avoid performing global barriers and exploit pipelined parallelism when processors can detect non-convergence from local information. (Also cross-referenced as UMIACS-TR-96-31.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ben Shneiderman. April 29, 1996.
Designing Information-Abundant Websites. The deluge of web pages has generated dystopian commentaries on the tragedy of the flood as well as utopian visions of harnessing the same flood for constructive purposes. Within this ocean of information there are also lifeboat web pages with design principles, but often the style parallels the early user interface writings in the 1970s. The well-intentioned Noahs who write from personal experience as website designers, often draw their wisdom from specific projects, making their advice incomplete or lacking in generalizability. Their experience is valuable but the paucity of empirical data to validate or sharpen insight means that some guidelines are misleading. As scientific evidence accumulates, foundational cognitive and perceptual theories will structure the discussion and guide designers in novel situations. (Also cross-referenced as CAR-TR-824) (Also cross-referenced as ISR-TR-96-40) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Interoperability of Data Parallel Runtime Libraries with Meta-Chaos. Guy Edjlali. Alan Sussman. Joel Saltz. May 1996.
This paper describes a framework for providing the ability to use multiple specialized data parallel libraries and/or languages within a single application. The ability to use multiple libraries is required in many application areas, such as multidisciplinary complex physical simulations and remote sensing image database applications. An application can consist of one program or multiple programs that use different libraries to parallelize operations on distributed data structures. The framework is embodied in a runtime library called Meta-Chaos that has been used to exchange data between data parallel programs written using High Performance Fortran, the Chaos and Multiblock Parti libraries developed at Maryland for handling various types of unstructured problems, and the runtime library for pC++, a data parallel version of C++ from Indiana University. Experimental results show that Meta-Chaos is able to move data between libraries efficiently, and that Meta-Chaos provides effective support for complex applications. (Also cross-referenced as UMIACS-TR-96-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Exploiting the Temporal Structure of MPEG Video for the Reduction of. Marwan Krunz. Satish K. Tripathi. May 1996.
We propose a new bandwidth allocation scheme for VBR video traffic in ATM networks. The scheme is tailored to MPEG-coded video sources that require stringent and deterministic quality-of-service guarantees. By exploiting the temporal structure of MPEG sources, we show that our scheme results in an effective bandwidth which, in most cases, is less than the source peak rate. The reduction in the bandwidth requirement is achieved without sacrificing any perceived QoS. Efficient procedures are provided for the computation of the effective bandwidth under heterogeneous MPEG sources. The effective bandwidth strongly depends on the arrangement of the multiplexed streams which is a measure of the degree of synchronization between the GOP patterns of different streams. Assuming that all possible arrangements are equi-probable, we derive an expression for the asymptotic tail distribution of the effective bandwidth. From the tail distribution, we compute several performance measures for the call blocking probability when the allocation is made based on the effective bandwidth. In the case of homogeneous sources, we give a closed-form expression for the `best' arrangement that results in the `optimal' effective bandwidth. Numerical examples based on real MPEG traces are used to demonstrate the advantages of our scheme. (Also cross-referenced as UMIACS-TR-96-29) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
On Hybrid Synthesis for Hierarchical Structured Petri Nets. Hong Liu. Jun-Cheol Park. Raymond E. Miller. April 1996.
We propose a hybrid method for synthesis of hierarchical structured Petri nets. In a top-down manner, we decompose a system into a set of subsystems at each level of abstraction, each of these is specified as a blackbox Petri net that has multiple inputs and outputs. We stipulate that each subsystem satisfies the following I/O constraints: (1) At any instance of time, at most one of the inputs can be activated; and (2) If one input is activated, then the subsystem must consume the input and produce exactly one output within a finite length of time. We give a stepwise refinement procedure which starts from the initial high-level abstraction of the system and expands an internal place of a blackbox Petri net into a more detailed subnet at each step. By enforcing the I/O constraints of each subsystem in each intermediate abstraction, our refinement maintains the sequencing of transitions prescribed by the initial abstraction of the system. Next, for the bottom-up synthesis, we present interconnection rules for sequential, parallel, and loop structures and prove that each rule maintains the I/O constraints. Thus, by incorporating these interconnection rules into our refinement formulation, our approach can be regarded as a hybrid Petri net synthesis technique that employs both top-down and bottom-up methods. The major advantage of the method is that the modeling details can be introduced incrementally and naturally, while the important logical properties of the resulting Petri net are guaranteed. Dept. of Computer Science, Univ. of Maryland,
How Embedded Memory in Recurrent Neural Network Architectures Helps. Tsungnan Lin. Bill G. Horne. C. Lee Giles. August 1996.
Learning long-term temporal dependencies with recurrent neural networks can be a difficult problem. It has recently been shown that a class of recurrent neural networks called NARX networks perform much better than conventional recurrent neural networks for learning certain simple long-term dependency problems. The intuitive explanation for this behavior is that the output memories of a NARX network can be manifested as jump-ahead connections in the time-unfolded network. These jump-ahead connections can propagate gradient information more efficiently, thus reducing the sensitivity of the network to long-term dependencies. This work gives empirical justification to our hypothesis that similar improvements in learning long-term dependencies can be achieved with other classes of recurrent neural network architectures simply by increasing the order of the embedded memory. In particular we explore the impact of learning simple long-term dependency problems on three classes of recurrent neural network architectures: globally recurrent networks, locally recurrent networks, and NARX (output feedback) networks. Comparing the performance of these architectures with different orders of embedded memory on two simple long-term dependency problems shows that all of these classes of network architectures demonstrate significant improvement on learning long-term dependencies when the orders of embedded memory are increased. These results can be important to a user comfortable with a specific recurrent neural network architecture because simply increasing the embedding memory order will make the architecture more robust to the problem of long-term dependency learning. (Also cross-referenced as UMIACS-TR-96-28) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, NEC Research Institute, Princeton University,
Noisy Time Series Prediction using Symbolic Representation and. Steve Lawrence. Ah Chung Tsoi. C. Lee Giles. April 1996.
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. The symbolic representation aids the extraction of symbolic knowledge from the recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Rules related to well known behavior such as trend following and mean reversal are extracted. (Also cross-referenced as UMIACS-TR-96-27) University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland,
Hierarchical Task Network Planning: Formalization, Analysis, and. Kutluhan Erol. April 1996.
Planning is a central activity in many areas including robotics, manufacturing, space mission sequencing, and logistics. As the size and complexity of planning problems grow, there is great economic pressure to automate this process in order to reduce the cost of planning effort, and to improve the quality of produced plans. AI planning research has focused on general-purpose planning systems which can process the specifications of an application domain and generate solutions to planning problems in that domain. Unfortunately, there is a big gap between theoretical and application oriented work in AI planning. The theoretical work has been mostly based on state-based planning, which has limited practical applications. The application-oriented work has been based on hierarchical task network (HTN) planning, which lacks a theoretical foundation. As a result, in spite of many years of research, building planning applications remains a formidable task. The goal of this dissertation is to facilitate building reliable and effective planning applications. The methodology includes design of a mathematical framework for HTN planning, analysis of this framework, development of provably correct algorithms based on this analysis, and the implementation of these algorithms for further evaluation and exploration. The representation, analyses, and algorithms described in this thesis will make it easier to apply HTN planning techniques effectively and correctly to planning applications. The precise and mathematical nature of the descriptions will also help teaching about HTN planning, will clarify misconceptions in the literature, and will stimulate further research. (Also cross-referenced as UMIACS-TR-96-26) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Overcoming Instability in Computing the Fundamental Matrix. Daniel P. Heyman. Dianne P. O'Leary. April 1996.
We present an algorithm for solving linear systems involving the probability or rate matrix for a Markov chain. It is based on a UL factorization but works only with a submatrix of the factor U. We demonstrate its utility on Erlang-B models as well as more complicated models of a telephone multiplexing system. (Also cross-referenced as UMIACS-TR-96-24) Dept. of Computer Science, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies,
Catherine Plaisant. Anne Rose. March 1996.
Exploring LifeLines to Visualize Patient Records. LifeLines provide a general visualization environment for personal histories. We explored its use for medical patient records. A one screen overview of the record using timelines provides direct access to the data. Problems, hospitalization and medications can be represented as horizontal lines, while icons represent discrete events such as physician consultations (and progress notes) or tests. Line color and thickness can illustrate relationships or significance. Techniques are described to display large records. Rescaling tools and filters allow users to focus on part of the information, revealing more details. Computerized medical records pose tremendous problems to system developers. Infrastructure and privacy issues need to be resolved before physicians can even start using the records. Non-intrusive hardware is required for physicians to do their work (i.e. interview patients) away from their desk and cumbersome workstations. But all the efforts to solve those problems will only succeed if appropriate attention is also given to the user interface design [1][8]. Long lists to scroll, clumsy search, endless menus and lengthy dialogs will lead to user rejection. But techniques are being developed to summarize, filter and present large amount of information, leading us to believe that rapid access to needed data is possible with careful design. While more attention is now put on developing standards for gathering medical records we found that very little effort had been made to design appropriate visualization and navigation techniques to present and explore personal history records. An intuitive approach to visualizing histories is to use graphical time series. The consistent, linear time scale allows comparisons and relations between the quantities displayed. Data can be graphed on the timeline to show time series of quantitative data. 
Highly interactive interfaces turn the display into a meaningfully structured menu with direct access to the data needed to review a problem or conduct the diagnosis. (Also cross-referenced as CAR-TR-819) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Communication and Organization in Software Development: An Empirical Study. April 1996.
Carolyn B. Seaman. Victor R. Basili. The empirical study described in this paper addresses the issue of communication among members of a software development organization. The independent variables are various attributes of organizational structure. The dependent variable is the effort spent on sharing information which is required by the software development process in use. The research questions upon which the study is based ask whether or not these attributes of organizational structure have an effect on the amount of communication effort expended. In addition, there are a number of blocking variables which have been identified. These are used to account for factors other than organizational structure which may have an effect on communication effort. The study uses both quantitative and qualitative methods for data collection and analysis. These methods include participant observation, structured interviews, and graphical data presentation. The results of this study indicate that several attributes of organizational structure do affect communication effort, but not in a simple, straightforward way. In particular, the distances between communicators in the reporting structure of the organization, as well as in the physical layout of offices, affect how quickly they can share needed information, especially during meetings. These results provide a better understanding of how organizational structure helps or hinders communication in software development. This work was supported in part by IBM's Centre for Advanced Studies, and by NASA grant NSG-5123. (Also cross-referenced as UMIACS-TR-96-23) Dept. of Computer Science, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies,
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation. Steve Lawrence. C. Lee Giles. Ah Chung Tsoi. April 1996.
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required function approximation, b) the size of the network in relation to the size required for an optimal solution, and c) the degree of noise in the training data. In general, for a) the solution found is worse when the function to be approximated is more complex, for b) oversize networks can result in lower training and generalization error, and for c) the use of committee or ensemble techniques can be more beneficial as the amount of noise in the training data is increased. For the experiments we performed, we do not obtain the optimal solution in any case. We further support the observation that larger networks can produce better training and generalization error using a face recognition example where a network with many more parameters than training points generalizes better than smaller networks. (Also cross-referenced as UMIACS-TR-96-22) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Retrieval Schedules Based on Resource Availability. K. Selcuk Candan. B. Prabhakaran. V.S. Subrahmanian. March 1996.
A distributed multimedia document presentation involves retrieval of objects from the document server(s) and their presentation at the client system. The presentation of the multimedia objects has to be carried out in accordance with the specification of temporal relationships among the objects. The retrieval of multimedia objects from the document server(s) is influenced by factors such as the temporal specification of object presentations, the throughput offered by the network service provider, and the buffer resources on the client system. Flexibility in the temporal specification of the multimedia document can help in deriving an object retrieval schedule that can handle variations in the network throughput and buffer resource availability. In this paper, we develop techniques for deriving a flexible object retrieval schedule for a distributed multimedia document presentation. The schedule is based on flexible temporal specification of the multimedia document using the difference constraints approach. We show how the derived retrieval schedule can be validated and modified to ensure that it can work with the offered network throughput and the available buffer resources. (Also cross-referenced as UMIACS-TR-96-21) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Survey of Multilingual Text Retrieval. Douglas W. Oard. Bonnie J. Dorr. April 1996.
This report reviews the present state of the art in selection of texts in one language based on queries in another, a problem we refer to as ``multilingual'' text retrieval. Present applications of multilingual text retrieval systems are limited by the cost and complexity of developing and using the multilingual thesauri on which they are based and by the level of user training that is required to achieve satisfactory search effectiveness. A general model for multilingual text retrieval is used to review the development of the field and to describe modern production and experimental systems. The report concludes with some observations on the present state of the art and an extensive bibliography of the technical literature on multilingual text retrieval. The research reported herein was supported, in part, by Army Research Office contract DAAL03-91-C-0034 through Battelle Corporation, NSF NYI IRI-9357731, Alfred P. Sloan Research Fellow Award BR3336, and a General Research Board Semester Award. (Also cross-referenced as UMIACS-TR-96-19) Electrical Engineering Department, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Writing an Efficient Device Driver for a Multimedia Teleconferencing System. Alexander Sarris. Satish K. Tripathi. March 1996.
Modern high speed networks, such as ATM, can provide the bandwidth and the QoS guarantees to demanding real-time multimedia applications. However, overall performance of a networked multimedia application will greatly depend on the in-host data movement. Analyzing the characteristics and requirements of those applications, we came to several conclusions about the operation of the multimedia devices' drivers. We applied these conclusions in the design and implementation of a device driver for a multimedia teleconferencing system, based on IBM RS/6000 servers, running the AIX 3.2 operating system. Tracing the complete in-host data path, we found that though our device driver minimized the movement of data between the teleconferencing card and user main memory, the UDP/IP stack proved to be a cause of delay in the movement of data between user main memory and the network interface. (Also cross-referenced as UMIACS-TR-96-18) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Fast Nearest Neighbor Search in Medical Image Databases. Flip Korn. Nikolaos Sidiropoulos. Christos Faloutsos. Eliot Siegel. Zenon Protopapas. March 1996.
We examine the problem of finding similar tumor shapes. Starting from a natural similarity function (the so-called `max morphological distance'), we showed how to lower-bound it and how to search for nearest neighbors in large collections of tumor-like shapes. Specifically, we used state-of-the-art concepts from morphology, namely the `pattern spectrum' of a shape, to map each shape to a point in $n$-dimensional space. Following \cite{Faloutsos94Fast,Jagadish91Retrieval}, we organized the $n$-d points in an R-tree. We showed that the $L_\infty$ (= max) norm in the $n$-d space lower-bounds the actual distance. This guarantees no false dismissals for range queries. In addition, we developed a nearest-neighbor algorithm that also guarantees no false dismissals. Finally, we implemented the method, and we tested it against a testbed of realistic tumor shapes, using an established tumor-growth model of Murray Eden \cite{Eden:61}. The experiments showed that our method is up to 27 times faster than straightforward sequential scanning. (Also cross-referenced as UMIACS-TR-96-17) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
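The lower-bounding idea in this abstract can be illustrated with a minimal filter-and-refine sketch. It is not the report's implementation: the Euclidean distance stands in for the expensive `max morphological distance' (an assumption made only because the $L_\infty$ distance provably never exceeds it), a linear scan replaces the R-tree, and the names linf, actual_distance, range_query and brute_force are hypothetical.

```python
import math

def linf(a, b):
    """Cheap L-infinity (max) distance between two feature vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def actual_distance(a, b):
    """Stand-in for the expensive distance (assumption: Euclidean).
    For any a, b the L-infinity distance is <= this value."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(points, query, eps):
    """Filter with the lower bound, then refine with the real distance.
    A point within eps under the real distance always survives the
    filter, so there are no false dismissals."""
    candidates = [p for p in points if linf(p, query) <= eps]
    return [p for p in candidates if actual_distance(p, query) <= eps]

def brute_force(points, query, eps):
    """Sequential scan using only the expensive distance."""
    return [p for p in points if actual_distance(p, query) <= eps]

points = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (0.5, 0.5), (2.0, 2.0)]
print(range_query(points, (0.0, 0.0), 1.0))
```

With eps = 1.0 the point (1.0, 1.0) passes the cheap filter but is rejected in the refinement step, which shows why the second pass is still needed: the filter may admit false hits, but it never dismisses a true answer.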
Scheduling and Allocation in Multiprocessor Systems. Sheng-Tzong Cheng. March 1996.
The problem of allocation has always been one of the fundamental issues of building applications in multiprocessor systems. For real-time applications, the allocation problem should directly address the issues of task and communication scheduling. In this context, the allocation of tasks has to fully utilize the available processors and the scheduling of tasks has to meet the specified timing constraints. Clearly, the execution of tasks under the allocation and schedule has to satisfy the precedence, resources, and synchronization constraints. Traditionally time constraints for real-time tasks have been specified in terms of ready time and deadlines. Many application tasks have relative timing constraints in which the constraints for the execution of a task are defined in terms of the actual execution instances of prior tasks. In this dissertation we consider the allocation and scheduling problem of the periodic tasks with relative timing requirements. We take a time-based scheduling approach to generate a multiprocessor schedule for a set of periodic tasks. A simulated annealing algorithm is developed as the overall search algorithm for a feasible solution. Our results show that the algorithm performs well and finds feasible allocation and scheduling. We also investigate how to exploit the replication technique to increase the schedulability and performance of the systems. In this dissertation, we adopt the computation model in which each task may have more than one copy and a task may start its execution after receiving necessary data from a copy of each of its predecessors. Based on this model, replication techniques are developed to increase the schedulability of the applications in real-time systems and to reduce the execution cost of the applications in non-real-time systems. Dept. of Computer Science, Univ. of Maryland,
Analysis of the Clustering Properties of Hilbert Space-filling Curve. March 1996.
Bongki Moon. H.V. Jagadish. Christos Faloutsos. Joel Saltz. Several schemes for linear mapping of multidimensional space have been proposed for many applications such as access methods for spatio-temporal databases, image compression and so on. In all these applications, one of the most desired properties from such linear mappings is clustering, which means the locality between objects in the multidimensional space is preserved in the linear space. It is widely believed that Hilbert space-filling curve achieves the best clustering. In this paper we provide closed-form formulas of the number of clusters required by a given query region of an arbitrary shape (e.g., polygons and polyhedra) for Hilbert space-filling curve. Both the asymptotic solution for a general case and the exact solution for a special case generalize the previous work, and they agree with the empirical results that the number of clusters depends on the hyper-surface area of the query region and not on its hyper-volume. We have also shown that Hilbert curve achieves better clustering than z-curve. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the required disk access behaviors and hence the total access time. (Also cross-referenced as UMIACS-TR-96-20) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
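As a small illustration of the quantity this abstract analyzes, the sketch below counts clusters (maximal runs of consecutive curve positions) for a query region on a two-dimensional Hilbert curve. It assumes a square grid whose side n is a power of two and uses the widely known bitwise coordinate-to-index conversion; the function names hilbert_index and count_clusters are hypothetical, and the code enumerates cells directly rather than evaluating the report's closed-form formulas.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.
    Assumes n is a power of two and 0 <= x, y < n."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the lower bits are read in the
        # orientation of the sub-curve that covers this quadrant.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def count_clusters(n, cells):
    """Number of maximal runs of consecutive Hilbert indices in cells."""
    indices = sorted({hilbert_index(n, x, y) for x, y in cells})
    if not indices:
        return 0
    clusters = 1
    for prev, cur in zip(indices, indices[1:]):
        if cur != prev + 1:
            clusters += 1
    return clusters

# 2 x 2 query square in the middle of a 4 x 4 grid.
query = [(x, y) for x in (1, 2) for y in (1, 2)]
print(count_clusters(4, query))  # 3
```

Each cluster corresponds to one contiguous run on the linearized storage, so a lower count suggests fewer non-sequential accesses for the same query region.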
Face Recognition: A Hybrid Neural Network Approach. Steve Lawrence. C. Lee Giles. Ah Chung Tsoi. Andrew D. Back. April 1996.
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional network. The Karhunen-Loeve transform performs almost as well (5.3% error versus 3.8%). The multi-layer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach on the database considered as the number of images per person in the training database is varied from 1 to 5. With 5 images per person the proposed method and eigenfaces result in 3.8% and 10.5% error respectively. The recognizer provides a measure of confidence in its output and classification error approaches zero when rejecting as few as 10% of the examples. We use a database of 400 images of 40 individuals which contains quite a high degree of variability in expression, pose, and facial details. We analyze computational complexity and discuss how new classes could be added to the trained recognizer. 
(Also cross-referenced as UMIACS-TR-96-16) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Telicity and English Verb Classes and Alternations: An Overview. Mari Broman Olsen. February 1996.
This document reports on research conducted for the University of Maryland Machine Translation (MT) project. The primary focus of this investigation concerns the lexical aspect feature [+telic] (i.e., having an inherent end, as in the verb win, vs. the verb run) and its relation to the alternations outlined in (Levin, 1993), English verb classes and alternations. This work is based on the assumption that lexical aspect features need not be primitive but may be derived from the same semantic components that potentiate the alternations. Levin's 86 alternations and constructions are divided into five classes with respect to telicity: (i) alternations that indicate telicity (all participating verbs are [+telic] in their basic sense), (ii) alternations and constructions that add telicity (all participating verbs are [+telic] in the relevant construction), (iii) alternations that indicate atelicity (all participating verbs are [-telic] in their basic sense), (iv) alternations and constructions that are irrelevant with respect to (a)telicity (some participating verbs are [+telic] and others [-telic], and their categorization is not systematically affected by the relevant construction), and, for completeness, (v) a small number of alternations that cannot be classified. For alternations indicating telicity (category (i)), I examine the semantic components said to potentiate the alternations, and for alternations and constructions adding telicity (category (ii)), the semantic components added along with telicity. The results suggest a composite semantic basis for telicity, related to the notion of change of state (broadly defined), but not perfectly correlated with it. Other notions are also relevant, such as contextually typical degree, reciprocal action, and dynamicity, another lexical aspect feature. In addition, the study of categories (ii)-(iv) reveals that certain frames may be used for diagnosing atelicity, despite its generally variable behavior. 
This study also explores the relationship between transitivity and telicity, following suggestions in the work of Hopper and Thompson (1980), Tenny (1987; 1989; 1994), and van Hout (to appear), among others. (Also cross-referenced as UMIACS-TR-96-15) The research reported herein was supported, in part, by Army Research Office contract DAAL03-91-C-0034 through Battelle Corporation, NSF NYI IRI-9357731, Alfred P. Sloan Research Fellow Award BR3336, and a General Research Board Semester Award. University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Evaluating Predictive Quality Models Derived from Software Measures: Lessons Learned. Filippo Lanubile. Giuseppe Visaggio. January 1996.
This paper describes an empirical comparison of several modeling techniques for predicting the quality of software components early in the software life cycle. Using software product measures, we built models that classify components as high-risk, i.e., likely to contain faults, or low-risk, i.e., likely to be free of faults. The modeling techniques evaluated in this study include principal component analysis, discriminant analysis, logistic regression, logical classification models, layered neural networks, and holographic networks. These techniques provide good coverage of the main problem-solving paradigms: statistical analysis, machine learning, and neural networks. Using the results of independent testing, we determined the absolute worth of the predictive models and compared their performance in terms of misclassification errors, achieved quality, and verification cost. Data came from 27 software systems, developed and tested during three years of project-intensive academic courses. A surprising result is that no model was able to effectively discriminate between components with faults and components without faults. (Also cross-referenced as UMIACS-TR-96-14) Dept. of Computer Science, Univ. of Maryland,
Fuzzy Finite-state Automata Can Be Deterministically Encoded into Recurrent Neural Networks. Christian W. Omlin. Karvel K. Thornber. C. Lee Giles. February 1996.
There has been an increased interest in combining fuzzy systems with neural networks because fuzzy neural systems merge the advantages of both paradigms. On the one hand, parameters in fuzzy systems have clear physical meanings and rule-based and linguistic information can be incorporated into adaptive fuzzy systems in a systematic way. On the other hand, there exist powerful algorithms for training various neural network models. However, most of the proposed combined architectures are only able to process static input-output relationships, i.e., they are not able to process temporal input sequences of arbitrary length. Fuzzy finite-state automata (FFAs) can model dynamical processes whose current state depends on the current input and previous states. Unlike in the case of deterministic finite-state automata (DFAs), FFAs are not in one particular state; rather, each state is occupied to some degree defined by a membership function. Based on previous work on encoding DFAs in discrete-time, second-order recurrent neural networks, we propose an algorithm that constructs an augmented recurrent neural network that encodes an FFA and recognizes a given fuzzy regular language with arbitrary accuracy. We then empirically verify the encoding methodology by measuring string recognition performance of recurrent neural networks which encode large randomly generated FFAs. In particular, we examine how the networks' performance varies as a function of synaptic weight strength. (Also cross-referenced as UMIACS-TR-96-12) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
AN INDUCTIVE METHOD FOR DISCOVERING DESIGN PATTERNS FROM. Forrest Shull. Walcelio L. Melo. Victor R. Basili. December 1996.
Object-Oriented Design Patterns (OODPs) have been proposed as a technique to encapsulate design experience and aid in design reuse. However, so far, there is very little empirical evidence about what we can expect from this emergent technology. For instance, to date little research has focused on the development of techniques for discovering workable patterns that can be captured, formalized, packaged, and quantitatively evaluated. Our work is a step in this direction. In this paper we present an inductive method aimed at helping us discover OODPs in existing OO software systems. It encompasses a set of procedures rigorously defined in order to be repeatable and usable by practitioners who are not acquainted with reverse architecting processes. Guidelines are provided and a case study is shown that demonstrates the usefulness of the approach. (Also cross-referenced as UMIACS-TR-96-10) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Collaborative Multimedia Documents: Authoring and Presentation. K.S. Candan. B. Prabhakaran. V.S. Subrahmanian. January 1996.
Multimedia documents are composed of different data types such as video, audio, text and images. Authoring a multimedia document is a creative exercise. Unlike traditional computer supported collaborative work where documents are composed of static objects, multimedia documents have temporal, spatial and quality of service (QoS) requirements that must be supported by any collaborative multimedia platform. In this paper, we show that most requirements (including temporal, spatial, and QoS requirements) for collaborative multimedia systems can be expressed in terms of a highly-structured class of linear constraints called difference constraints that have been well-studied in the operations research literature. As a consequence, well known algorithms for solving difference constraints may be used as a starting point for creating multimedia documents. Based on our difference-constraint based characterization, we develop efficient, incremental algorithms for creating and modifying multimedia documents so as to satisfy the required temporal, spatial and QoS constraints. We further develop methods to identify inconsistent requirements, and show how such inconsistencies may be removed through constraint relaxation techniques. (Also cross-referenced as UMIACS-TR-96-9) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
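As an illustration of the difference-constraint formulation mentioned in the abstract above, the following is a minimal sketch, not the authors' incremental algorithm. It assumes each requirement is written as x[j] - x[i] <= c over presentation start times and uses the standard Bellman-Ford relaxation to either find a consistent schedule or report inconsistency. The function name, the variable indices, and the example timings are hypothetical.

```python
def solve_difference_constraints(num_vars, constraints):
    """Solve a system of difference constraints.

    constraints: list of (j, i, c) tuples, each meaning x[j] - x[i] <= c.
    Returns a list of values satisfying every constraint, or None if the
    system is inconsistent (the constraint graph has a negative cycle).
    """
    # Starting every value at 0 is equivalent to adding a virtual source
    # with a zero-weight edge to each variable.
    dist = [0.0] * num_vars
    for _ in range(num_vars):
        changed = False
        for j, i, c in constraints:
            # Relax the edge i -> j with weight c.
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            return dist
    # Still relaxing after num_vars passes: a negative cycle exists.
    return None


# Hypothetical example: x0 = video start, x1 = audio start, x2 = caption start.
consistent = [
    (1, 0, 2),   # audio starts at most 2 s after video
    (0, 1, 0),   # audio does not start before video
    (1, 2, -5),  # caption starts at least 5 s after audio
]
schedule = solve_difference_constraints(3, consistent)

# Adding "caption starts at most 3 s after video" contradicts the rest.
inconsistent = consistent + [(2, 0, 3)]
no_schedule = solve_difference_constraints(3, inconsistent)
```

Only differences between the returned values are meaningful, so a schedule can be shifted by a constant to make all start times non-negative. Detecting the negative cycle is what identifies an inconsistent set of requirements; which constraint to relax is a separate decision.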
Collaborative Multimedia Systems: Synthesis of Media Objects. K.S. Candan. V. Rangan. V.S. Subrahmanian. January 1996.
When a group I_1, ..., I_n of individuals wishes to collaboratively construct a complex multimedia document, the first requirement is that they be able to manipulate media objects created by one another. For instance, if individual I_j wishes to access some media object present at participant I_k's site, he must be able to: (1) retrieve this object from across the network, (2) ensure that the object is in a form that is compatible with the viewing/editing resources he has available at his node, and (3) ensure that the object has the desired quality (such as image size and resolution). Furthermore, he must be able to achieve these goals at the lowest possible cost. In this paper, we develop a theory of media objects, and present optimal algorithms for collaborative object sharing/synthesis of the sort envisaged above. We then extend the algorithms to incorporate quality constraints (such as image size) as well as distribution across multiple nodes. The theoretical model is validated by an experimental implementation that supports the theoretical results. (Also cross-referenced as UMIACS-TR-96-8) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Extracting Reusable Functions by Program Slicing. Filippo Lanubile. Giuseppe Visaggio. January 1996.
An alternative approach to developing reusable components from scratch is to recover them from existing systems. In this paper, we apply program slicing, introduced by Weiser, to the problem of extracting reusable functions from ill-structured programs. We extend the definition of program slice to a transform slice, one that includes statements which contribute directly or indirectly to transform a set of input variables into a set of output variables. Unlike conventional program slicing, these statements include neither the statements necessary to get input data nor the statements which test the binding conditions of the function. Transform slicing presupposes the knowledge that a function is performed in the code and its partial specification, only in terms of input and output data. Using domain knowledge we discuss how to formulate expectations of the functions implemented in the code. In addition to the input/output parameters of the function, the slicing criterion depends on an initial statement which is difficult to obtain for large programs. Using the notions of decomposition slice and concept validation we demonstrate how to produce a set of candidate functions, which are independent of line numbers but must be evaluated with respect to the expected behavior. Although human interaction is required, the limited size of candidate functions makes this task easier than looking for the last function instruction in the original source code. (Also cross-referenced as UMIACS-TR-96-13) Dept. of Computer Science, Univ. of Maryland,
Qualitative Analysis for Maintenance Process Assessment. Lionel Briand. Yong-Mi Kim. Walcelio L. Melo. Carolyn B. Seaman. Victor R. Basili. January 1996.
In order to improve software maintenance processes, we first need to be able to characterize and assess them. These tasks must be performed in depth and with objectivity since the problems are complex. One approach is to set up a measurement-based software process improvement program specifically aimed at maintenance. However, establishing a measurement program requires that one understands the problems to be addressed by the measurement program and is able to characterize the maintenance environment and processes in order to collect suitable and cost-effective data. Also, enacting such a program and getting usable data sets takes time. A short term substitute is therefore needed. We propose in this paper a characterization process aimed specifically at maintenance and based on a general qualitative analysis methodology. This process is rigorously defined in order to be repeatable and usable by people who are not acquainted with such analysis procedures. A basic feature of our approach is that actual implemented software changes are analyzed in order to understand the flaws in the maintenance process. Guidelines are provided and a case study is shown that demonstrates the usefulness of the approach. (Also cross-referenced as UMIACS-TR-96-7) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Load Balancing for Parallel Loops in Workstation Clusters. Tae-Hyung Kim. James M. Purtilo. January 1996.
Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes do not adequately balance parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck for the relatively small number of processors in workstation clusters because of order-of-magnitude differences in communications overheads. Moreover, improvements of basic loop scheduling methods have not dealt effectively with irregularly distributed workloads in parallel loops, which commonly occur in applications for workstation clusters. In this paper, we present a new decentralized balancing method for parallel loops on workstation clusters. (Also cross-referenced as UMIACS-TR-96-6) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Scalability Analysis of Declustering Methods for Cartesian Product. Bongki Moon. Joel Saltz. April 1996.
Efficient storage and retrieval of multi-attribute datasets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files over multiple disks to obtain high performance for disk accesses. Though the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods and are corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions. (Also cross-referenced as UMIACS-TR-96-5) Department of Computer Science, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies,
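To make the two declustering methods named in the abstract above concrete, here is a minimal sketch. It assumes the definitions commonly given in the declustering literature (Disk Modulo maps a bucket to the sum of its coordinates modulo the number of disks; Fieldwise Xor maps it to the bitwise XOR of its coordinates modulo the number of disks), and it assumes that the response time of a range query is measured as the largest number of buckets any single disk must read. These definitions and all function names are assumptions for illustration, not taken from the report.

```python
from collections import Counter
from functools import reduce
from itertools import product
from operator import xor


def disk_modulo(coords, num_disks):
    """Disk Modulo: sum of the bucket coordinates, modulo the disk count."""
    return sum(coords) % num_disks


def fieldwise_xor(coords, num_disks):
    """Fieldwise Xor: bitwise XOR of the bucket coordinates, modulo the disk count."""
    return reduce(xor, coords) % num_disks


def response_time(query_ranges, num_disks, method):
    """Largest number of buckets one disk must read for a range query.

    query_ranges: one inclusive (lo, hi) pair per dimension.
    method: a function (coords, num_disks) -> disk index.
    """
    buckets = product(*(range(lo, hi + 1) for lo, hi in query_ranges))
    load = Counter(method(coords, num_disks) for coords in buckets)
    return max(load.values())


# A 4x4 query over 4 disks is spread evenly by both methods (16 / 4 = 4).
full_dm = response_time([(0, 3), (0, 3)], 4, disk_modulo)
full_fx = response_time([(0, 3), (0, 3)], 4, fieldwise_xor)

# A 2x2 query touches 4 buckets, so the ideal is 1, but both methods
# place two of the buckets on the same disk.
small_dm = response_time([(0, 1), (0, 1)], 4, disk_modulo)
small_fx = response_time([(0, 1), (0, 1)], 4, fieldwise_xor)
```

The small query shows the kind of gap from the ideal that a scalability analysis has to account for: adding disks does not help when the mapping sends neighboring buckets to the same disk.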
Study of Scalable Declustering Algorithms for Parallel Grid Files. Bongki Moon. Anurag Acharya. Joel Saltz. February 1996.
Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) development of a proximity-based declustering algorithm called {\em minimax} which is experimentally shown to scale and to consistently achieve better response time compared to available algorithms while maintaining perfect disk distribution. (Also cross-referenced as UMIACS-TR-96-4) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Use of Behavior Hierarchies for Controlling a Vision-Based. Oliver Seeliger. January 1996.
In this thesis I describe the use of behavior hierarchies based on ``merging'' two models of multi-layer architecture --- the supervenience model of Spector and the subsumption model of Brooks. The behavior hierarchy approach allows us to use the robustness of reactivity in behavior design. It also encourages the design of modular behaviors that can be reused or more importantly recalibrated in different situations. I argue that behavior hierarchies extend our ability to design and program effective solutions that combine reactive and goal-driven components, but do not require any explicit planning. This work is used for an implemented system in which the underwater robot SCAMP developed at the Space Systems Laboratory at the University of Maryland performs vision-based behaviors to relieve a human operator of certain tasks. (Also cross-referenced as UMIACS-TR-96-3) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Iterative methods for solving Ax = b. Jane K. Cullum. January 1996.
We study the convergence of GMRES/FOM and QMR/BiCG methods for solving nonsymmetric Ax = b. We prove that given the results of a BiCG computation on Ax = b, we can obtain a matrix B with the same eigenvalues as A and a vector c such that the residual norms generated by a FOM computation on Bx = c are identical to those generated by the BiCG computations. Using a unitary equivalence for each of these methods, we obtain test problems where we can easily vary certain spectral properties of the matrices. We use these test problems to study the effects of nonnormality on the convergence of GMRES and QMR, to study the effects of eigenvalue outliers on the convergence of QMR, and to compare the convergence of restarted GMRES and QMR across a family of normal and nonnormal problems. Our GMRES tests on nonnormal test matrices indicate that nonnormality can have unexpected effects upon the residual norm convergence, giving misleading indications of superior convergence when the error norms for GMRES are not significantly different from those for QMR. Our QMR tests indicate that the convergence of the QMR residual and error norms is influenced predominantly by small and large eigenvalue outliers and by the character, real, complex, or nearly real, of the outliers and the other eigenvalues. In our comparison tests QMR outperformed GMRES(10) and GMRES(20) on both the normal and nonnormal test matrices. If you have difficulty viewing the second part of the linked postscript file, open the file: http://www.cs.umd.edu/fs/ftp/pub/papers/papers/3587.figures.ps. This is the second part of the paper in a separate file. (Also cross-referenced as UMIACS-TR-96-2) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Two Computer Systems Paradoxes: Serialize-to-Parallelize,. Rimon Orni. Uzi Vishkin. September 1995.
We present and examine the following Serialize-to-Parallelize Paradox: suppose a programmer has a parallel algorithm in mind; the programmer must serialize the algorithm, and is actually trained to suppress its parallelism, while writing code; later, however, compilation and runtime techniques are used to reverse the results of this serialization effort and extract as much parallelism as possible. This work actually provides examples where parallel or parallel-style code enables extracting more parallelism than standard serial code. The "arbitrary concurrent-write" convention is useful in parallel algorithms and programs and appears to be not too difficult to implement in hardware for serial machines. Still, typically concurrent-writes to the same memory location in a program are implemented by queuing the write operations, thus requiring time linear in the number of writes. We call this the Queuing Concurrent-Writes Paradox. Assuming that providing useful, easy-to-program programming paradigms to improve the overall effectiveness of computer systems is of interest, this work is a modest example for applying such software-driven considerations to computer architecture issues. This work may be the first to relate parallel algorithms and parallel programming with the technology of instruction level parallelism. (Also cross-referenced as UMIACS-TR-96-1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Improving NFS Performance over Wireless Links. Rohit Dube. Cynthia D. Rais. Satish K. Tripathi. December 1995.
NFS is a widely used remote file access protocol that has been tuned to perform well on traditional LANs which exhibit low error rates. Users migrating to mobile-hosts would like to continue to use NFS for remote file accesses. However, low bandwidth and high error-rates degrade performance on mobile-hosts using wireless links, thus hindering the use of NFS. In this paper, we present two mechanisms to improve NFS performance over wireless links: an aggressive NFS client and link-level retransmissions. Our experiments show that these mechanisms improve throughput by up to 200%, which brings the performance to within 5% of that obtained in zero error conditions. (Also cross-referenced as UMIACS-TR-95-126) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Measuring NFS Performance in Wireless Networks. Cynthia D. Rais. Satish K. Tripathi. December 1995.
Technological trends suggest that soon communication networks will consist of a high speed wired backbone with numerous wireless Local Area Networks. Mobile computing and wireless subnetworks are increasingly in demand. Mobile routing solutions provide wireless LANs with seamless connectivity to backbone wired systems. However, these solutions do not provide acceptable performance. Wireless networks have distinct transmission characteristics which present challenges to achieving efficient performance. Performance over wireless links is limited by high error rates, mobility, and low bandwidth. We have studied the performance of TCP and NFS over a wireless network. The prevalence of these protocols means that mobile hosts will frequently use them when communicating with stationary hosts. Measurements have been collected to determine the response of these protocols in the presence of various error patterns. These measurements show that NFS and TCP performance suffer extreme degradation due to these wireless link characteristics. Unexpectedly, NFS performance is not better than a TCP FTP file transfer. NFS performance over wireless links is limited by large packet sizes, long retransmission timeouts, and slow response to losses. Our goal is to understand the effects of wireless communication on these protocols and improve performance without requiring changes to the current network infrastructure. (Also cross-referenced as UMIACS-TR-95-125) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A framework for integrating Mobile Hosts within the Internet. Pravin Bhagwat. December 1995.
Host mobility and wireless access are two emerging design considerations that pose challenging problems at all layers of the networking protocol stack. This dissertation investigates their impact on the design of link, network, and transport layer protocols. At the network layer, we have designed and implemented a new routing architecture that allows the current set of Internet standards to support routing to mobile hosts. At the link and transport layers, we have designed mechanisms to improve throughput over error-prone wireless channels. At the network layer, the most crucial problem is that of routing. The existing Internet routing mechanisms cannot route packets to hosts whose points of attachment to the network change over time. Exploiting IP's Loose Source Route option, we have designed and implemented a routing scheme which provides location independent network access to TCP/IP compliant mobile hosts. It also allows mobile hosts equipped with multiple network interfaces to dynamically migrate active network sessions from one network interface to another. The proposed scheme only requires the addition of two new entity types, Mobile Routers and Mobile Access Stations. These entities perform all required mobility-aware functions, such as address translation, user tracking and location management. No modifications to existing host or router software are required. Although MobileIP provides continuous network connectivity to mobile hosts, the effects of host movement and wireless medium characteristics are often visible at the transport layer. We consider the effect of wireless medium characteristics on the performance of Transmission Control Protocol (TCP) sessions. Unlike wired networks, packets transmitted on wireless channels are often subject to burst errors which cause back to back packet losses. We show that TCP's error-recovery mechanisms perform poorly when packets from a TCP session are subject to burst errors. 
Unlike other approaches which require modification to TCP, our solution requires enhancements only at the wireless link layer, thus making it applicable to other transport protocols as well. We use a Channel State Dependent Packet (CSDP) scheduler which takes wireless channel characteristics into consideration in making packet dispatching decisions. Our results show that the CSDP technique provides improved throughput, better channel utilization, and fairness among multiple TCP streams. (Also cross-referenced as UMIACS-TR-95-124) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Arnoldi versus Nonsymmetric Lanczos Algorithms. Jane K. Cullum. December 1995.
We obtain several results which may be useful in determining the convergence behavior of eigenvalue algorithms based upon Arnoldi and nonsymmetric Lanczos recursions. We derive a relationship between nonsymmetric Lanczos eigenvalue procedures and Arnoldi eigenvalue procedures. We demonstrate that the Arnoldi recursions preserve a property which characterizes normal matrices, and that if we could determine the appropriate starting vectors, we could mimic the nonsymmetric Lanczos eigenvalue convergence on a general diagonalizable matrix by its convergence on related normal matrices. Using a unitary equivalence for each of these Krylov subspace methods, we define sets of test problems where we can easily vary certain spectral properties of the matrices. We use these and other test problems to examine the behavior of an Arnoldi and of a nonsymmetric Lanczos procedure. (Also cross-referenced as UMIACS-TR-95-123) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Predicting Semantics from Syntactic Cues -- An Evaluation of Levin's English Verb Classes and Alternations. Doug Jones. Lun Li. December 1995.
In this report, we describe the results of a careful study of the first chapter of two published Chinese translations of Noam Chomsky's Syntactic Structures, as translated by Wang and Lu (1966) and Xing et al. (1979). Our first step was to create a word-by-word alignment, inasmuch as that was possible. We then examined ways we could use the alignment to generate other resources, such as a dictionary and a procedure for automating the translation process. We then retranslated the text, using the rules we developed. We present this set of rules and the resulting translation in the body of this report. The appendix contains the published Chinese texts that we worked from, the alignments of these texts, and the small dictionary we produced. (Also cross-referenced as UMIACS-TR-95-122) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Predicting Semantics from Syntactic Cues --. Doug Jones. December 1995.
The relationship between the meaning of verbs and their syntactic patterns has recently been explored in the landmark study of (Levin, 1993). Although the central thesis of this book is that verb semantics and syntactic behavior are predictably related, the large scope of the work makes it difficult to verify. I show that it is possible to guess the semantic class of a verb based on syntactic cues automatically extracted from the example sentences in her book. In particular, it is possible to correctly guess 94.8% of Levin's semantic classes if the parses contain prepositions, negative evidence is included, and word senses are disambiguated. This report includes the syntactic signatures of Levin's 191 semantic classes, in addition to a detailed description of how the syntactic signatures behave according to the different parameters involving negative evidence, prepositions, and disambiguation. (Also cross-referenced as UMIACS-TR-95-121) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Modeling Bit Rate Variations in MPEG Sources. Marwan Krunz. Satish K. Tripathi. December 1995.
In this paper, we propose a traffic model for the characterization of VBR MPEG-coded video streams. This model provides the means to generate synthetic MPEG streams that can be used in performance studies of ATM networks. The model is appropriately fitted to three long empirical video sequences taken from different movies. We use multiple components to model bit rate variations in an MPEG stream. These components have different time scales. Long-term variations in the bit rate are captured at the scene level. Within a scene, the sizes of I frames tend to slightly fluctuate around some average. Hence, we measure the activity of the scene by the average size of I frames in that scene. This average varies from one scene to another, and its randomness is reasonably approximated by a lognormal distribution. For a given scene, the fluctuations of the sizes of I frames about their mean are modeled as an AR(2) time series. Finally, we show that the sizes of P and B frames can be appropriately fitted by lognormal distributions, with corresponding parameters. Using the compression pattern, the complete frame size sequence is formed by intermixing three subsequences, each of which describes the frame size sequence for a particular frame type. The validity of the model is demonstrated by the similarity between a synthetic stream and an actual trace, in terms of the correlation structure, the marginal distribution, the sample paths, and more importantly, the queueing performance. (Also cross-referenced as UMIACS-TR-95-120) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
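The structure of the model described in the abstract above can be sketched as a small generator. This is only an illustration of how the components fit together: a lognormal scene-level mean for I frames, an AR(2) fluctuation around that mean, and lognormal P and B frame sizes intermixed according to the compression pattern. Every numeric parameter, the fixed number of pattern repetitions per scene, the clamping of I-frame sizes to a positive value, and the function name are hypothetical and are not the fitted values or procedure from the report.

```python
import random


def synth_mpeg_trace(num_scenes, gops_per_scene, pattern="IBBPBBPBBPBB", seed=0):
    """Generate a synthetic list of (frame_type, frame_size) pairs.

    All distribution parameters below are hypothetical placeholders.
    """
    rng = random.Random(seed)
    a1, a2, noise_std = 0.5, 0.2, 500.0  # hypothetical stationary AR(2) parameters
    frames = []
    for _ in range(num_scenes):
        # Scene activity: average I-frame size, drawn from a lognormal.
        scene_mean = rng.lognormvariate(10.0, 0.3)
        d1 = d2 = 0.0  # previous two AR(2) deviations within this scene
        for _ in range(gops_per_scene):
            # AR(2) deviation of the I-frame size about the scene mean.
            d = a1 * d1 + a2 * d2 + rng.gauss(0.0, noise_std)
            d2, d1 = d1, d
            # Intermix the three subsequences following the compression pattern.
            for frame_type in pattern:
                if frame_type == "I":
                    size = max(scene_mean + d, 1.0)
                elif frame_type == "P":
                    size = rng.lognormvariate(9.0, 0.4)
                else:  # "B"
                    size = rng.lognormvariate(8.0, 0.4)
                frames.append((frame_type, size))
    return frames


trace = synth_mpeg_trace(num_scenes=2, gops_per_scene=3, pattern="IBBPBB", seed=1)
```

A trace produced this way could feed a queueing simulation, but any comparison with a real video trace would require parameters fitted to that trace.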
Impact of Synchronization on the Allocation of Bandwidth for. Marwan Krunz. Satish K. Tripathi. December 1995.
In an MPEG encoder, three types of frames (I, P, and B) are periodically generated according to a pre-specified compression pattern. As a result, an MPEG sequence is periodic in its compression pattern, and this periodicity can be used to reduce the bandwidth requirements of multiplexed MPEG streams. By exploiting the deterministic and periodic nature of the compression pattern, we show that it is possible to provide stringent deterministic guarantees (no cell losses and no queueing delay) to MPEG connections without the need to allocate the peak rates of individual sources. Instead, a stream is allocated its effective bandwidth, which is the aggregate peak rate of the multiplexed streams divided by the number of streams. The aggregate peak rate depends on the arrangement of the multiplexed streams, which is a measure of the degree of synchronization among the compression patterns of different streams. It is found that in most cases, the effective bandwidth is smaller than the source peak rate. For a given arrangement, we provide a procedure to compute the effective bandwidth. We also give an expression for the 'best' arrangement that results in the 'optimal' effective bandwidth. Examples of real MPEG sequences are used to show the bandwidth gains that can be achieved through proper scheduling of the starting times of MPEG connections. (Also cross-referenced as UMIACS-TR-95-119) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
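The definition of effective bandwidth in the abstract above (aggregate peak rate divided by the number of streams) can be illustrated with a short sketch. It makes simplifying assumptions that are not from the report: all streams share one compression pattern, each frame type is bounded by a single maximum size, and an arrangement is given as a list of starting offsets measured in frame slots. The function name and the example sizes are hypothetical.

```python
def effective_bandwidth(pattern, max_size, offsets):
    """Aggregate peak rate per frame slot, divided by the number of streams.

    pattern:  compression pattern shared by all streams, e.g. "IBBPBB".
    max_size: maximum frame size per frame type (hypothetical units per slot).
    offsets:  starting offset of each stream, in frame slots.
    """
    period = len(pattern)
    # Worst-case total size in each slot of one period, summed over streams.
    aggregate = [
        sum(max_size[pattern[(slot - off) % period]] for off in offsets)
        for slot in range(period)
    ]
    return max(aggregate) / len(offsets)


sizes = {"I": 100, "P": 50, "B": 20}  # hypothetical maximum frame sizes

# Fully synchronized streams: the I frames coincide, so nothing is gained.
synchronized = effective_bandwidth("IBBPBB", sizes, offsets=[0, 0])

# Shifting the second stream by one slot separates the I frames.
shifted = effective_bandwidth("IBBPBB", sizes, offsets=[0, 1])
```

With these example sizes the synchronized arrangement needs the full source peak of 100 per stream, while the shifted arrangement needs 60, which shows why the choice of starting times matters.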
Minimizing Communication while Preserving Parallelism. Wayne Kelly. William Pugh. December 1995.
To compile programs for message passing architectures and to obtain good performance on NUMA architectures it is necessary to control how computations and data are mapped to processors. Languages such as High-Performance Fortran use data distributions supplied by the programmer and the owner computes rule to specify this. However, the best data and computation decomposition may differ from machine to machine and require substantial expertise to determine. Therefore, automated decomposition is desirable. All existing methods for automated data/computation decomposition share a common failing: they are very sensitive to the original loop structure of the program. While they find a good decomposition for that loop structure, it may be possible to apply transformations (such as loop interchange and distribution) so that a different decomposition gives even better results. We have developed automatic data/computation decomposition methods that are not sensitive to the original program structure. We can model static and dynamic data decompositions as well as computation decompositions that cannot be represented by data decompositions and the owner computes rule. We make use of both parallel loops and doacross/pipelined loops to exploit parallelism. We describe an automated translation of the decomposition problem into a weighted graph that incorporates estimates of both parallelism and communication for various candidate computation decompositions. We solve the resulting search problem exactly in a very short time using a new algorithm that has been shown to be able to prune away a majority of the vast search space. We assume that the selection of the computation decomposition is followed by a transformation phase that reorders the iterations to best match the selected computation decomposition. Our graph includes constraints to ensure that a reordering transformation giving the predicted parallelism exists. 
(Also cross-referenced as UMIACS-TR-95-118) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Network Layer Mobility: an Architecture and Survey. Pravin Bhagwat. Satish K. Tripathi. Charles Perkins. December 1995.
In this paper we explore various network layer concepts that pertain to the design of mobile networking systems. We show that mobility is essentially an {\em address translation} problem and is best resolved at the network layer. We have identified the fundamental services that must be supported at the network layer to carry out the task of address translation. Using these service primitives as building blocks, we propose a network layer architecture which enables smooth integration of mobile end systems within the existing Internet. The architecture is modularized into well-defined logical components. In this paper our objective is not to propose {\em a specific scheme} for supporting mobility, rather it is to highlight and analyze the essential aspects of supporting mobile end-systems, as well as to better understand the trade-off between various design alternatives. (Also cross-referenced as UMIACS-TR-95-117) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Runtime Coupling of Data-parallel Programs. M. Ranganathan. Anurag Acharya. Guy Edjlali. Alan Sussman. Joel Saltz. November 1995.
We consider the problem of efficiently coupling multiple data-parallel programs at runtime. We propose an approach that establishes a mapping between data structures in different data-parallel programs and implements a user specified consistency model. Mappings are established at runtime and new mappings between programs can be added and deleted while the programs are in execution. Mappings, or the identity of the processors involved, do not have to be known at compile-time or even link-time. Programs can be made to interact with different granularities of interaction without requiring any re-coding. A priori knowledge of data movement requirements allows for buffering of data and overlap of computations between coupled applications. Efficient data movement is achieved by pre-computing an optimized schedule. We describe our prototype implementation and evaluate its performance for a set of synthetic benchmarks that examine the variation of performance with coupling parameters. We demonstrate that the cost of the added flexibility gained by our coupling method is not prohibitive when compared with a monolithic code that does the same computation. (Also cross-referenced as UMIACS-TR-95-116) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Gagan Agrawal. Anurag Acharya. Joel Saltz. An Interprocedural Framework for Placement of Asynchronous I/O. November 1995.
Overlapping memory accesses with computations is a standard technique for improving performance on modern architectures, which have deep memory hierarchies. In this paper, we present a compiler technique for overlapping accesses to secondary memory (disks) with computation. We have developed an Interprocedural Balanced Code Placement (IBCP) framework, which performs analysis on arbitrary recursive procedures and arbitrary control flow and replaces synchronous I/O operations with a balanced pair of asynchronous operations. We demonstrate how this analysis is useful for applications which perform frequent and large accesses to secondary memory, including applications which snapshot or checkpoint their computations or out-of-core applications. (Also cross-referenced as UMIACS-TR-95-114) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Hassan Fallah-Adl. Joseph Ja'Ja'. Shunlin Liang. Fast Algorithms for Estimating Aerosol Optical Depth and Correcting. November 1995.
Remotely sensed images collected by satellites are usually contaminated by the effects of atmospheric particles through absorption and scattering of the radiation from the earth's surface. The objective of atmospheric correction is to retrieve the surface reflectance from remotely sensed imagery by removing the atmospheric effects, which is usually performed in two steps. First, the optical characteristics of the atmosphere are estimated, and then the remotely sensed imagery is corrected by inversion procedures that derive the surface reflectance. In this paper we introduce an efficient algorithm to estimate the optical characteristics of the Thematic Mapper (TM) imagery and to remove the atmospheric effects from it. Our algorithm introduces a set of techniques to significantly improve the quality of the retrieved images. We pay particular attention to the computational efficiency of the algorithm, thereby allowing us to correct large TM images quickly. We also provide a parallel implementation of our algorithm and show its portability and its scalability on several parallel machines. (Also cross-referenced as UMIACS-TR-95-113) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Brian Kettler. Case-based Planning with a High-Performance Parallel Memory. November 1995.
Case-based planning (CBP) systems, like other case-based reasoning systems, can take advantage of previous planning experience by reusing stored cases (plans) in similar situations in the future. Advantages of CBP include speedup over planning from scratch and the ability to function with limited causal domain knowledge. ``Traditional'' CBP systems with the latter advantage typically cannot produce plans from scratch because they lack the more powerful adaptation mechanisms of generative planning systems. These ``reuse-only'' CBP systems rely on retrieving a plan from the casebase that is close to a solution plan. This requires large casebases with good coverage of the problem space and the ability to encode and match cases at fine levels of detail. Many such CBP systems, however, have fallen short of these requirements. They support only small, pre-indexed casebases. Pre-indexing constrains retrieval, as does the use of less expressive feature-based case representation schemes. The encoding and matching of detailed structural relationships in cases is not possible in such systems. These systems often adapt a single plan to the target problem using methods that are ad hoc or heuristic. CAPER is a novel, domain-independent, case-based planning system with improvements over traditional reuse-only CBP systems from its use of techniques that exploit a high-performance parallel memory of cases. CAPER takes a memory-intensive approach by making frequent use of memory during all phases of planning and by using large casebases, which can be automatically seeded. Because the parallel retrieval mechanisms scale to real-world sized casebases of thousands of plans, memory does not have to be pre-indexed and thus retrieval is more flexible. Detailed queries can be used to match cases, which are stored using an expressive, graph-structured case representation scheme. 
Plan adaptation in CAPER borrows techniques from generative planning, such as the use of plan validations, which capture dependencies in a plan, and plan composition. These techniques are incorporated into a reuse-only CBP framework for a more principled approach to adaptation than in many reuse-only CBP systems. CAPER can also use its flexible retrieval mechanisms and case representations to retrieve patch or substitute plans from memory. (Also cross-referenced as UMIACS-TR-95-112) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adaptive Use of Iterative Methods in Interior Point Methods for Linear. Weichung Wang. Dianne P. O'Leary. November 1995.
In this work we devise efficient algorithms for finding the search directions for interior point methods applied to linear programming problems. There are two innovations. The first is the use of updating of preconditioners computed for previous barrier parameters. The second is an adaptive automated procedure for determining whether to use a direct or iterative solver, whether to reinitialize or update the preconditioner, and how many updates to apply. These decisions are based on predictions of the cost of using the different solvers to determine the next search direction, given costs in determining earlier directions. These ideas are tested by applying a modified version of the OB1-R code of Lustig, Marsten, and Shanno to a variety of problems from the NETLIB and other collections. If a direct method is appropriate for the problem, then our procedure chooses it, but when an iterative procedure is helpful, substantial gains in efficiency can be obtained. (Also cross-referenced as UMIACS-TR-95-111) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Perturbation of Eigenvalues of Preconditioned Navier-Stokes Operators. Howard C. Elman. November 1995.
We study the sensitivity of algebraic eigenvalue problems associated with matrices arising from linearization and discretization of the steady-state Navier-Stokes equations. In particular, for several choices of preconditioners applied to the system of discrete equations, we derive upper bounds on perturbations of eigenvalues as functions of the viscosity and discretization mesh size. The bounds suggest that the sensitivity of the eigenvalues is at worst linear in the inverse of the viscosity and quadratic in the inverse of the mesh size, and that scaling can be used to decrease the sensitivity in some cases. Experimental results supplement these results and confirm the relatively mild dependence on viscosity. They also indicate a dependence on the mesh size of magnitude smaller than the analysis suggests. (Also cross-referenced as UMIACS-TR-95-110) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Semantic Query Optimization for Bottom-Up Evaluation. Parke Godfrey. Jaroslaw Gryz. Jack Minker. November 1995.
Semantic query optimization uses semantic knowledge in databases (represented in the form of integrity constraints) to rewrite queries and logic programs for the purpose of more efficient query evaluation. Much work has been done to develop various techniques for optimization. Most of it, however, is only applicable to top-down query evaluation strategies. Moreover, little attention has been paid to the cost of the optimization itself. In this paper, we address the issue of semantic query optimization for bottom-up query evaluation strategies with an emphasis on overall efficiency. We restrict our attention to a single optimization technique, join elimination. We discuss various factors that influence the cost of semantic optimization, and present two abstract algorithms for different optimization approaches. The first one pre-processes a query statically before it is evaluated; the second approach combines query evaluation with semantic optimization using heuristics to achieve the largest possible savings. (Also cross-referenced as UMIACS-TR-95-109) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Interprocedural Data Flow Based Optimizations for Distributed Memory. Gagan Agrawal. Joel Saltz. November 1995.
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of the runtime preprocessing routines and collective communication routines inserted for managing communication in such codes. We also present three new interprocedural optimizations: placement of scatter routines, deletion of data structures and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran~D compilation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes. (Also cross-referenced as UMIACS-TR-95-108) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Conjugate Gradients and Related KMP Algorithms: The Beginnings. Dianne P. O'Leary. November 1995.
In the late 1940's and early 1950's, newly available computing machines generated intense interest in solving ``large'' systems of linear equations. Among the algorithms developed were several related methods, all of which generated bases for Krylov subspaces and used the bases to minimize or orthogonally project a measure of error. These methods include the conjugate gradient algorithm and the Lanczos algorithm. We refer to these algorithms as the KMP family and discuss its origins, emphasizing research themes that continue to have central importance. (Also cross-referenced as UMIACS-TR-95-107) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Review of Software Inspections. Adam A. Porter. Harvey Siy. Lawrence G. Votta Jr.. October 1995.
For two decades, software inspections have proven effective for detecting defects in software. We have reviewed the different ways software inspections are done, created a taxonomy of inspection methods, and examined claims about the cost-effectiveness of different methods. We detect a disturbing pattern in the evaluation of inspection methods. Although there is universal agreement on the effectiveness of software inspection, their economics are uncertain. Our examination of several empirical studies leads us to conclude that the benefits of inspections are often overstated and the costs (especially for large software developments) are understated. Furthermore, some of the most influential studies establishing these costs and benefits are 20 years old now, which leads us to question their relevance to today's software development processes. Extensive work is needed to determine exactly how, why, and when software inspections work, and whether some defect detection techniques might be more cost-effective than others. In this article we ask some questions about measuring effectiveness of software inspections and determining how much they really cost when their effect on the rest of the development process is considered. Finding answers to these questions will enable us to improve the efficiency of software development. (Also cross-referenced as UMIACS-TR-95-104) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Experiments with Digital Video Playback. Richard Gerber. Ladan Gharai. October 1995.
In this paper we describe our experiments on digital video applications, concentrating on the static and dynamic tradeoffs involved in video playback. Our results were extracted from a controlled series of 272 tests, which we ran in three stages. In the first stage of 120 tests, we used a simple player-monitor tool to evaluate the effects of various static parameters: compression type, frame size, digitized rate, spatial quality and keyframe distribution. The tests were carried out on two Apple Macintosh platforms: at the lower end a Quadra 950, and at the higher end, a Power PC 7100/80. Our quantitative metrics included average playback rate, as well as the rate's variance over one-second intervals. The first set of experiments unveiled several anomalous latencies. To track them down we ran an additional 120 tests, whose analysis led us to find the locus of the system's bottlenecks. They also let us conclude that a software-only solution was sufficient for good video playback on the systems under observation, provided that the operating system is tuned accordingly. In the next step we attempted to achieve this goal by implementing our own video playback software and accompanying device-level handlers. Our emphasis was on achieving a controlled, deterministic coordination between the various system components. An additional set of 32 experiments was carried out on our platforms, which showed significant improvements in our quantitative performance measurements, as well as in visual quality. (Also cross-referenced as UMIACS-TR-95-103) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
User controlled overviews of an image library: A case study of the Visible. Chris North. Ben Shneiderman. Catherine Plaisant. October 1995.
This paper proposes a user interface for remote access of the National Library of Medicine's Visible Human digital image library. Users can visualize the library, browse contents, locate data of interest, and retrieve desired images. The interface presents a pair of tightly coupled views into the library data. The overview image provides a global view of the overall search space, and the preview image provides details about high resolution images available for retrieval. To explore, the user sweeps the views through the search space and receives smooth, rapid, visual feedback of contents. Desired images are automatically downloaded over the internet from the library. Library contents are indexed by meta-data consisting of automatically generated miniature visuals. The interface software is completely functional and freely available for public use, at: http://www.nlm.nih.gov/. (Also cross-referenced as CAR-TR-798) (Also cross-referenced as ISR-TR-95-99) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
A Parallel Sorting Algorithm With an Experimental Study. David R. Helman. David A. Bader. Joseph Ja'Ja'. December 1995.
Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. This algorithm was implemented in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results are consistent with the theoretical analysis and illustrate the efficiency and scalability of our algorithm across different platforms. In fact, it seems to outperform all similar algorithms known to the authors on these platforms, and its performance is invariant over the set of input distributions unlike previous efficient algorithms. Our results also compare favorably with those reported for the simpler ranking problem posed by the NAS Integer Sorting (IS) Benchmark. (Also cross-referenced as UMIACS-TR-95-102) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
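For readers unfamiliar with the underlying technique, the following is a minimal sequential sketch of textbook sample sort. It is not the two-round variation described in the report, and it was not taken from the Split-C implementation; the function name `sample_sort`, the `oversample` parameter, and the strided block layout are assumptions made only for illustration. Each "processor" is simulated as a list, and the routing loop stands in for the all-to-all personalized communication step.

```python
import bisect
import random


def sample_sort(data, p, oversample=4, seed=0):
    """Sort `data` by simulating textbook sample sort on p virtual processors."""
    if p <= 1 or len(data) <= p:
        return sorted(data)
    rng = random.Random(seed)

    # Step 1: distribute the input over p virtual processors (strided blocks).
    blocks = [data[i::p] for i in range(p)]

    # Step 2: every processor contributes a few random samples.
    samples = []
    for block in blocks:
        samples.extend(rng.sample(block, min(oversample, len(block))))
    samples.sort()

    # Step 3: choose p - 1 splitters at evenly spaced positions in the sample.
    step = len(samples) / p
    splitters = [samples[int(step * i)] for i in range(1, p)]

    # Step 4: route each element to the bucket its value falls in
    # (this loop plays the role of the all-to-all exchange).
    buckets = [[] for _ in range(p)]
    for block in blocks:
        for x in block:
            buckets[bisect.bisect_right(splitters, x)].append(x)

    # Step 5: sort each bucket locally; concatenating the buckets is sorted.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

The bucket sizes produced in step 4 depend on how well the sample represents the input, which is the load-balancing concern the abstract refers to.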
Practical Parallel Algorithms for Personalized Communication and. David A. Bader. David R. Helman. Joseph Ja'Ja'. November 1995.
A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms which efficiently execute on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event when most or all processors have non-uniformly sized personalized messages to exchange with each other. We focus in this paper on the h-relation personalized communication whose efficient implementation will allow high performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication. As an application, we present an efficient algorithm for stable integer sorting. The algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon. Our experimental results are consistent with the theoretical analysis and illustrate the scalability and efficiency of our algorithms across different platforms. In fact, they seem to outperform all similar algorithms known to the authors on these platforms. (Also cross-referenced as UMIACS-TR-95-101.)
Software Engineering of Distributed Simulation Environments. James Duff. James M. Purtilo. Michael Capps. David Stotts. October 1995.
With the increasing popularity of simulation and virtual environment software, it is necessary to provide software engineering techniques to simulation program designers. In this paper we lay out the requirements that any such techniques will have to meet, then suggest a formalism and an interconnection tool that will allow the interconnection of re-usable simulator components to build distributed simulation software. (Also cross-referenced as UMIACS-TR-95-100) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Dietmar Seipel. Jack Minker. Carolina Ruiz. Model Generation and State Generation for Disjunctive Logic Programs. October 1995.
This paper investigates two fixpoint approaches for minimal model reasoning with disjunctive logic programs DB. The first one, called model generation [4], is based on an operator TI defined on sets of Herbrand interpretations, whose least fixpoint is logically equivalent to the set of minimal Herbrand models of the program. The second approach, called state generation [12], uses a fixpoint operator TS based on hyperresolution. It operates on disjunctive Herbrand states and its least fixpoint is the set of logical consequences of DB, the so--called minimal model state of the program. We establish a useful relationship between hyperresolution by TS and model generation by TI. Then we investigate the problem of continuity of the two operators TS and TI. It is known that the operator TS is continuous [12], and so it reaches its least fixpoint in at most omega steps. On the other hand, the question of whether TI is continuous has been open. We show by a counterexample that TI is not continuous. Nevertheless, we prove that it converges towards its least fixpoint in at most omega steps too, as follows from the relationship that we show exists between hyperresolution and model generation. We define an iterative version of TI that computes the perfect model semantics of stratified disjunctive logic programs. On each stratum of the program, this operator converges in at most omega steps. Model generations for the stable semantics and the partial stable (and so the well--founded semantics) are respectively achieved by using this iterative operator together with the evidential transformation [3] and the 3-S transformation [16]. (Also cross-referenced as UMIACS-TR-95-99) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Efficient Machine-Independent Programming of High-Performance. Chau-Wen Tseng. October 1995.
Parallel computing is regarded by most computer scientists as the most likely approach for significantly improving computing power for scientists and engineers. Advances in programming languages and parallelizing compilers are making parallel computers easier to use by providing a high-level portable programming model that protects software investment. However, experience has shown that simply finding parallelism is not always sufficient for obtaining good performance from today's multiprocessors. The goal of this project is to develop advanced compiler analysis of data and computation decompositions, thread placement, communication, synchronization, and memory system effects needed in order to take advantage of performance-critical elements in modern parallel architectures. Dept. of Computer Science, Univ. of Maryland,
Coherence as an Abstract Type. Pete Keleher. October 1995.
We are currently designing Sparks, a protocol construction library that we hope will allow us to improve the performance of DSM systems to within a few percent of tightly-coupled multiprocessors. Sparks' abstractions will allow us to cleanly and systematically explore the design space of high-level synchronization operations, rather than proposing and implementing new operations in an ad hoc fashion. Sparks' basic abstraction is the coherence history, an object that summarizes past coherence actions to shared segments. Our emphasis here is more on creating and investigating the abstractions that make a broad variety of optimizations possible, rather than on the individual optimizations themselves. However, we will thoroughly quantify the performance gains allowed by the synchronization types created via the Sparks library. Our overall goal is to improve DSM performance. We will gauge our success by targeting applications from benchmark suites such as SPLASH-2, as well as representative applications from computational chemistry, biology, and satellite image analysis. Sparks' history abstraction will be used to make several important contributions towards our performance goal: (1) efficient techniques to implement high-level synchronization, (2) efficient automatic prefetching using prefetch playbacks, and (3) external interfaces to run-time libraries and automatically parallelized code sections. By improving DSM efficiency, we hope to make the shared memory paradigm more appealing, and therefore useful, to the research community. Dept. of Computer Science, Univ. of Maryland,
The Relative Importance of. Pete Keleher. October 1995.
This paper presents a detailed comparison of the relative importance of allowing concurrent writers versus the choice of the underlying consistency model. Our comparison is based on single- and multiple-writer versions of a lazy release consistent (LRC) protocol, and a single-writer sequentially consistent protocol, all implemented in the CVM software distributed shared memory system. We find that in our environment, which we believe to be representative of distributed systems today and in the near future, the consistency model has a much higher impact on overall performance than the choice of whether to allow concurrent writers. The multiple-writer protocol performs an average of 9% better than the single-writer LRC protocol, but 34% better than the single-writer sequentially consistent protocol. Set against this, MW-LRC required an average of 72% memory overhead, compared to 10% overhead for the single-writer protocols. Dept. of Computer Science, Univ. of Maryland,
Windows of Opportunity in Electronic Classrooms. Ben Shneiderman. Maryam Alavi. Kent L. Norman. Ellen Yu Borkowski. September 1995.
In our seven-year effort to build electronic classrooms we tried to balance the pursuit of new technologies with the exploration of new teaching/learning styles while providing the necessary infrastructure for faculty training and support, and collecting ample evaluation data to guide our transformation. This experience has led to a growing community of faculty users, widespread student acceptance, and administration support for expansion. After four years of usage by 44 faculty (20 tenured, 9 untenured, 15 other staff) from 16 departments offering 122 courses with over 4010 students we are ready to report on the lessons we have learned. Courses filled most slots from 8am to 10pm, and were as diverse as The Role of Media in the American Political Process, Chinese Poetry into English, Marketing Research Methods, Database Design, and Saving the Bay. (Also cross-referenced as CAR-TR-797) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research, College of Business, Department of Psychology, Computer Science Center,
Compositional Verification by Model Checking for Counter-Examples. Tevfik Bultan. Jeffrey Fischer. Richard Gerber. October 1995.
Many concurrent systems are required to maintain certain safety and liveness properties. One emerging method of achieving confidence in such systems is to statically verify them using "model checking". In this approach an abstract, finite-state model of the system is constructed; then an automatic check is made to ensure that the requirements are satisfied by the model. In practice, however, this method is limited by the "state space explosion problem". We have developed a compositional method that directly addresses this problem in the context of multi-tasking programs. Our solution depends on three key space-saving ingredients: (1) checking for counter-examples, which leads to simpler search algorithms; (2) automatic extraction of interfaces, which allows a refinement of the finite model -- even before its communicating partners have been compiled; and (3) using propositional "strengthening assertions" for the sole purpose of reducing state space. In this paper we present our compositional approach, and describe the software tools that support it. (Also cross-referenced as UMIACS-TR-95-98) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Complexity of Finding Most Vital Arcs and Nodes. Amotz Bar-Noy. Samir Khuller. Baruch Schieber. November 1995.
Let $G(V,E)$ be a graph (either directed or undirected) with a non-negative length $\ell(e)$ associated with each arc $e$ in $E$. For two specified nodes $s$ and $t$ in $V$, the $k$ most vital arcs (or nodes) are those $k$ arcs (nodes) whose removal maximizes the increase in the length of the shortest path from $s$ to $t$. We prove that finding the $k$ most vital arcs (or nodes) is NP-hard, even when all arcs have unit length. We also correct some errors in an earlier paper by Malik, Mittal and Gupta [ORL 8:223-227, 1989]. (Also cross-referenced as UMIACS-TR-95-96) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
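To make the definition of the $k$ most vital arcs concrete, here is a small brute-force sketch that tries every set of $k$ arcs and keeps the one whose removal lengthens the shortest $s$-$t$ path the most. It is exponential in $k$ and is only an illustration of the problem statement, not a method from the report; the function names and the dictionary-based graph encoding are assumptions.

```python
import heapq
from itertools import combinations


def shortest_path_length(arcs, s, t):
    """Dijkstra on directed arcs given as {(u, v): length}; inf if t is unreachable."""
    adjacency = {}
    for (u, v), length in arcs.items():
        adjacency.setdefault(u, []).append((v, length))
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adjacency.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def most_vital_arcs(arcs, s, t, k):
    """Return (removed_arcs, new_length) maximizing the s-t shortest path length."""
    best_set = ()
    best_length = shortest_path_length(arcs, s, t)
    for removed in combinations(sorted(arcs), k):
        remaining = {a: w for a, w in arcs.items() if a not in removed}
        length = shortest_path_length(remaining, s, t)
        if length > best_length:
            best_set, best_length = removed, length
    return best_set, best_length


# Hypothetical example: three s-t routes of lengths 2, 4, and 5.
example = {("s", "a"): 1, ("a", "t"): 1,
           ("s", "b"): 2, ("b", "t"): 2,
           ("s", "t"): 5}
```

On this example, removing one arc of the length-2 route raises the shortest path from 2 to 4, and removing one arc from each of the two shorter routes raises it to 5.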
Sandor P. Fekete. Samir Khuller. Monika Klemmstein. Balaji Raghavachari. Neal Young. A Network-Flow Technique for Finding Low-Weight Bounded-Degree Spanning. December 1995.
Given a graph with edge weights satisfying the triangle inequality, and a degree bound for each vertex, the problem of computing a low weight spanning tree such that the degree of each vertex is at most its specified bound is considered. In particular, modifying a given spanning tree $T$ using {\em adoptions} to meet the degree constraints is considered. A novel network-flow based algorithm for finding a good sequence of adoptions is introduced. The method yields a better performance guarantee than any previously obtained. Equally importantly, it yields the best performance guarantee among the class of algorithms that rely solely on the topology and edge weights of the given tree. The performance guarantee is the following. If the degree constraint $d(v)$ for each $v$ is at least $2$, the algorithm is guaranteed to find a tree whose weight is at most the weight of the given tree times $2 - \min\{\frac{d(v)-2}{D_T(v)-2} : D_T(v)>2\}$, where $D_T(v)$ is the initial degree of $v$. Examples are provided in which no lighter tree meeting the degree constraint exists. Linear-time algorithms are provided with the same worst-case performance guarantee. Choosing $T$ to be a minimum spanning tree yields approximation algorithms for the general problem on geometric graphs with distances induced by various $L_p$ norms. Finally, examples of Euclidean graphs are provided in which the ratio of the lengths of an optimal Traveling Salesperson path and a minimum spanning tree can be arbitrarily close to~2. (Also cross-referenced as UMIACS-TR-95-95) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
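As a worked illustration of the performance guarantee stated in the abstract, the sketch below simply evaluates the factor $2 - \min\{\frac{d(v)-2}{D_T(v)-2} : D_T(v)>2\}$ for given degree bounds and initial tree degrees. The function name and the dictionary inputs are assumptions for illustration; the code evaluates the formula only and does not implement the adoption algorithm.

```python
from fractions import Fraction


def guarantee_factor(degree_bound, tree_degree):
    """Evaluate 2 - min{(d(v) - 2) / (D_T(v) - 2) : D_T(v) > 2} exactly.

    degree_bound maps each vertex v to d(v) (assumed to be at least 2);
    tree_degree maps each vertex v to its initial degree D_T(v) in T.
    """
    ratios = [
        Fraction(degree_bound[v] - 2, tree_degree[v] - 2)
        for v in tree_degree
        if tree_degree[v] > 2
    ]
    if not ratios:
        # The minimum in the formula ranges over an empty set in this case.
        raise ValueError("no vertex has initial degree greater than 2")
    return 2 - min(ratios)


# Hypothetical instance: every vertex must end with degree at most 3,
# while the initial tree has vertices of degree 5 and 4.
bounds = {"u": 3, "w": 3, "x": 3}
initial = {"u": 5, "w": 4, "x": 1}
factor = guarantee_factor(bounds, initial)  # ratios 1/3 and 1/2, so 2 - 1/3
```

With a degree bound of 2 at every vertex the ratios are all zero and the factor is exactly 2.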
Ben Shneiderman. Anne Rose. October 1995.
Social Impact Statements: Engaging Public Participation in Information. "The real question before us lies here: do these instruments further life and enhance its values, or not?" - Mumford (1934) p. 318 Computers have become an integral part of our everyday lives. Banks, airlines, motor vehicle administrations, police departments, Social Security, and the Internal Revenue Service all depend on computers. From their introduction, people have questioned the impact computers will have on society. We believe it is our responsibility as system designers to achieve organizational goals while serving human needs and protecting individual rights. The proposed Social Impact Statements (Shneiderman, 1990) would identify the impacts of information systems on direct and indirect users, who may be employees or the public. This paper proposes a framework for implementing Social Impact Statements for federal and local government agencies and regulated industries, with optional participation by the other privately held corporations. A Social Impact Statement should describe the new system and its benefits, acknowledge concerns and potential barriers, outline the development process, and address fundamental principles. Examples from our work with the Maryland Department of Juvenile Justice are offered. Also cross-referenced as CAR-TR-796 Human-Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
On the Perturbation of LU and Cholesky Factors*. G. W. Stewart. October 1995.
In a recent paper, Chang and Paige have shown that the usual perturbation bounds for Cholesky factors can systematically overestimate the errors. In this note we sharpen their results and extend them to the factors of the LU decomposition. The results are based on a new formula for the first order terms of the error in the factors. (Also cross-referenced as UMIACS-TR-95-93) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
On Sublinear Convergence. G. W. Stewart. October 1995.
This note develops a theory of sublinearly converging sequences, including a categorization of the rates of convergence and a method for determining the rate from an iteration function. (Also cross-referenced as UMIACS-TR-95-92) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
The Triangular Matrices of Gaussian Elimination and Related. G. W. Stewart. October 1995.
It has become a commonplace that triangular systems are solved to higher accuracy than their condition would warrant. This observation is not true in general, and counterexamples are easy to construct. However, it is often true of the triangular matrices from pivoted LU or QR decompositions. It is shown that this fact is closely connected with the rank-revealing character of these decompositions. (Also cross-referenced as UMIACS-TR-95-91) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
1995 Human-Computer Interaction Laboratory Video Reports. Catherine Plaisant (Edited by). June 1995.
49-minute video of the lab's work over the past year. Topics are: • Introduction and table of contents - Ben Shneiderman • Using Dynamic Queries for Youth Services Information - Anne Rose, Ajit Vanniamparampil • Life-Lines: Visualizing Personal Histories - Brett Milash, Catherine Plaisant, Anne Rose • Dynamic Queries and Pruning for Large Tree Structures - Harsha Kumar • Browsing Anatomical Image Databases: the Visible Human - Flip Korn, Chris North • Spinning Your Web: WWW Interface Design Issues - Vince Boisselle • BizView: Managing Business and Network Alarms - Catherine Plaisant, Wei Zhao and Rina Levy • Animated Specifications Using Interaction Object Graphs - David Carr • WinSurfer: Treemaps for Replacing the Windows File Manager - Marko Teittinen (Also cross-referenced as CAR-TR-795) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant (Edited by). John Reesch (Video by). June 1994.
1994 Human-Computer Interaction Laboratory Video Reports. 80-minute video demonstrations of the past year's research. Topics are: • Introduction and table of contents - Ben Shneiderman, [3:18] • Visual information seeking using the FilmFinder - Christopher Ahlberg, Ben Shneiderman, [6:12] • Organization overviews and role management-Inspiration for future desktop environments - Catherine Plaisant, Ben Shneiderman, [9:39] • Visual decision-making: using treemaps for the analytic hierarchy process - Toshiyuki Asahi, Ben Shneiderman, David Turo, [8:34] • Visual information management for satellite network configuration - Catherine Plaisant, Harsha Kumar, Marko Teittinen, Ben Shneiderman, [8:49] • Graphical macros: a technique for customizing any application using pixel-pattern matching - Richard Potter, [9:49] • Education by engagement and construction: can distance learning be better than face to face? - Ben Shneiderman, [15:00] • Dynamic queries demos: revised HomeFinder and text version plus health statistics atlas - Ben Shneiderman, [9:40] Dynamic Queries are user controlled displays of visual or textual information. Ben Shneiderman presents the HomeFinder (developed by Chris Williamson), followed by the text version (Vinit Jain) and the Health Statistics Atlas (Catherine Plaisant and Vinit Jain). • CHI '94 slide and video show - [9:12] Open House '94 Video (Also cross-referenced as CAR-TR-794) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant (Editor). June 1993.
1993 Human-Computer Interaction Laboratory Video Reports. • Introduction and table of contents - Ben Shneiderman, [4:00] • Dynamaps: dynamic queries on a health statistics atlas - Catherine Plaisant and Vinit Jain, [6:34] • Hierarchical visualization with Treemaps: making sense of pro basketball data - Dave Turo, [10:47] • TreeViz™: file directory browsing - Brian Johnson, [10:04] • HyperCourseware: computer integrated tools in the AT&T Teaching Theater - Kent Norman, [7:08] • Improving access to medical abstracts: Grateful Med Interface prototype - Gary Marchionini, [6:08] • Layout appropriateness: guiding interface design with simple task descriptions - Andrew Sears, [4:00] (Also cross-referenced as CAR-TR-793) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant (Editor). June 1992.
1992 Human-Computer Interaction Laboratory Video Reports. • Introduction - Ben Shneiderman, [3:00] • Dynamic Queries: database searching by direct manipulation - Ben Shneiderman, Chris Williamson, Christopher Ahlberg, [10:55] • Treemaps for visualizing hierarchical information - Ben Shneiderman, Brian Johnson, Dave Turo, [11:25] • Three strategies for directory browsing - Rick Chimera, [10:30] • Filter-Flow metaphor for boolean queries - Degi Young, Ben Shneiderman, [6:35] • The AT&T Teaching Theater: active learning through computer supported collaborative courseware - Kent Norman, [8:25] • ACCESS: an online public access catalog at the Library of Congress - Gary Marchionini, [8:15] • Remote Direct Manipulation: a telepathology workstation - Catherine Plaisant, Dave Carr, [7:30] • Guiding automation with pixels: a technique for programming in the user interface - Richard Potter, [11:50] (Also cross-referenced as CAR-TR-792) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant (Editor). June 1991.
1991 Human-Computer Interaction Laboratory Video Reports. Introduction - Ben Shneiderman, Scheduling home control devices - Catherine Plaisant, Ben Shneiderman, Touchscreen toggles - Catherine Plaisant, A home automation system - Reuel Launey (Custom Command Systems), PlayPen II (now known as PenPlay II): A novel fingerpainting program - Andrew Sears, Ben Shneiderman, Touchscreen keyboards - Andrew Sears, Ben Shneiderman, Pie menus - Don Hopkins, Three interfaces for browsing tables of contents - Rick Chimera (Also cross-referenced as CAR-TR-791) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Tae-Hyung Kim. James M. Purtilo. A Source-Level Transformation Framework for RPC-Based. September 1995.
The remote procedure call (RPC) paradigm has been a favorite of programmers who write distributed programs because RPC uses a familiar procedure call abstraction as the sole mechanism of operation. The abstraction helps to simplify programming tasks, but this does not mean that the resulting program's RPC-based flow of control will be anything close to ideal for high performance. The purpose of our research is to provide a source-level transformation framework as an alternative way to implement an RPC-based distributed program, so that the code can be optimized through program analysis techniques. This paper describes the transformation tools we have constructed towards this end. (Also cross-referenced as UMIACS-TR-95-90) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Khoa Doan. Catherine Plaisant. Ben Shneiderman. September 1995.
Query previews in networked information systems. In a networked information system, there are three major obstacles facing users in a querying process: slow network performance, large data volume and data complexity. In order to overcome these obstacles, we propose a two-phase approach to query formulation: Query Preview and Query Refinement. In the Query Preview phase, users formulate an initial query by selecting desired attribute values. The volume of matching data sets is shown graphically on preview bars, which help users to rapidly eliminate undesired data sets and focus on a manageable number of relevant data sets. Query previews also prevent wasted steps by eliminating zero-hit queries. When the estimated number of data sets is low enough, users submit the initial query to the network, which returns the metadata of the data sets for the Query Refinement phase. Using this approach, we developed dynamic query user interfaces allowing users to formulate their queries using direct manipulation in an exploratory manner across a networked environment. (Also cross-referenced as CAR-TR-788, ISR-TR-95-90) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant. Brett Milash. Anne Rose. Seth Widoff. Ben Shneiderman. September 1995.
LifeLines: Visualizing personal histories. LifeLines provide a general visualization environment for personal histories that can be applied to medical and court records, professional histories and other types of biographical data. A one screen overview shows multiple facets of the records. Aspects, for example medical conditions or legal cases, are displayed as individual time lines, while icons indicate discrete events, such as physician consultations or legal reviews. Line color and thickness illustrate relationships or significance; scaling tools and filters allow users to focus on part of the information. LifeLines reduce the chances of missing information, facilitate spotting anomalies and trends, and streamline access to details, while remaining tailorable and easily sharable between applications. The paper describes the use of LifeLines for youth records of the Maryland Department of Juvenile Justice and also for medical records. User feedback was collected using a Visual Basic prototype for the youth record. Techniques to deal with complex records are reviewed and issues of a standard personal record format are discussed. (Also cross-referenced as CAR-TR-787, ISR-TR-95-88) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Eser Kandogan. Ben Shneiderman. September 1995.
Elastic Windows: Improved spatial layout and rapid multiple window. Most windowing systems follow the independent overlapping windows approach, which emerged as an answer to the needs of the 80s' applications and technology. Advances in computers, display technology, and the applications demand more functionality from window management systems. Based on these changes and the problems of current windowing approaches, we have updated the requirements for multi-window systems to guide new methods of window management. We propose elastic windows with improved spatial layout and rapid multi-window operations. Multi-window operations are achieved by issuing operations on a hierarchically organized group of windows in a space-filling tiled layout. Sophisticated multi-window operations like Hook, Pump, Minimize, Restore, Move and Relocate have been developed to handle fast task-switching and to structure the work environment of users to their rapidly changing needs. We claim that these multi-window operations and the tiled layout decrease the cognitive load on users. Users found our prototype system to be comprehensible and enjoyable as they playfully explored the way multiple windows are reshaped. (Also cross-referenced as CAR-TR-786, ISR-TR-95-89) Human Computer Interaction Laboratory, Center for Automation Research, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Patricia McCarthy. Adam A. Porter. Harvey Siy. Lawrence G. Votta Jr. An Experiment to Assess Cost-Benefits of Inspection Meetings and their. September 1995.
We hypothesize that inspection meetings are far less effective than many people believe and that meetingless inspections are equally effective. However, two of our previous industrial case studies contradict each other on this issue. Therefore, we are conducting a multi-trial, controlled experiment to assess the benefits of inspection meetings and to evaluate alternative procedures. The experiment manipulates four independent variables: (1) the inspection method used (two methods involve meetings, one method does not), (2) the requirements specification to be inspected (there are two), (3) the inspection round (each team participates in two inspections), and (4) the presentation order (either specification can be inspected first). For each experiment we measure three dependent variables: (1) the individual fault detection rate, (2) the team fault detection rate, and (3) the percentage of faults originally discovered after the initial inspection phase (during which phase reviewers individually analyze the document). So far we have completed one run of the experiment with 21 graduate students in computer science at the University of Maryland as subjects, but we do not yet have enough data points to draw definite conclusions. Rather than presenting preliminary conclusions, this article (1) describes the experiment's design and the provocative hypotheses we are evaluating, (2) summarizes our observations from the experiment's initial run, and (3) discusses how we are using these observations to verify our data collection instruments and to refine future experimental runs. (Also cross-referenced as UMIACS-TR-95-89) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Supporting Distributed Multimedia Applications on. Debanjan Saha. September 1995.
ATM offers a number of features, such as high bandwidth and provision for per-connection quality of service guarantees, making it particularly attractive to multimedia applications. Unfortunately, the bandwidth available at ATM's data-link layer is not visible to the applications due to operating system (OS) bottlenecks at the host-network interface. Similarly, the promise of per-connection service guarantees is still elusive due to the lack of appropriate traffic control mechanisms. In this dissertation, we investigate both of these problems, taking multimedia applications as examples. The OS bottlenecks are not limited to the network interfaces, but affect the performance of the entire I/O subsystem. We propose to alleviate the OS's I/O bottleneck by according more autonomy to I/O devices and by using a connection oriented framework for I/O transfers. We present experimental results on a video conferencing testbed demonstrating the tremendous performance impact of the proposed I/O architecture on networked multimedia applications. To address the problem of quality of service support in ATM networks, we propose a simple cell scheduling mechanism, named carry-over round robin (CORR). Using analytical techniques, we analyze the delay performance of CORR scheduling. Besides providing guarantees on delay, CORR is also fair in distributing the excess bandwidth. We show that despite its simplicity, CORR is very competitive with other more complex schemes both in terms of delay performance and fairness. (Also cross-referenced as UMIACS-TR-95-88) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Yuan-Jye Jason Wu. Downdating a Rank-Revealing URV Decomposition. August 1995.
The rank-revealing URV decomposition is a useful tool for the subspace tracking problem in digital signal processing. Updating the decomposition is a stable process. However, downdating a rank-revealing URV decomposition could be unstable because the R factor is ill-conditioned. In this paper, we review some existing downdating algorithms for the full-rank URV decomposition in the absence of U and develop a new combined algorithm. We also show that the combined algorithm has relational stability. For the rank-revealing URV decomposition, we review a two-step method that applies full-rank downdating algorithms to the signal and noise parts separately. We compare several combinations of the full-rank algorithms and demonstrate good performance of our combined algorithm. Dept. of Computer Science, Univ. of Maryland,
Time-Domain Extraction of Broad-Band Sources by. Jacob Roginsky. G. W. Stewart. August 1995.
Single receiver source deconvolution in a shallow water environment is an ill-posed problem whose difficulty is compounded by the multipath nature of the propagation operator. If only sources that are quiescent prior to some initial time $t_0$ are considered, the result of discretizing the problem in the time domain is an ill-conditioned triangular Toeplitz system. In this paper we show how an algorithm of Elden can be used to implement Tikhonov-Phillips regularization for this system. Unlike the multichannel deconvolution techniques used in underwater acoustics, this method can extract source signatures using the outputs of a single sensor. In addition, when the propagation is multipath and source signature extraction is performed as part of an optimization procedure for environmental inversion, we can work with shorter time windows so that the process becomes computationally more efficient than frequency domain deconvolution. A number of examples of the use of the Tikhonov-Phillips regularization method for source series extraction are provided. (Also cross-referenced as UMIACS-TR-95-87) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Note on Conjugate Gradient Convergence. Aaron E. Naiman. Ivo M. Babuska. Howard C. Elman. August 1995.
The one-dimensional discrete Poisson equation on a uniform grid with $n$ points produces a linear system of equations with a symmetric positive-definite coefficient matrix. Hence, the conjugate gradient method can be used, and standard analysis gives an upper bound of $O(n)$ on the number of iterations required for convergence. This paper introduces a systematically defined set of solutions dependent on a parameter $\beta$, and for several values of $\beta$, presents exact analytic expressions for the number of steps $k(\beta,\tau,n)$ needed to achieve accuracy $\tau$. The asymptotic behavior of these expressions has the form $O(n^{\alpha})$ as $n \to \infty$ and $O(\tau^{\gamma})$ as $\tau \to \infty$. In particular, two choices of $\beta$ corresponding to nonsmooth solutions give $\alpha=0$, i.e., iteration counts independent of $n$; this is in contrast to the standard bounds. The standard asymptotic convergence behavior, $\alpha=1$, is seen for a relatively smooth solution. Numerical examples illustrate and supplement the analysis. (Also cross-referenced as UMIACS-TR-95-86) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
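The setting of this abstract can be explored with a minimal sketch of the conjugate gradient method applied to the unscaled one-dimensional Poisson matrix $\mathrm{tridiag}(-1, 2, -1)$, returning the number of iterations needed to reach a relative residual tolerance. The function names, the stopping rule, and the omission of the $1/h^2$ scaling are assumptions made for illustration; the sketch does not reproduce the paper's $\beta$-parameterized family of solutions.

```python
import math


def poisson_matvec(x):
    """Multiply x by the n-by-n matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(b, tol=1e-10, max_iter=None):
    """Solve A x = b for the matrix above; return (x, iterations used)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        Ap = poisson_matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * b_norm:
            return x, k
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


def iterations_for(n):
    """Iteration count for a hypothetical right-hand side of all ones."""
    _, k = conjugate_gradient([1.0] * n)
    return k
```

Calling `iterations_for(n)` for increasing `n` gives an empirical view of how the iteration count grows for one particular right-hand side; other right-hand sides can behave quite differently, which is the point of the paper.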
Andreas Mueller. Fast Sequential and Parallel Algorithms for Association Rule Mining:. August 1995.
The field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of IO operations, reducing IO cost has been the primary target of the algorithms presented in the literature. One of the most recently proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces IO cost to a constant amount by processing one database portion at a time in memory. We implemented an algorithm called SPTID that incorporates both TID-lists and partitioning to study their benefits. For comparison, a non-partitioning algorithm called SEAR, which is based on a new prefix-tree data structure, is used. Our experiments with SPTID and SEAR indicate that TID-lists have inherent inefficiencies; furthermore, because all of the algorithms tested tend to be CPU-bound, trading CPU-overhead against I/O operations by partitioning did not lead to better performance. In order to scale mining algorithms to the huge databases (e.g., multiple Terabytes) that large organizations will manage in the near future, we implemented parallel versions of SEAR and SPEAR (its partitioned counterpart). The performance results show that, while both algorithms parallelize easily and obtain good speedup and scale-up results, the parallel SEAR version performs better than parallel SPEAR, despite the fact that it uses more communication. Dept. of Computer Science, Univ. of Maryland,
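To make the TID-list representation mentioned above concrete, the following sketch stores, for each item, the set of transaction identifiers (TIDs) that contain it, and obtains the support of an itemset by intersecting those sets. This is a generic illustration of the idea, not the SPTID, SEAR, or PARTITION implementation; the function names and the toy transactions are hypothetical.

```python
def build_tid_lists(transactions):
    """Map each item to the set of transaction ids (TIDs) that contain it."""
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists


def support(itemset, tid_lists):
    """Number of transactions containing every item of itemset,
    computed by intersecting the items' TID-lists."""
    sets = [tid_lists.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


def confidence(antecedent, consequent, tid_lists):
    """Confidence of the rule antecedent -> consequent (0.0 if the antecedent never occurs)."""
    s_a = support(antecedent, tid_lists)
    if s_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), tid_lists) / s_a


# Hypothetical retail transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
tid_lists = build_tid_lists(transactions)
```

Here `support({"bread", "milk"}, tid_lists)` counts transactions 0 and 2, and the rule bread -> milk has confidence 2/3.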
A Survey of Information Retrieval and Filtering Methods. Christos Faloutsos. Douglas W. Oard. August 1995.
We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones (full text scanning, inversion, signature files and clustering). In the second part we discuss attempts to include semantic information (natural language processing, latent semantic indexing and neural networks). Dept. of Computer Science, Univ. of Maryland,
A Planning Approach to Declarer Play in Contract Bridge. S. Smith. Dana S. Nau. T. Throop. August 1995.
Although game-tree search works well in perfect-information games, it is less suitable for imperfect-information games such as contract bridge. The lack of knowledge about the opponents' possible moves gives the game tree a very large branching factor, making it impossible to search a significant portion of this tree in a reasonable amount of time. This paper describes our approach for overcoming this problem. We represent information about bridge in a task network that is extended to represent multi-agency and uncertainty. Our game-playing procedure uses this task network to generate game trees in which the set of alternative choices is determined not by the set of possible actions, but by the set of available tactical and strategic schemes. We have tested this approach on declarer play in the game of bridge, in an implementation called Tignum 2. On 5000 randomly generated notrump deals, Tignum 2 beat the strongest commercially available program by 1394 to 1302, with 2304 ties. These results are statistically significant at the alpha = 0.05 level. Tignum 2 searched an average of only 8745.6 moves per deal in an average time of only 27.5 seconds per deal on a Sun SPARCstation 10. Further enhancements to Tignum 2 are currently underway. (Also cross-referenced as UMIACS-TR-95-85) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
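The abstract does not say which statistical test was used. As one possible sanity check of the reported counts, the sketch below applies a one-sided sign test (normal approximation) to the decisive deals only, ignoring ties; the choice of test and the function name are assumptions, not the authors' stated method.

```python
import math


def one_sided_sign_test(wins, losses):
    """Normal-approximation sign test on decisive trials.

    Under the null hypothesis each decisive deal is a win with probability 1/2,
    so wins has mean n/2 and standard deviation sqrt(n)/2, giving
    z = (wins - losses) / sqrt(n). Returns (z, one-sided p-value).
    """
    n = wins + losses
    z = (wins - losses) / math.sqrt(n)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


# Counts reported in the abstract: 1394 wins, 1302 losses, 2304 ties.
z, p = one_sided_sign_test(1394, 1302)
```

Under these assumptions the one-sided p-value falls below 0.05, which is consistent with the significance claim in the abstract.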
A Study of Query Execution Strategies for Client-Server Database Systems. Donald Kossmann. Michael J. Franklin. August 1995.
Query processing in a client-server database system raises the question of where to execute queries to minimize the communication costs and response time of a query, and to load-balance the system. This paper evaluates the two common query execution strategies, data shipping and query shipping, and a policy referred to as hybrid shipping. Data shipping determines that queries be executed at clients; query shipping determines that queries be executed at servers; and hybrid shipping provides the flexibility to execute queries at clients and servers. The experiments with a client-server model confirm that the query execution policy is critical for the performance of a system. Neither data nor query shipping is optimal in all situations, and the performance penalties can be substantial. Hybrid shipping at least matches the best performance of data and query shipping and shows better performance than both in many cases. The performance of hybrid shipping plans, however, is shown to be sensitive to changes in the state of the system (e.g., the load of machines and the contents of caches). Initial experiments indicate that an extended version of a 2-step optimization may be an effective strategy for adjusting plans according to the state of the system at runtime. (Also cross-referenced as UMIACS-TR-95-85) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Michael J. Franklin. Michael J. Carey. Miron Livny. Transactional Client-Server Cache Consistency: Alternatives and. September 1995.
Client-server database systems based on a page server model can exploit client memory resources by caching copies of pages across transaction boundaries. Caching reduces the need to obtain data from servers or other sites on the network. In order to ensure that such caching does not result in the violation of transaction semantics, a cache consistency maintenance algorithm is required. Many such algorithms have been proposed in the literature and, as all provide the same functionality, performance is a primary concern in choosing among them. In this paper we provide a taxonomy that describes the design space for transactional cache consistency maintenance algorithms and show how proposed algorithms relate to one another. We then investigate the performance of six of these algorithms, and use these results to examine the tradeoffs inherent in the design choices identified in the taxonomy. The insight gained in this manner is then used to reflect upon the characteristics of other algorithms that have been proposed. The results show that the interactions among dimensions of the design space can impact performance in many ways, and that classifications of algorithms as simply "Pessimistic" or "Optimistic" do not accurately characterize the similarities and differences among the many possible cache consistency algorithms. (Also cross-referenced as UMIACS-TR-95-84) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiler and Runtime Support for Programming in Adaptive. Guy Edjlali. Gagan Agrawal. Alan Sussman. Jim Humphries. Joel Saltz. July 1995.
For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers of HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of non-dedicated workstations, which are likely to be an important resource for parallel programming in the future. (Also cross-referenced as UMIACS-TR-95-83) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Analytical and Empirical Evaluation of Software Reuse Metrics. Prem Devanbu. Sakke Karstu. Walcelio L. Melo. William Thomas. September 1995.
How much can be saved by using pre-existing (or somewhat modified) software components when developing new software systems? With the increasing adoption of reuse methods and technologies, this question becomes critical. However, directly tracking the actual cost savings due to reuse is difficult. A worthy goal would be to develop a method of measuring the savings indirectly by analyzing the code for reuse of components. The focus of this paper is to evaluate how well several published software reuse metrics measure the ``time, money and quality'' benefits of software reuse. We conduct this evaluation both analytically and empirically. On the analytic front, we first develop some properties that should arguably hold of any measure of ``time, money and quality'' benefit due to reuse. We assess several existing software reuse metrics using these properties. Empirically, we constructed a toolset (using GEN++) to gather data on all published reuse metrics from C++ code; then, using some productivity and quality data from ``nearly replicated'' student projects at the University of Maryland, we evaluate the relationship between the known metrics and the process data. The results show that different reuse metrics can be used as predictors of different quality attributes, and suggest possible directions for improving the known measures. (Also cross-referenced as UMIACS-TR-95-82) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Designing Temporal Controls. Ashok K. Agrawala. Seonho Choi. Leyuan Shi. July 1995.
Traditional control systems have been designed to exercise control at regularly spaced time instants. When a discrete version of the system dynamics is used, a constant sampling interval is assumed and a new control value is calculated and exercised at each time instant. In this paper we formulate a new control scheme, {\it temporal control}, in which we not only calculate the control value but also decide the time instants when the new values are to be used. Taking a discrete, linear, time-invariant system, and a cost function which reflects a cost for computation of the control values, as an example, we show the feasibility of using this scheme. We formulate the temporal control scheme as a feedback scheme and, through a numerical example, demonstrate the significant reduction in cost through the use of temporal control. (Also cross-referenced as UMIACS-TR-95-81) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Product Unit Learning. Laurens R. Leerink. C. Lee Giles. Bill G. Horne. Marwan A. Jabri. January 1996.
Product units provide a method of automatically learning the higher-order input combinations required for the efficient synthesis of Boolean logic functions by neural networks. Product units also have a higher information capacity than sigmoidal networks. However, this activation function has not received much attention in the literature. A possible reason for this is that one encounters some problems when using standard backpropagation to train networks containing these units. This report examines these problems, and evaluates the performance of three training algorithms on networks of this type. Empirical results indicate that the error surface of networks containing product units has more local minima than that of corresponding networks with summation units. For this reason, a combination of local and global training algorithms was found to provide the most reliable convergence. We then investigate how `hints' can be added to the training algorithm. By extracting a common frequency from the input weights, and training this frequency separately, we show that convergence can be accelerated. A constructive algorithm is then introduced which adds product units to a network as required by the problem. Simulations show that for the same problems this method creates a network with significantly fewer neurons than those constructed by the tiling and upstart algorithms. In order to compare their performance with other transfer functions, product units were implemented as candidate units in the Cascade Correlation (CC) \cite{Fahlman90} system. Using these candidate units resulted in smaller networks which trained faster than when any of the standard (three sigmoidal types and one Gaussian) transfer functions were used. This superiority was confirmed when a pool of candidate units of four different nonlinear activation functions was used, in which the units have to compete for addition to the network.
Extensive simulations showed that for the problem of implementing random Boolean logic functions, product units are always chosen above any of the other transfer functions. (Also cross-referenced as UMIACS-TR-95-80) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
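The abstract above does not state the product unit's activation formula. The following is a minimal sketch assuming the commonly used definition, in which the unit outputs the product of its inputs raised to their weights and, for negative inputs, only the real part of the complex result is kept. The function name `product_unit` and this choice of definition are illustrative assumptions, not taken from the report.

```python
import math


def product_unit(inputs, weights):
    """Real part of prod_i x_i ** w_i for nonzero real inputs.

    Assumed definition (not given in the abstract): a negative input x
    contributes |x| ** w to the magnitude and pi * w to the phase, and
    the unit outputs magnitude * cos(phase).
    """
    magnitude = 1.0
    phase = 0.0
    for x, w in zip(inputs, weights):
        if x == 0:
            raise ValueError("inputs must be nonzero in this sketch")
        magnitude *= abs(x) ** w
        if x < 0:
            phase += math.pi * w
    return magnitude * math.cos(phase)
```

Under this assumed definition, inputs encoded as -1 and +1 with all weights equal to 1 give the product of the inputs, so a single unit computes a parity-style higher-order combination: `product_unit([-1, -1], [1, 1])` is approximately 1 and `product_unit([-1, 1], [1, 1])` is approximately -1.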
Understanding and Predicting the Process of Software. Victor R. Basili. Lionel Briand. Steve Condon. Yong-Mi Kim. Walcelio L. Melo. Jon Valett. September 1995.
One of the major concerns of any maintenance organization is how to estimate the cost of subsequent releases of software systems. Planning the next release, maximizing the increase in functionality and improving the quality are vital to successful maintenance management. The objective of this paper is to present the results of a case study in which an incremental and inductive approach was used to build a model for predicting software maintenance releases in a large-scale software maintenance organization. This study was conducted in the Flight Dynamics Division (FDD) of the NASA Goddard Space Flight Center (GSFC). This organization is representative of many other software maintenance organizations. Over one hundred software systems totalling about 4.5 million lines of code are maintained by this organization. Many of these systems have been maintained for many years and regularly produce new releases. The maintenance costs in this organization have increased considerably over the last few years. This paper shows the predictive model developed for the FDD's software maintenance release process. Lessons learned during the establishment of a measurement-based software maintenance improvement program in this organization are also described and future work is outlined. (Also cross-referenced as UMIACS-TR-95-79) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Learning Long-Term Dependencies is Not as Difficult With NARX. Tsungnan Lin. Bill G. Horne. Peter Tino. C. Lee Giles. July 1995.
It has recently been shown that gradient descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. In this paper we explore the long-term dependencies problem for a class of architectures called NARX recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning is more effective in NARX networks than in recurrent neural network architectures that have ``hidden states'' on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are an attempt to explain this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly and suggest possible ways to loosen these assumptions. (Also cross-referenced as UMIACS-TR-95-78) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
AINSI - An Inductive Software Process Improvement Method: Concrete. Lionel Briand. Kaled El Eman. Walcelio L. Melo. July 1995.
Top-down approaches to process improvement based on generic "best practice" models (e.g., CMM, TRILLIUM, BOOTSTRAP, SPICE) have become popular. Despite the idiosyncrasies of each of these approaches, they share some common characteristics: all of them are based on numerous assumptions about what are best practices, and about the business goals of organizations and the problems they face. Other organizations, like the Software Engineering Laboratory of the NASA Goddard Space Flight Center, HP and CRIM in Canada, have adopted the Quality Improvement Paradigm (QIP). The QIP stipulates a more bottom-up and inductive approach to process improvement. The focus of this paradigm is to first understand what processes exist in the organization and to determine what causes the most significant problems. Based on this, opportunities for improvement are devised, and empirical studies are conducted to evaluate potential solutions. In this paper, we present a method, named AINSI (An INductive Software process Improvement method), which defines general but concrete steps and guidelines for putting in place the QIP. This method is the result of the collective experiences of the authors and integrates many lessons learned from process improvement efforts in different environments. It also integrates many complementary techniques such as qualitative analysis, methods for data collection (e.g., the Goal/Question/Metric paradigm), and quantitative evaluation. (Also cross-referenced as UMIACS-TR-95-77) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ibrahim Matta. Fast Evaluation and Dynamic Control of Integrated Services Networks. July 1995.
Integrated services networks, such as ATM (Asynchronous Transfer Mode) networks, are expected to operate at gigabit per second rates and provide various virtual-circuit and datagram services. For this purpose, new control algorithms (e.g. scheduling, admission, routing) have been proposed. The algorithms are often adaptive, resulting in complex time-dependent interactions. This renders traditional evaluation tools ineffective; analytical approaches are typically too coarse, and simulation approaches are often too expensive. The goal of our research is to develop accurate analytical models that account for the interaction and time-dependent nature of the control algorithms, while at the same time being inexpensive or easy to solve. This would allow the rapid and tractable evaluation of different design alternatives. In this dissertation, we develop both dynamic models and quasi-static models of integrated networks. Dynamic models can be used to evaluate both virtual-circuit and datagram services. We solve dynamic models using a new iterative method, referred to as the ``Z-iteration''. Our method is both accurate and fast. It permits the joint evaluation of various scheduling, admission, and routing schemes used in integrated networks. We show results comparing dynamic routing schemes on a network with NSFNET-backbone topology. We also illustrate the applicability of the Z-iteration to other high-performance systems. Quasi-static models are suitable for evaluating datagram services for which the quasi-static assumption is reasonable. We analyze a quasi-static model of a datagram network offering different classes of service. We apply the Liapunov function method to derive stability conditions for the routes of the different traffic classes. We show how with scheduling support for routing, the routes of the traffic classes can be isolated, thereby improving the overall network performance. 
(Also cross-referenced as UMIACS-TR-95-76) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Generic Architecture for Programmable Traffic Shaper. Krishnan K. Kailas. Ashok K. Agrawala. S. V. Raghavan. July 1995.
Traffic shapers, by preventing congestion and smoothing the traffic, play an important role in realizing the traffic control schemes employed in high speed networks to ensure the Quality of Service (QoS) requirements of the application. In this report, we present a generic architecture for a programmable traffic shaper for high speed networks. The programmability of the proposed architecture is illustrated by implementing some of the existing traffic shaping schemes. The architectural design issues of the proposed scheme are described and discussed. (Also cross-referenced as UMIACS-TR-95-75) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
David A. Bader. Joseph Ja'Ja'. July 1995.
Practical Parallel Algorithms for Dynamic Data Redistribution,. Institute for Advanced Computer Studies, Department of Electrical Engineering. A common statistical problem is that of finding the median element in a set of data. This paper presents a fast and portable parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank $i$, for an arbitrarily given integer $i$. Practical algorithms needed by our selection algorithm for the dynamic redistribution of data are also discussed. Our general framework is a single-address space, distributed memory programming model that is enhanced by a set of communication primitives. We use efficient techniques for distributing, coalescing, and load balancing data as well as efficient combinations of task and data parallelism. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results illustrate the scalability and efficiency of our algorithms across different platforms and improve upon all the related experimental results known to the authors. More efficient implementations of the communication primitives will likely result in even faster execution times. (Also cross-referenced as UMIACS-TR-95-44.)
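To make the selection problem in the abstract above concrete, here is a small sequential sketch that returns the element of rank $i$ by randomized partitioning (quickselect). It is not the parallel Split-C algorithm of the report, only an illustration of the problem statement; the function name `select` and the 1-based rank convention are assumptions made for this example.

```python
import random


def select(values, i, rng=None):
    """Return the element of rank i (1-based: the i-th smallest) of values.

    Sequential quickselect, used here as a stand-in for illustration.
    """
    if not 1 <= i <= len(values):
        raise ValueError("rank out of range")
    rng = rng or random.Random(0)
    items = list(values)
    while True:
        pivot = rng.choice(items)
        lower = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if i <= len(lower):
            # The answer lies among the elements smaller than the pivot.
            items = lower
        elif i <= len(lower) + len(equal):
            return pivot
        else:
            # Discard everything up to the pivot and adjust the rank.
            i -= len(lower) + len(equal)
            items = [v for v in items if v > pivot]
```

With this convention the median of an odd-length list is the element of rank `(len(values) + 1) // 2`, so `select([7, 1, 5, 3, 9], 3)` returns 5.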
Chia-Mei Chen. Scheduling Issues in Real-Time Systems. June 1995.
The most important objective of real-time systems is to fulfill time-critical missions in satisfying their application requirements and timing constraints. Software utilities can analyze real-time tasks and extract their characteristics and requirements for assisting the systems to guarantee schedulability. Real-time scheduling is the core of the real-time system design. It should allow real-time systems to exhibit predictable timing correctness regardless of possible uncertainty in run-time environments. In this dissertation, we study the problem of scheduling real-time tasks with resource and fault-tolerance requirements. For tasks with resource requirements, two types of platforms are examined: multiprocessor hard real-time systems and real-time database systems; for tasks with fault-tolerance requirements, we focus on hard real-time systems. We investigate preemptive priority-based scheduling for tasks with resource requirements in the context of hard real-time systems. Rate-monotonic and earliest deadline first priority assignment strategies can meet deadlines if the schedulability conditions are satisfied. We propose resource control protocols, for these scheduling strategies, based on the concepts of priority inheritance and priority ceiling and describe schedulability conditions for meeting deadlines. Real-time database systems have different objectives for transaction scheduling. Minimizing miss ratio usually is the major concern. We study the significance of the knowledge of execution time in system performance and propose a class of optimistic concurrency control protocols using the knowledge of execution time. Our simulation results indicate that the knowledge of execution time substantially improves system performance. Fault-tolerance is the ability to maintain the system in a safe and stable state such that the real-time application functions correctly and its timing constraints are satisfied even in the presence of faults.
We develop a scheduling algorithm which attempts to build as many fault-tolerant tasks as possible into a schedule. We approximate system reliability by Markov chain models and illustrate the applicability of the proposed reliability models. We compare the proposed fault-tolerance scheduling approach with the basic fault-tolerance scheduling schemes and the simulation results show that our method provides better reliability than the basic scheduling schemes. (Also cross-referenced as UMIACS-TR-95-73) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
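The abstract above refers to schedulability conditions for rate-monotonic and earliest deadline first scheduling without stating them. As a rough illustration, the sketch below checks the classical utilization-based tests for independent periodic tasks on a single processor with deadlines equal to periods. These are the textbook conditions, not the resource-aware conditions developed in the dissertation, and the function names and the `(execution_time, period)` task representation are assumptions made for this example.

```python
def utilization(tasks):
    """Total processor utilization of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_sufficient(tasks):
    """Classical sufficient test for rate-monotonic scheduling:
    U <= n * (2 ** (1 / n) - 1). Ignores blocking on shared resources."""
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


def edf_feasible(tasks):
    """Classical test for earliest deadline first scheduling: U <= 1."""
    return utilization(tasks) <= 1.0
```

For example, the task set `[(1, 4), (1, 5), (2, 10)]` has utilization 0.65 and passes both tests, while `[(2, 4), (2, 5)]` has utilization 0.9, which exceeds the two-task rate-monotonic bound of about 0.83 but still satisfies the earliest deadline first condition. Failing the rate-monotonic test is inconclusive, since the bound is only sufficient.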
S. Radhakrishnan. S. V. Raghavan. Ashok K. Agrawala. Design & Performance Study of a Flexible Traffic Shaper for High. June 1995.
In networks supporting distributed multimedia, maximizing bandwidth utilization and providing performance guarantees are two incompatible goals. Heterogeneity of the multimedia sources calls for effective congestion control schemes to satisfy the diverse Quality of Service (QoS) requirements of each application. These include admission control at connection set up, traffic control at the source ends and efficient scheduling schemes at the switches. The emphasis in this paper is on traffic control at the source ends. Traffic control schemes have two functional roles. One is traffic enforcement as a supplement to the admission control policy. The other is shaping the input traffic so that it becomes amenable to the scheduling mechanism at the switches for providing the required QoS guarantees. Studies on bursty sources have shown that burstiness promotes statistical multiplexing at the cost of possible congestion. Smoothing the traffic helps in providing guarantees at the cost of bandwidth utilization. The need for a flexible scheme which can provide a reasonable compromise between the utilization and guarantees is imminent. We present the design and performance study of a flexible traffic shaper which can adjust the burstiness of input traffic to obtain reasonable utilization while maintaining statistical service guarantees. The performance of the traffic shaper for bursty sources is studied using simulation. (Also cross-referenced as UMIACS-TR-95-72) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
S. Radhakrishnan. S. V. Raghavan. Ashok K. Agrawala. A Flexible Traffic Shaper for High Speed Networks: Design and Comparative. June 1995.
Maximizing bandwidth utilization and providing performance guarantees, in the context of multimedia networking, are two incompatible goals. Heterogeneity of the multimedia sources calls for effective traffic control schemes to satisfy their diverse Quality of Service (QoS) requirements. These include admission control at connection set up, traffic control at the source ends and efficient scheduling schemes at the switches. The emphasis in this paper is on traffic control at the source end. Most multimedia sources are bursty in nature. Traffic shapers have been mainly studied hitherto from the point of view of their effectiveness in smoothing the burstiness. The Leaky Bucket (LB) scheme, to cite an example, is a mean rate policer smoothing at the token generation rate. Studies on bursty sources show that burstiness promotes statistical multiplexing at the cost of possible congestion. Smoothing, on the other hand, helps in providing guarantees at the cost of utilization. Thus the need for a flexible scheme which can provide a reasonable compromise between utilization and performance is imminent. Recent studies [10, 12] have also questioned the suitability of LB for policing real-time traffic due to the excessive delays. We argue for a policy which is less stringent on short term burstiness than the LB. We propose a new traffic shaper which can adjust the burstiness of the input traffic to obtain reasonable bandwidth utilization while maintaining statistical service guarantees. The performance study is conducted in two parts. In the first part, we study the effect of varying the shaper parameters on the input characteristics. In the second part, we dimension our scheme and an LB equivalently and compare the mean and peak rate policing behavior with delay and loss as the performance parameters. Adopting a less stringent attitude towards short term burstiness is shown to result in considerable advantage while policing real-time traffic.
Future research possibilities in this topic are explored. (Also cross-referenced as UMIACS-TR-95-71) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
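The Leaky Bucket baseline mentioned in the abstract above can be sketched as a token-based shaper: tokens are generated at a fixed rate, accumulate up to a bucket depth, and each packet must consume one token before it departs. This is a generic illustration of that baseline, not the flexible shaper proposed in the report; the function name, the one-token-per-packet rule, and the FIFO queueing discipline are assumptions made for this example.

```python
def leaky_bucket_departures(arrivals, rate, depth):
    """Departure times of packets shaped by a token-based leaky bucket.

    arrivals: nondecreasing packet arrival times
    rate:     token generation rate (tokens per unit time)
    depth:    bucket size (maximum stored tokens); starts full
    Each packet consumes one token and packets leave in FIFO order.
    """
    tokens = float(depth)
    clock = 0.0  # time of the most recent departure
    departures = []
    for t in arrivals:
        # A packet cannot leave before the packet ahead of it.
        start = float(max(t, clock))
        # Tokens generated since the last departure, capped at the depth.
        tokens = min(float(depth), tokens + (start - clock) * rate)
        if tokens < 1.0:
            # Wait until a whole token has accumulated.
            start += (1.0 - tokens) / rate
            tokens = 1.0
        tokens -= 1.0
        clock = start
        departures.append(start)
    return departures
```

A burst of four packets arriving at time 0 with `rate=1` and `depth=2` departs at times 0, 0, 1 and 2: the bucket depth lets the first two packets through immediately, and the rest are smoothed to the token generation rate, which is the mean-rate behavior the abstract attributes to the LB.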
Chen Chen. Configuration-Level Programming of Distributed Applications Using. June 1995.
An event-based distributed application is a group of software components interacting with each other by producing events that in turn trigger the invocation of procedures. In this work, we are concerned with the technology and methods for integrating an event-based application, whether that application is being constructed from scratch or synthesized from existing systems. Developing an event-based application is a complex task for programmers, who must address several issues not found in traditional systems and, currently, must do so without much assistance. These issues include event declaration, structure, binding, and naming. Our objective is to provide the same software engineering benefits to programmers of event-based applications as are currently provided to programmers of applications using traditional RPC or message-passing mechanisms. In this work, we broaden the technology for integration to encompass event-based programming. A method is described for separating event interaction properties from the implementation of the application modules so that system integration can be performed using only the abstractions. Then based upon the abstract aggregate, all interface software needed to validly implement the system can be generated automatically. A software bus model has been enhanced to accommodate the models which drive event-based distributed applications. In this way, designers may define complex event-based interactions abstractly, thus making it easier to integrate and experiment with event-based distributed applications. (Also cross-referenced as UMIACS-TR-95-70) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Scott Andrews. Brian Kettler. Kutluhan Erol. James Hendler. UM Translog: A Planning Domain for the Development and Benchmarking of. June 1995.
The last twenty years of AI planning research has discovered a wide variety of planning techniques such as state-space search, hierarchical planning, case-based planning and reactive planning. These techniques have been implemented in numerous planning systems (e.g., STRIPS, SNLP, UCPOP, NONLIN, SIPE). Initially, a number of simple toy domains were devised to assist in the analysis and evaluation of planning systems and techniques. The most well known examples are ``Blocks World'' and ``Towers of Hanoi.'' As planning systems grow in sophistication and capabilities, however, there is a clear need for planning benchmarks with matching complexity to evaluate those new features and capabilities. UM Translog is a planning domain designed specifically for this purpose. UM Translog was inspired by the CMU Transport Logistics domain developed by Manuela Veloso. UM Translog is an order of magnitude larger in size (41 actions versus 6), number of features and types of interactions. It provides a rich set of entities, attributes, actions and conditions, which can be used to specify rather complex planning problems with a variety of plan interactions. The detailed set of operators provides long plans (~40 steps) with many possible solutions to the same problem, and thus this domain can also be used to evaluate the solution quality of planning systems. The UM Translog domain has been used with the UMCP, UM Nonlin, and CaPER planning systems thus far. (Also cross-referenced as UMIACS-TR-95-69) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Using the Parka Parallel Knowledge Representation System (Version 3.2). Brian Kettler. William Andersen. James Hendler. Sean Luke. June 1995.
Parka is a symbolic, semantic network knowledge representation system that takes advantage of the massive parallelism of supercomputers such as the Connection Machine. The Parka language has many of the features of traditional semantic net/frame-based knowledge representation languages but also supports several kinds of rapid parallel inference mechanisms that scale to large knowledge-bases of hundreds of thousands of frames or more. Parka is intended for general-purpose use and has been used thus far to support A.I. systems for case-based reasoning and data mining. This document is a user manual for the current version of Parka, version 3.2. It describes the Parka language and presents some examples of knowledge representation using Parka. Details about the parallel algorithms, implementation, and empirical results are presented elsewhere. (Also cross-referenced as UMIACS-TR-95-68) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Querying Video Libraries. Eenjun Hwang. V.S. Subrahmanian. June 1995.
There is now growing interest in organizing and querying large bodies of video data. In this paper, we will develop a simple SQL-like video query language which can be used not only to identify videos in the library that are of interest to the user, but which can also be used to extract, from such a video in a video library, the relevant segments of the video that satisfy the specified query condition. We investigate various types of user requests and show how they are expressed using our query language. We also develop polynomial-time algorithms to process such queries. Furthermore, we show how video-presentations may be synthesized in response to a user query. We show how a standard relational database system can be extended in order to handle queries such as those expressed in our language. Based on these principles, we have built a prototype video retrieval system called VIQS. We will describe the design and implementation of VIQS and show some sample interactions with VIQS. (Also cross-referenced as UMIACS-TR-95-66) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Automatic Extraction of Semantic Classes from Syntactic. December 1995.
Bonnie J. Dorr. Doug Jones. This paper addresses the issue of word-sense ambiguity in extraction from machine-readable resources for the construction of large-scale knowledge sources. We describe two experiments: one which took word-sense distinctions into account, resulting in 97.9% accuracy for semantic classification of verbs based on (Levin, 1993); and one which ignored word-sense distinctions, resulting in 6.3% accuracy. These experiments were dual purpose: (1) to validate the central thesis of the work of (Levin, 1993), i.e., that verb semantics and syntactic behavior are predictably related; (2) to demonstrate that a 20-fold improvement can be achieved in deriving semantic information from syntactic cues if we first divide the syntactic cues into distinct groupings that correlate with different word senses. Finally, we show that we can provide effective acquisition techniques for novel word senses using a combination of online sources. (Also cross-referenced as UMIACS-TR-95-65) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
On the Applicability of Neural Network and Machine Learning. Steve Lawrence. C. Lee Giles. Sandiway Fong. June 1995.
We examine the inductive inference of a complex grammar - specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent networks, Elman, Narendra \& Parthasarathy, and Williams \& Zipser recurrent networks, Euclidean and edit-distance nearest-neighbors, simulated annealing, and decision trees. The feed-forward neural networks and non-neural network machine learning models are included primarily for comparison. We address the question: How can a neural network, with its distributed nature and gradient descent based iterative calculations, possess linguistic capability which is traditionally handled with symbolic computation and recursive processes? Initial simulations with all models were only partially successful by using a large temporal window as input. Models trained in this fashion did not learn the grammar to a significant degree. Attempts at training recurrent networks with small temporal input windows failed until we implemented several techniques aimed at improving the convergence of the gradient descent training algorithms. We discuss the theory and present an empirical study of a variety of models and learning algorithms which highlights behaviour not present when attempting to learn a simpler grammar. (Also cross-referenced as UMIACS-TR-95-64) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Richard Gerber. Seongsoo Hong. Slicing Real-Time Programs for Enhanced Schedulability. May 1995.
(Also cross-referenced as UMIACS-TR-95-62) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Richard Gerber. Dong-in Kang. Seongsoo Hong. Manas Saksena. End-to-End Design of Real-Time Systems. May 1995.
(Also cross-referenced as UMIACS-TR-95-61) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Richard Gerber. Ladan Gharai. Benchmarking Digital Video: Measurements, Analysis, Improvements and. May 1995.
(Also cross-referenced as UMIACS-TR-95-60) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Harsha Kumar. Catherine Plaisant. Ben Shneiderman. March 1995.
Browsing Hierarchical Data with Multi-Level Dynamic Queries and Pruning. Users often must browse hierarchies with thousands of nodes in search of those that best match their information needs. The PDQ Tree-browser (Pruning with Dynamic Queries) visualization tool was specified, designed and developed for this purpose. This tool presents trees in two tightly-coupled views, one a detailed view and the other an overview. Users can use dynamic queries, a method for rapidly filtering data, to filter nodes at each level of the tree. The dynamic query panels are user-customizable. Subtrees of unselected nodes are pruned out, leading to compact views of relevant nodes. Usability testing of the PDQ Tree-browser, done with 8 subjects, helped assess strengths and identify possible improvements. The PDQ Tree-browser was used in Network Management (600 nodes) and UniversityFinder (1100 nodes) applications. A controlled experiment, with 24 subjects, showed that pruning significantly improved performance speed and subjective user satisfaction. Future research directions are suggested. (Also cross-referenced as CAR-TR-772) (Also cross-referenced as ISR-TR-95-53) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Catherine Plaisant. Ben Shneiderman. May 1995.
Organization overviews and role management: Inspiration for future desktop environments. In our exploration of future work environments for the World Bank we propose two concepts. Organization overviews provide a consistent support to present the results of a variety of manual or semi-automated searches. This view can be adapted or expanded for each class of users to finally map the multiple personal roles an individual has in an organization. After command line interfaces, graphical point and click interfaces, and the current "docu-centric" designs, the natural direction is towards a role-centered approach where we believe the emphasis is on the management of those multiple roles. Each role involves coordination with groups of people and accomplishment of tasks within a schedule. (Also cross-referenced as CAR-TR-771) Human-Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Rohit Mahajan. Ben Shneiderman. April 1995.
A Family of User Interface Consistency Checking Tools. Incorporating evaluation metrics with GUI development tools will help designers create consistent interfaces in the future. Complexity in design of interfaces makes efficient evaluation impossible by a single consistency checking evaluation tool. Our focus is on developing a family of evaluation tools in order to make the evaluation process less cumbersome. We have developed a dialog box typeface and color table to facilitate detection of anomalies in color, font, font size, and font style. Concordance tools have been developed to spot variant capitalization and abbreviations globally in the interface and specifically in the button widgets. As buttons are frequently used widgets, a button layout table has been created to spot any inconsistencies in height, width and relative position between a given group of buttons if present. Finally, a terminology basket tool has been created to identify unwanted synonyms of computer related terms used in the interface which may be misleading to the end user. (Also cross-referenced as CAR-TR-770) (Also cross-referenced as ISR-TR-95-52) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
The Case for Structure-based Representations. Kathryn E. Sanders. Brian P. Kettler. James Hendler. April 1995.
Case-based reasoning involves reasoning from {\em cases}: specific pieces of experience, the reasoner's or another's, that can be used to solve problems. As a result, case representation is critical: an incomplete case representation limits the system's reasoning power. In this paper we argue for {\em structure-based} case representations, which express arbitrary relations among objects in a flexible way, over more limited or inflexible methods. We motivate the distinction between these kinds of representations with examples from information retrieval systems, CBR systems, and computational models of human analogical reasoning. Structure-based representations provide the benefits of greater expressivity and economy. We give examples of these benefits from two case-based planning systems we have developed, CaPER and CHIRON, and show how the case matching and case acquisition costs can be reduced through the use of massively parallel techniques. (Also cross-referenced as UMIACS-TR-95-56) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Suzanne Ava Stevenson. A Competitive Attachment Model for Resolving Syntactic Ambiguities in. May 1995.
Linguistic ambiguity is the greatest obstacle to achieving practical computational systems for natural language understanding. By contrast, people experience surprisingly little difficulty in interpreting ambiguous linguistic input. This dissertation explores distributed computational techniques for mimicking the human ability to resolve syntactic ambiguities efficiently and effectively. The competitive attachment theory of parsing formulates the processing of an ambiguity as a competition for activation within a hybrid connectionist network. Determining the grammaticality of an input relies on a new approach to distributed communication that integrates numeric and symbolic constraints on passing features through the parsing network. The method establishes syntactic relations both incrementally and efficiently, and underlies the ability of the model to establish long-distance syntactic relations using only local communication within a network. The competitive distribution of numeric evidence focuses the activation of the network onto a particular structural interpretation of the input, resolving ambiguities. In contrast to previous approaches to ambiguity resolution, the model makes no use of explicit preference heuristics or revision strategies. Crucially, the structural decisions of the model conform with human preferences, without those preferences having been incorporated explicitly into the parser. Furthermore, the competitive dynamics of the parsing network account for additional on-line processing data that other models of syntactic preferences have left unaddressed. (Also cross-referenced as UMIACS-TR-95-55) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Extraction of Rules from Discrete-Time. Christian W. Omlin. C. Lee Giles. May 1995.
The extraction of symbolic knowledge from trained neural networks and the direct encoding of (partial) knowledge into networks prior to training are important issues. They allow the exchange of information between symbolic and connectionist knowledge representations. The focus of this paper is on the quality of the rules that are extracted from recurrent neural networks. Discrete-time recurrent neural networks can be trained to correctly classify strings of a regular language. Rules defining the learned grammar can be extracted from networks in the form of deterministic finite-state automata (DFA's) by applying clustering algorithms in the output space of recurrent state neurons. Our algorithm can extract different finite-state automata that are consistent with a training set from the same network. We compare the generalization performances of these different models and the trained network and we introduce a heuristic that permits us to choose among the consistent DFA's the model which best approximates the learned regular grammar. (Also cross-referenced as UMIACS-TR-95-54) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Efficient Algorithms for Atmospheric Correction of Remotely Sensed Data. Hassan Fallah-Adl. Joseph Ja'Ja'. Shunlin Liang. Yoram J. Kaufman. John Townshend. April 1995.
Remotely sensed imagery has been used for developing and validating various studies regarding land cover dynamics such as global carbon modeling, biogeochemical cycling, hydrological modeling, and ecosystem response modeling. However, the large amounts of imagery collected by the satellites are largely contaminated by the effects of atmospheric particles through absorption and scattering of the radiation from the earth surface. The objective of atmospheric correction is to retrieve the surface reflectance (that characterizes the surface properties) from remotely sensed imagery by removing the atmospheric effects. Atmospheric correction has been shown to significantly improve the accuracy of image classification. (Also cross-referenced as UMIACS-TR-95-53) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Laura Slaughter. Kent L. Norman. Ben Shneiderman. March 1995.
Assessing users' subjective satisfaction with the Information System for. In this investigation, the Questionnaire for User Interaction Satisfaction (QUIS 5.5), a tool for assessing users' subjective satisfaction with specific aspects of the human/computer interface, was used to assess the strengths and weaknesses of the Information System for Youth Services (ISYS). ISYS is used by over 600 employees of the Maryland State Department of Juvenile Services (DJS) as a tracking device for juvenile offenders. Ratings and comments were collected from 254 DJS employees who use ISYS. The overall mean rating across all questions was 5.1 on a one to nine scale. The ten highest and lowest rated questions were identified. The QUIS allowed us to isolate subgroups which were compared with mean ratings from four measures of specific interface factors. The comments obtained from users provided suggestions, complaints and endorsements of the system. Also cross-referenced as CAR-TR-768 Human Computer Interaction Laboratory, Department of Psychology, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
The Tower of Pizzas. Michael Tan. Nick Roussopoulos. Steve Kelley. April 1995.
CPU speeds are increasing at a much faster rate than secondary storage device speeds. Many important applications face an I/O bottleneck. We demonstrate that this bottleneck can be alleviated through 1) scalable striping of data and 2) caching/prefetching techniques. This paper describes the design and performance of the Tower of Pizzas (TOPs), a portable software system providing parallel I/O and buffering services. (Also cross-referenced as UMIACS-TR-95-52) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Fixed Points in Two--Neuron Discrete Time Recurrent Networks:. Peter Tino. Bill G. Horne. C. Lee Giles. April 1995.
The position, number and stability types of fixed points of a two--neuron recurrent network with nonzero weights are investigated. Using simple geometrical arguments in the space of derivatives of the sigmoid transfer function with respect to the weighted sum of neuron inputs, we partition the network state space into several regions corresponding to stability types of the fixed points. If the neurons have the same mutual interaction pattern, i.e. they either mutually inhibit or mutually excite themselves, a lower bound on the rate of convergence of the attractive fixed points towards the saturation values, as the absolute values of weights on the self--loops grow, is given. The role of weights in location of fixed points is explored through an intuitively appealing characterization of neurons according to their inhibition/excitation performance in the network. In particular, each neuron can be of one of the four types: greedy, enthusiastic, altruistic or depressed. Both with and without the external inhibition/excitation sources, we investigate the position and number of fixed points according to character of the neurons. When both neurons self-excite (or self-inhibit) themselves and have the same mutual interaction pattern, the mechanism of creation of a new attractive fixed point is shown to be that of saddle node bifurcation. (Also cross-referenced as UMIACS-TR-95-51) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Constructing Deterministic Finite-State Automata. Christian W. Omlin. C. Lee Giles. May 1995.
Recurrent neural networks that are {\it trained} to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contributes to this instability. We prove that a simple algorithm can {\it construct} second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e. the constructed network correctly classifies strings of {\it arbitrary length}. The algorithm is based on encoding strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with $n$ states and $m$ input alphabet symbols, the constructive algorithm generates a ``programmed'' neural network with $O(n)$ neurons and $O(mn)$ weights. We compare our algorithm to other methods proposed in the literature. Revised in February 1996 (Also cross-referenced as UMIACS-TR-95-50) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ajit J. Vanniamparampil. Ben Shneiderman. Catherine Plaisant. Anne Rose. February 1995.
User Interface Reengineering: A Diagnostic Approach. User interface technology has advanced rapidly in recent years. Incorporating new developments in existing systems could result in substantial improvements in usability, thereby improving performance and user satisfaction, while shortening training and reducing error rates. Our focus is on low-effort high-payoff improvements to aspects such as data display and entry, consistency, messages, documentation, and system access. This paper provides guidelines for managers and designers responsible for user interface reengineering, based on the experience we gained from six projects, and compiles our observations, recommendations and outcomes. (Also cross-referenced as CAR-TR-767) Human-Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, College of Business and Management,
Computing Stable and Partial Stable Models of Extended Disjunctive. Carolina Ruiz. Jack Minker. April 1995.
In [Prz91], Przymusinski introduced the partial (or 3-valued) stable model semantics which extends the (2-valued) stable model semantics defined originally by Gelfond and Lifschitz [GL88]. In this paper we describe a procedure to compute the collection of all partial stable models of an extended disjunctive logic program. This procedure consists in transforming an extended disjunctive logic program into a constrained disjunctive program free of negation-by-default whose set of 2-valued minimal models corresponds to the set of partial stable models of the original program. (Also cross-referenced as UMIACS-TR-95-49) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Wayne Kelly. William Pugh. Evan Rosser. Tatiana Shpeisman. Transitive Closure of Infinite Graphs and its Applications. April 1994.
Integer tuple relations can concisely summarize many types of information gathered from analysis of scientific codes. For example, they can be used to precisely describe which iterations of a statement are data dependent on which other iterations. It is generally not possible to represent these tuple relations by enumerating the related pairs of tuples. For example, it is impossible to enumerate the related pairs of tuples in the relation {[i] -> [i+2] | 1 <= i <= n-2}. Even when it is possible to enumerate the related pairs of tuples, such as for the relation {[i,j] -> [i',j'] | 1 <= i,j,i',j' <= 100}, it is often not practical to do so. We instead use a closed form description by specifying a predicate consisting of affine constraints on the related pairs of tuples. As we just saw, these affine constraints can be parameterized, so what we are really describing are infinite families of relations (or graphs). Many of our applications of tuple relations rely heavily on an operation called transitive closure. Computing the transitive closure of these "infinite graphs" is very different from the traditional problem of computing the transitive closure of a graph whose edges can be enumerated. For example, the transitive closure of the first relation above is the relation {[i] -> [i'] | exists beta s.t. i'-i = 2beta and 1 <= i <= i' <= n}. As we will prove, transitive closure is not computable in the general case. We have developed algorithms that produce exact results in most commonly occurring cases and produce upper or lower bounds (as necessary) in the other cases. This paper will describe our algorithms for computing transitive closure and some of its applications such as determining which inter-processor synchronizations are redundant. (Also cross-referenced as UMIACS-TR-95-48) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
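The closed form quoted in the abstract above can be sanity-checked for one concrete value of n by brute-force enumeration. The sketch below is only an illustration, not the symbolic algorithm the report describes: the function names are hypothetical, and it assumes that the strict form (beta >= 1) corresponds to the positive transitive closure, while the form as printed (which also admits beta = 0, i.e. i' = i) corresponds to the reflexive-transitive closure over 1 <= i <= n.

```python
def base_relation(n):
    """Enumerate {[i] -> [i+2] | 1 <= i <= n-2} for one concrete n."""
    return {(i, i + 2) for i in range(1, n - 1)}


def positive_closure(pairs):
    """Naive fixed-point transitive closure of a finite set of pairs."""
    closure = set(pairs)
    while True:
        # Compose the current closure with itself and keep any new pairs.
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def closed_form(n, reflexive):
    """Enumerate {[i] -> [i'] | i' - i = 2*beta, 1 <= i, i' <= n}.

    Assumption of this sketch: beta starts at 0 for the reflexive form
    (which matches i <= i' as printed) and at 1 for the positive closure.
    """
    first_beta = 0 if reflexive else 1
    return {
        (i, i + 2 * beta)
        for i in range(1, n + 1)
        for beta in range(first_beta, n)
        if i + 2 * beta <= n
    }


if __name__ == "__main__":
    n = 7
    positive = positive_closure(base_relation(n))
    identity = {(i, i) for i in range(1, n + 1)}
    print(positive == closed_form(n, reflexive=False))
    print(positive | identity == closed_form(n, reflexive=True))
```

The enumeration only works because n is fixed; for symbolic n the set of pairs cannot be listed, which is the situation the closed-form description is meant to handle.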
A Unified Treatment of Null Values using Constraints. Kasim S. Candan. John Grant. V.S. Subrahmanian. April 1995.
An important reality when studying relational databases is the fact that entries in relational tables may often be "missing" or only partially specified. The study of such missing information has led to a rich body of work on "null values." It was recognized early on that there are many different types of null values, each of which reflects different intuitions about why a particular piece of information is missing. Different relations (or even the same relation) could contain different types of null values; yet, very little work has been done on providing a unifying model that reasons with different types of nulls. In this paper, we use constraints to provide a unifying framework for the most common types of nulls. We show how tuples containing null values of these types can be viewed as constraints, and how this leads to an algebra for null values. In particular, this algebra contains a unique operator (called the "compaction" operator) used to remove redundancies from null valued relations. We have studied various properties of this algebra. We have built a prototype implementation based on the null valued operators described here and conducted various experiments using this testbed. (Also cross-referenced as UMIACS-TR-95-47) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Anne Rose. Ben Shneiderman. Catherine Plaisant. February 1995.
Using Ethnographic Methods in the Redesign of User Interfaces. Methods for observing software users in the workplace will become increasingly important as the number of people using computers grows and developers improve existing systems. Successful redesigns rely, in part, on complete and accurate evaluations of the existing systems. Based on our evaluation experience, we have derived a set of practical guidelines to be used by designers in preparing for the evaluation, performing the field study, analyzing the data, and reporting the findings. By providing a general framework based on ethnographic research, we hope to reduce the likelihood of some common problems, such as overlooking important information and misinterpreting observations. Examples from our ongoing work with the Maryland Department of Juvenile Services are used to illustrate the proposed guidelines. (Also cross-referenced as CAR-TR-765) Human-Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Shubhendu S. Mukherjee. Shamik D. Sharma. Mark D. Hill. James R. Larus. Anne Rogers. Joel Saltz. April 1995.
Efficient Support for Irregular Applications on Distributed Memory. Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregular and unpredictable run-time behavior makes this type of parallel program difficult to write and adversely affects run-time performance. This paper explores three issues---partitioning, mutual exclusion, and data transfer---crucial to the efficient execution of irregular problems on distributed-memory machines. Unlike previous work, we studied the same programs running in three alternative systems on the same hardware base (a Thinking Machines CM-5): the CHAOS irregular application library, Transparent Shared Memory (TSM), and eXtensible Shared Memory (XSM). CHAOS and XSM performed equivalently for all three applications. Both systems were somewhat (13%) to significantly faster (991%) than TSM. (Also cross-referenced as UMIACS-TR-95-46) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Jason Ellis. Chi Tran. Jake Ryoo. Ben Shneiderman. June 1995.
Buttons vs. menus: An exploratory study of pull-down menu selection as. Button bars are a relatively new interaction method intended to speed up application use as compared to pull-down menus. This exploratory study compares three command selection methods: pull-down menus, button bars, and user choice of pull-down menus or button bars. Effectiveness was measured in two ways: speed of selection and error rate. 15 participants performed 15 word processor related tasks. Results show that in frequently used functions, such as character attribute selection (bold, italic, underline, etc.), button bars are faster. There were no statistically significant differences in error rates between the three interaction methods. (Also cross-referenced as CAR-TR-764) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Ben Shneiderman. Richard Chimera. Ninad Jog. Ren Stimart. David White. May 1995.
Evaluating spatial and textual style of displays. The next generation of Graphic User Interfaces (GUIs) will offer rapid access to perceptually-rich, information abundant, and cognitively consistent interfaces. These new GUIs will be subjected to usability tests and expert reviews, plus new analysis methods and novel metrics to help guide designers. We have developed and tested first generation concordance tools to help developers to review terminology, capitalization, and abbreviation. We have also developed a dialog box summary table to help developers spot patterns and identify possible inconsistencies in layout, color, fonts, font size, font style, and ordering of widgets. In this study we also explored the use of metrics such as widget counts, balance, alignment, density, and aspect ratios to provide further clues about where redesigns might be appropriate. Preliminary experience with several commercial projects is encouraging. Also cross-referenced as CAR-TR-763 Also cross-referenced as ISR-TR-95-51 Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research, General Electric Information Service, Rockville, MD,
David A. Bader. Joseph Ja'Ja'. David Harwood. Larry S. Davis. May 1995.
Parallel Algorithms for Image Enhancement and Segmentation by. This paper presents efficient and portable implementations of a useful image enhancement process, the Symmetric Neighborhood Filter (SNF), and an image segmentation technique which makes use of the SNF and a variant of the conventional connected components algorithm which we call delta-Connected Components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient connected components algorithm which uses a novel approach for parallel merging. The algorithms have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, Intel Paragon, and workstation clusters. Our experimental results are consistent with the theoretical analysis (and provide the best known execution times for segmentation, even when compared with machine-specific implementations). Our test data include difficult images from the Landsat Thematic Mapper (TM) satellite data. More efficient implementations of Split-C will likely result in even faster execution times. (Also cross-referenced as UMIACS-TR-95-44) Institute for Advanced Computer Studies,
Gagan Agrawal. Joel Saltz. Interprocedural Compilation of Irregular Applications for Distributed. March 1995.
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routines and collective communication routines inserted for managing communication in such codes. We also present three new interprocedural optimizations: placement of scatter routines, deletion of data structures and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D compilation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes. (Also cross-referenced as UMIACS-TR-95-43) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Interprocedural Partial Redundancy Elimination and its Application to. Gagan Agrawal. Joel Saltz. Raja Das. March 1995.
Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redundant code elimination. In this paper we address the problem of performing this optimization interprocedurally. We use interprocedural partial redundancy elimination for placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines. (Also cross-referenced as UMIACS-TR-95-42) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Validation of Object-Oriented Design Metrics as Quality Indicators. Victor R. Basili. Lionel Briand. Walcelio L. Melo. May 1995.
This paper presents the results of a study conducted at the University of Maryland in which we experimentally investigated the suite of Object-Oriented (OO) design metrics introduced by [Chidamber&Kemerer, 1994]. In order to do this, we assessed these metrics as predictors of fault-prone classes. This study is complementary to [Li&Henry, 1993] where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known OO analysis/design method and the C++ programming language. Based on experimental results, the advantages and drawbacks of these OO metrics are discussed. Several of Chidamber&Kemerer's OO metrics appear to be useful to predict class fault-proneness during the early phases of the life-cycle. We also showed that they are, on our data set, better predictors than "traditional" code metrics, which can only be collected at a later phase of the software development processes. (Also cross-referenced as UMIACS-TR-95-40) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
An On-line Variable Length Binary Encoding. Tinku Acharya. Joseph Ja'Ja'. April 1995.
We present a methodology of an on-line variable-length binary encoding of a set of integers. The basic principle of this methodology is to maintain the prefix property amongst the codes assigned on-line to a set of integers growing dynamically. The prefix property enables unique decoding of a string of elements from this set. To show the utility of this on-line variable length binary encoding, we apply this methodology to encode the LZW codes. Application of this encoding scheme significantly improves the compression achieved by the standard LZW scheme. This encoding can be applied in other compression schemes to encode the pointers using variable-length binary codes. (Also cross-referenced as UMIACS-TR-95-39) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
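The abstract above relies on the prefix property to guarantee unique decoding. The sketch below illustrates only that property on a small static codebook; it is not the on-line encoding scheme of the report, and the codebook and the function names are hypothetical choices made for the example.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another (or a duplicate).

    After lexicographic sorting, any prefix relation between two codewords
    also shows up between some pair of neighbours, so checking adjacent
    pairs is sufficient.
    """
    words = sorted(codes.values())
    return all(not later.startswith(earlier)
               for earlier, later in zip(words, words[1:]))


def encode(symbols, codes):
    """Concatenate the codewords of a sequence of symbols."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Decode a bit string left to right; unambiguous for a prefix-free code."""
    inverse = {word: symbol for symbol, word in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return decoded


if __name__ == "__main__":
    # Hypothetical variable-length codebook for the integers 0..3.
    codes = {0: "0", 1: "10", 2: "110", 3: "111"}
    message = [2, 0, 3, 1]
    bits = encode(message, codes)
    print(is_prefix_free(codes), bits, decode(bits, codes) == message)
```

In an on-line setting the codebook grows while encoding proceeds, so the property would have to be preserved at every step rather than checked once as here.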
On the Perturbation of Schur Complements in Positive Semidefinite. G. W. Stewart. March 1995.
This note gives perturbation bounds for the Schur complement of a positive definite matrix in a positive semidefinite matrix. (Also cross-referenced as UMIACS-TR-95-38) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Numerical Methods for M/G/1 Type Queues. G. W. Stewart. March 1995.
Queues of M/G/1 type give rise to infinite embedded Markov chains whose transition matrices are upper block Hessenberg. The traditional algorithms for solving these queues have involved the computation of an intermediate matrix G. Recently a recursive descent method for solving block Hessenberg systems has been proposed. In this paper we explore the interrelations of the two methods. (Also cross-referenced as UMIACS-TR-95-37) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Manual for the CHAOS Runtime Library. Joel Saltz. Ravi Ponnusamy. Shamik D. Sharma. Bongki Moon. Yuan-Shin Hwang. Mustafa Uysal. Raja Das. March 1995.
Procedures are presented that are designed to help users efficiently program irregular problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These procedures are also designed for use in compilers for distributed memory multiprocessors. The portable CHAOS procedures are designed to support dynamic data distributions and to automatically generate send and receive messages by capturing communications patterns at runtime. (Also cross-referenced as UMIACS-TR-95-34) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Seongsoo Hong. Compiler-Assisted Scheduling for Real-Time Applications: A Static Alternative to Low-Level Tuning. March 1995.
Developing a real-time system requires finding a balance between the timing constraints and the functional requirements. Achieving this balance often requires last-minute, low-level intervention in the code modules -- via intensive hardware-based instrumentation and manual program optimizations. In this dissertation we present an automated, static alternative to this kind of human-intensive work. Our approach is motivated by recent advances in compiler technologies, which we extend to two specific issues on real-time programming, that is, feasibility and schedulability. A task is infeasible if its execution time stretches over its deadline. To eliminate such faults, we have developed a synthesis method that (1) inspects all infeasible paths, and then (2) moves instructions out of those paths to shorten the execution time. On the other hand, schedulability of a task set denotes an ability to guarantee the deadlines of all tasks in the application. This property is affected by interactions between the tasks, as well as their individual execution times and deadlines. To address the schedulability problem, we have developed a task transformation method based on program slicing. The method decomposes a task into two subthreads: the IO-handler component that must meet the original deadline, and the state-update component that can be postponed past the deadline. This delayed-deadline approach contributes to the schedulability of the overall application. We also present a new fixed-priority preemptive scheduling strategy, which yields both a feasible priority ordering and a feasible task-slicing metric. (Also cross-referenced as UMIACS-TR-95-33) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
QR Sometimes Beats Jacobi. G. W. Stewart. March 1995.
This note exhibits a symmetric matrix having a small eigenvalue that is computed accurately by the QR algorithm but not by Jacobi's method. (Also cross-referenced as UMIACS-TR-95-32) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Identifying Reordering Transformations That Minimize Idle Processor. Wayne Kelly. William Pugh. February 1995.
Compiling a sequential program for execution on a distributed memory multi-computer involves deciding how to distribute iterations across the processors and how to order the iterations on each processor. We believe that the second of these decisions hasn't been adequately addressed by previous work in this area. The goal in selecting an iteration ordering should be to minimize the idle time spent by processors waiting for messages from other processors. We show that choosing a good ordering for the iterations can be extremely important and that this choice is heavily dependent on the way iterations are distributed. We show that existing approaches to this problem can produce results that are far from optimal. We will also describe analysis techniques that allow us to predict how good iteration orderings will be with respect to particular distributions of iterations. (Also cross-referenced as UMIACS-TR-95-31) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Unifying Framework for Iteration Reordering Transformations. Wayne Kelly. William Pugh. February 1995.
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a mapping from the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. We also provide algorithms to test the legality of mappings, and to generate optimized code for mappings. (Also cross-referenced as UMIACS-TR-95-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
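The idea of treating a transformation as a mapping between iteration spaces can be illustrated with a small enumerated example. The sketch below is not the report's legality algorithm, which reasons symbolically; it assumes the standard criterion that a mapping is legal when every dependence source is still executed before its sink in lexicographic order, and all names and the sample loop nest are hypothetical.

```python
def lex_positive(vector):
    """True if the first nonzero component of the vector is positive."""
    for component in vector:
        if component > 0:
            return True
        if component < 0:
            return False
    return False


def is_legal(mapping, dependences):
    """Check that each dependence source still precedes its sink after mapping.

    dependences is a list of (source_iteration, sink_iteration) tuples in
    the original iteration space.
    """
    for source, sink in dependences:
        new_source, new_sink = mapping(source), mapping(sink)
        difference = tuple(b - a for a, b in zip(new_source, new_sink))
        if not lex_positive(difference):
            return False
    return True


def interchange(iteration):
    """Loop interchange: [i, j] -> [j, i]."""
    i, j = iteration
    return (j, i)


def skew(iteration):
    """Loop skewing: [i, j] -> [i, i + j]."""
    i, j = iteration
    return (i, i + j)


def skew_then_interchange(iteration):
    """Skewing followed by interchange: [i, j] -> [i + j, i]."""
    return interchange(skew(iteration))


# Hypothetical loop nest: a[i][j] = a[i-1][j+1] for 1 <= i < N, 0 <= j < N-1,
# so iteration (i, j) depends on iteration (i-1, j+1): distance (1, -1).
N = 4
DEPENDENCES = [((i - 1, j + 1), (i, j))
               for i in range(1, N) for j in range(0, N - 1)]

if __name__ == "__main__":
    print(is_legal(interchange, DEPENDENCES))            # interchange alone
    print(is_legal(skew, DEPENDENCES))                   # skewing alone
    print(is_legal(skew_then_interchange, DEPENDENCES))  # composed mapping
```

Because the mappings are plain functions, composing two transformations is just function composition, which is one reason a uniform mapping representation is convenient.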
Bongki Moon. Mustafa Uysal. Joel Saltz. Index Translation Schemes for Adaptive Computations. February 1995.
Current research in parallel programming is focused on closing the gap between globally indexed algorithms and the separate address spaces of processors on distributed memory multicomputers. A set of index translation schemes has been implemented as a part of the CHAOS runtime support library, so that the library functions can be used for implementing a global index space across a collection of separate local index spaces. These schemes include software-cached translation schemes aimed at adaptive irregular problems as well as a distributed translation table technique for statically irregular problems. To evaluate and demonstrate the efficiency of the software-cached translation schemes, experiments have been performed with an adaptively irregular loop kernel and a full-fledged 3D DSMC code from NASA Langley on the Intel Paragon and Cray T3D. This paper also discusses and analyzes the operational conditions under which each scheme can produce optimal performance. (Also cross-referenced as UMIACS-TR-95-28) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Bongki Moon. Joel Saltz. Adaptive Runtime Support for Direct Simulation Monte Carlo. February 1995.
In highly adaptive irregular problems such as many Particle-In-Cell (PIC) codes and Direct Simulation Monte Carlo (DSMC) codes, data access patterns may vary from time step to time step. This fluctuation may hinder efficient utilization of distributed memory parallel computers because of the resulting overhead for data redistribution and dynamic load balancing. To efficiently parallelize such adaptive irregular problems on distributed memory parallel computers, several issues such as effective methods for domain partitioning and fast data transportation must be addressed. This paper presents efficient runtime support methods for such problems. A simple one-dimensional domain partitioning method is implemented and compared with unstructured mesh partitioners such as recursive coordinate bisection and recursive inertial bisection. A remapping decision policy has been investigated for dynamic load balancing on 3-dimensional DSMC codes. Performance results are presented. (Also cross-referenced as UMIACS-TR-95-27) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Restoring Images Degraded by Spatially-Variant Blur. James G. Nagy. Dianne P. O'Leary. February 1995.
Restoration of images that have been blurred by the effects of a Gaussian blurring function is an ill-posed but well-studied problem. Any blur that is spatially invariant can be expressed as a convolution kernel in an integral equation. Fast and effective algorithms then exist for determining the original image by preconditioned iterative methods. If the blurring function is spatially variant, however, then the problem is more difficult. In this work we develop fast algorithms for forming the convolution and for recovering the original image when the convolution functions are spatially variant but have a small domain of support. This assumption leads to a discrete problem involving a banded matrix. We devise an effective preconditioner and prove that the preconditioned matrix differs from the identity by a matrix of small rank plus a matrix of small norm. Numerical examples are given, related to the Hubble Space Telescope Wide-Field / Planetary Camera. The algorithms that we develop are applicable to other ill-posed integral equations as well. (Also cross-referenced as UMIACS-TR-95-26) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Parallel Monte Carlo Simulation of Three-Dimensional Flow over a Flat. Robert P. Nance. Hassan Fallah-Adl. Richard G. Wilmoth. Bongki Moon. Joel Saltz. February 1995.
This paper describes a parallel implementation of the direct simulation Monte Carlo method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a favorable load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It will be shown that the parallel algorithm produces results which are very similar to previous DSMC results, despite the increased resolution available. However, it yields significantly faster execution times than the scalar code, as well as very good load-balance and scalability characteristics. (Also cross-referenced as UMIACS-TR-95-25) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
An Analysis of Error in a Reuse-Oriented Development Environment. William M. Thomas. Alex Delis. Victor R. Basili. February 1995.
Component reuse is widely considered vital for obtaining significant improvement in development productivity. However, as an organization adopts a reuse-oriented development process, the nature of the problems in development is likely to change. In this paper, we use a measurement-based approach to better understand and evaluate an evolving reuse process. More specifically, we study the effects of reuse across seven projects in a narrow domain from a single development organization. An analysis of the errors that occur in new and reused components across all phases of system development provides insight into the factors influencing the reuse process. We found significant differences between errors associated with new and various types of reused components in terms of the types of errors committed, when errors are introduced, and the effect that the errors have on the development process. (Also cross-referenced as UMIACS-TR-95-24) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Estimating the Selectivity of Spatial Queries Using the `Correlation'. Alberto Belussi. Christos Faloutsos. February 1995.
We examine the estimation of selectivities for range and spatial join queries in real spatial databases. As we have shown earlier, real point sets: (a) violate consistently the "uniformity" and "independence" assumptions, (b) can often be described as "fractals", with non-integer (fractal) dimension. In this paper we show that, among the infinite family of fractal dimensions, the so called "Correlation Dimension" D2 is the one that we need to predict the selectivity of spatial join. The main contribution is that, for all the real and synthetic point-sets we tried, the average number of neighbors for a given point of the point-set follows a power law, with D2 as the exponent. This immediately solves the selectivity estimation for spatial joins, as well as for "biased" range queries (i.e., queries whose centers prefer areas of high point density). We present the formulas to estimate the selectivity for the biased queries, including an integration constant (Kshape) for each query shape. Finally, we show results on real and synthetic point sets, where our formulas achieve very low relative errors (typically about 10%, versus 40%-100% of the uniform assumption). University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
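The power-law claim can be made concrete with a rough sketch: if the number of point pairs within distance r grows like r^D2, the exponent can be read off as a slope on a log-log scale. The code below is an illustration only, not the authors' estimator; the names `pairs_within` and `estimate_d2`, the two-radius slope, and the brute-force pair count are all assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pairs_within(points: Sequence[Point], radius: float) -> int:
    """Count unordered point pairs whose Euclidean distance is <= radius."""
    count = 0
    for a in range(len(points)):
        ax, ay = points[a]
        for b in range(a + 1, len(points)):
            bx, by = points[b]
            if math.hypot(ax - bx, ay - by) <= radius:
                count += 1
    return count


def estimate_d2(points: Sequence[Point], r_small: float, r_large: float) -> float:
    """Slope of log(pair count) versus log(radius) between two radii.

    Under a power law pairs(r) ~ r**D2 this slope approximates D2.
    """
    c_small = pairs_within(points, r_small)
    c_large = pairs_within(points, r_large)
    if c_small == 0 or c_large == 0:
        raise ValueError("radii too small: no point pairs found")
    return math.log(c_large / c_small) / math.log(r_large / r_small)
```

For points spaced evenly along a straight line the estimate comes out close to 1, as expected for a one-dimensional set. A practical estimator would fit the slope over many radii and use a spatial index instead of the quadratic pair scan.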
Hong Liu. Raymond E. Miller. Generalized Fair Reachability Analysis for Cyclic Protocols with. February 1995.
In this paper, we extend the generalized fair reachability notion to cyclic protocols with nondeterminism and internal transitions. By properly incorporating internal transitions into the formulation of fair progress vectors, we prove that most of the results established for cyclic protocols without nondeterminism and internal transitions still hold even if nondeterminism and internal transitions are allowed. We identify indefiniteness as a new type of logical error resulting from reachable internal execution cycles and show that indefiniteness can also be detected for the class of cyclic protocols with finite fair reachable state spaces with finite extensions. Dept. of Computer Science, Univ. of Maryland,
Hong Liu. Raymond E. Miller. Generalized Fair Reachability Analysis for Cyclic Protocols. February 1995.
In this paper, the notion of fair reachability is generalized to cyclic protocols with $n\geq 2$ machines. Substantial state reduction can be achieved via fair progress state exploration. It is shown that the fair reachable state space is exactly the set of reachable states with equal channel length. As a result, deadlock detection is decidable for ${\cal P}$, the class of cyclic protocols whose fair reachable state spaces are finite. The concept of simultaneous unboundedness is defined and the lack of it is shown to be a necessary and sufficient condition for a protocol to be in ${\cal P}$. Through finite extension of the fair reachable state space, it is also shown that detection of unspecified receptions, unboundedness, and nonexecutable transitions are all decidable for ${\cal P}$. Furthermore, it is shown that any protocol in ${\cal P}$ is logically correct if and only if there is no logical error in its fair reachable state space. This study shows that for the class ${\cal P}$, our generalized fair reachability analysis technique not only achieves substantial state reduction but also maintains very competitive logical error coverage. Therefore, it is a very useful technique to prove logical correctness for a wide variety of cyclic protocols. Dept. of Computer Science, Univ. of Maryland,
Improving the Efficiency of Limited-Memory Heuristic Search. Subrata Ghosh. Ambuj Mahanti. Dana S. Nau. February 1995.
This paper describes a new admissible tree search algorithm called Iterative Threshold Search (ITS). ITS can be viewed as a much-simplified version of MA*, and a generalized version of MREC. ITS's node selection and retraction (pruning) overhead is much less expensive than MA*'s. We also present the following results: 1. Every node generated by ITS is also generated by IDA*, even if ITS is given no more memory than IDA*. In addition, there are trees on which ITS generates O(N) nodes in comparison to O(N log N) nodes generated by IDA*, where N is the number of nodes eligible for generation by A*. 2. Experimental tests show that if the heuristic branching factor is low and the node-generation time is high (as in most practical problems), then ITS can provide significant savings in both number of node generations and running time. 3. Our experimental results also suggest that on the Traveling Salesman Problem, both IDA* and ITS are asymptotically optimal on the average if the costs between the cities are drawn from a fixed range. However, if the range of costs grows in proportion to the problem size, then IDA* is not asymptotically optimal. ITS's asymptotic complexity in the latter case depends on the amount of memory available to it. (Also cross-referenced as UMIACS-TR-95-23) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ambuj Mahanti. S. Ghosh. Dana S. Nau. A. K. Pal. L.N. Kanal. On the Asymptotic Performance of IDA*. February 1995.
Since best-first search algorithms such as A* require large amounts of memory, they sometimes cannot run to completion, even on problem instances of moderate size. This problem has led to the development of limited-memory search algorithms, of which the best known is IDA*. This paper presents the following results about IDA* and related algorithms: The analysis of asymptotic optimality for IDA* in [10] is incorrect. There are trees satisfying the asymptotic optimality conditions given in [10] for which IDA* is not asymptotically optimal. To correct the above problem, we state and prove necessary and sufficient conditions for asymptotic optimality of IDA* on trees. On trees not satisfying our conditions, we show that no best-first limited-memory search algorithm can be asymptotically optimal. On graphs, IDA* can perform quite poorly. In particular, there are graphs on which IDA* does node expansions where N is the number of nodes expanded by A*. (Also cross-referenced as UMIACS-TR-95-22) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Improved Approximation Algorithms for Uniform Connectivity Problems. Samir Khuller. Balaji Raghavachari. February 1995.
The problem of finding minimum weight spanning subgraphs with a given connectivity requirement is considered. The problem is NP-hard when the connectivity requirement is greater than one. Polynomial time approximation algorithms for various weighted and unweighted connectivity problems are given. The following results are presented: 1. For the unweighted k-edge-connectivity problem an approximation algorithm that achieves a performance ratio of 1.85 is described. This is the first polynomial-time algorithm that achieves a constant less than 2, for all k. 2. For the weighted vertex-connectivity problem, a constant factor approximation algorithm is given assuming that the edge-weights satisfy the triangle inequality. This is the first constant factor approximation algorithm for this problem. 3. For the case of biconnectivity, with no assumptions about the weights of the edges, an algorithm that achieves a factor asymptotically approaching 2 is described. This matches the previous best bound for the corresponding edge connectivity problem. (Also cross-referenced as UMIACS-TR-95-21) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Framework for Optimizing Parallel I/O. Robert Bennett. Kelvin S. Bryant. Alan Sussman. Raja Das. Joel Saltz. January 1996.
There has been a great deal of recent interest in parallel I/O. This paper discusses issues in the design and implementation of a portable I/O library designed to optimize the performance of multiprocessor architectures that include multiple disks or disk arrays. The major emphasis of the paper is on optimizations that are made possible by the use of collective I/O, so that I/O requests for multiple processors can be combined to improve performance. Performance measurements from benchmarking our implementation of an I/O library that currently performs collective local optimizations, called Jovian, on three application templates are also presented. Dept. of Computer Science, Univ. of Maryland,
Chialin Chang. Alan Sussman. Joel Saltz. Support for Distributed Dynamic Data Structures in C++. February 1995.
Traditionally, applications executed on distributed memory architectures in single-program multiple-data (SPMD) mode use distributed (multi-dimensional) data arrays. Good performance has been achieved by applying runtime techniques to such applications executing in a loosely synchronous manner. However, many applications utilize language constructs such as pointers to synthesize dynamic complex data structures, such as linked lists, trees and graphs, with elements consisting of complex composite data types. Existing runtime systems that rely on global indices cannot be used for these applications, as no global names or indices are imposed upon the elements of these data structures. A portable object-oriented runtime library is presented to support applications that use dynamic distributed data structures, including both arrays and pointer-based data structures. In particular, CHAOS++ deals with complex data types and pointer-based data structures by providing {\em mobile objects} and {\em globally addressable objects}. Preprocessing techniques are used to analyze communication patterns, and data exchange primitives are provided to carry out efficient data transfer. Performance results for applications taken from three distinct classes are also included to demonstrate the wide applicability of the runtime library. (Also cross-referenced as UMIACS-TR-95-19) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Charles Falkenberg. James M. Purtilo. Parallel I/O Using a Distributed Disk Cluster: An Exercise in Tailored. January 1995.
Tailored prototyping refers to an emerging process for prototyping software applications, emphasizing a disciplined experimental approach in order for developers to obtain an understanding of system characteristics before committing to costly design decisions. In our approach, the design of software constituting prototype apparatus is driven by experimental hypotheses concerning risk, rather than an application's functional requirements. This paper describes the principles behind tailored prototyping, then illustrates them in concrete terms by describing their application in a pilot project. The pilot used in our illustration is a parallel I/O service --- a mechanism designed to deliver pages, in parallel, from a cluster of distributed disks. The performance results show that this parallel I/O system can, in certain circumstances, deliver higher page throughput from multiple remote disks than from a single local disk. The pilot project exemplifies our prototyping method, which is applicable to a wide variety of software prototyping activities. (Also cross-referenced as UMIACS-TR-95-18) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Implementation of the MPL Compiler. Jan M. Rizzuto. James da Silva. February 1995.
The Maruti Real-Time Operating System was developed for applications that must meet hard real-time constraints. In order to schedule real-time applications, the timing and resource requirements for the application must be determined. The development environment provided for Maruti applications consists of several stages that use various tools to assist the programmer in creating an application. By analyzing the source code provided by the programmer, these tools can extract and analyze the needed timing and resource requirements. The initial stage in development is the compilation of the source code for an application written in the Maruti Programming Language (MPL). MPL is based on the C programming language. The MPL Compiler was developed to provide support for requirement specification. This report introduces MPL and describes the implementation of the MPL Compiler. (Also cross-referenced as UMIACS-TR-95-17) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Bonnie J. Dorr. Jye-hoon Lee. Clare Voss. Sungki Suh. Development of Interlingual Lexical Conceptual Structures with. February 1995.
This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. Our primary objective was to develop an interlingual representation based on lexical conceptual structure (LCS) and to examine the relation between this representation and a set of linguistically motivated semantic classes. We view the work of the past year as a critical step toward achieving our goal of building a generator: the classification of LCS's into a semantic hierarchy provides a systematic mapping between semantic knowledge about verbs and their surface syntactic structures. We have focused on several areas in support of our objectives: (1) investigation of morphological structure including distinctions between Korean and English; (2) porting a fast, message-passing parser to Korean (and to the IBM PC); (3) study of free word order and development of the associated processing algorithm; (4) investigation of the aspectual dimension as it impacts morphology, syntax, and lexical semantics; (5) investigation of the relation between semantic classes and syntactic structure; (6) development of theta-role and lexical-semantic templates through lexical acquisition techniques; (7) definition of a mapping between KR concepts and interlingual representations; (8) formalization of the lexical conceptual structure (Also cross-referenced as UMIACS-TR-95-16) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Notes on Symbol Dynamics. Ashok K. Agrawala. Christopher Landauer. February 1995.
This paper introduces a new formulation of dynamic systems that subsumes both the classical discrete and differential equation models as well as current trends in hybrid models. The key idea is to express the system dynamics using symbols to which the notion of time is explicitly attached. The state of the system is described using symbols which are active for a defined period of time. The system dynamics is then represented as relations between the symbolic representations. We describe the notation and give several examples of its use. (Also cross-referenced as UMIACS-TR-95-15) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adam A. Porter. C. A. Toman. Harvey Siy. Lawrence G. Votta. An Experiment to Assess the Cost-Benefits of Code Inspections in Large. February 1995.
We are conducting a long-term experiment (in progress) to compare the costs and benefits of several different software inspection methods. These methods are being applied by professional developers to a commercial software product they are currently writing. Because the laboratory for this experiment is a live development effort, we took special care to minimize cost and risk to the project, while maximizing our ability to gather useful data. This article has several goals: (1) to describe the experiment's design and show how we used simulation techniques to optimize it, (2) to present our preliminary results and discuss their implications for both software practitioners and researchers, and (3) to discuss how we expect to modify the experiment in order to reduce potential risks to the project. For each inspection we randomly assign 3 independent variables: (1) the number of reviewers on each inspection team (1, 2 or 4), (2) the number of teams inspecting the code unit (1 or 2), and (3) the requirement that defects be repaired between the first and second team's inspections. The reviewers for each inspection are randomly selected without replacement from a pool of 11 experienced software developers. The dependent variables for each inspection include inspection interval (elapsed time), total effort, and the defect detection rate. To date we have completed 34 of the planned 64 inspections. Our preliminary results challenge certain long-held beliefs about the most cost-effective ways to conduct inspections and raise some questions about the feasibility of recently proposed methods. (Also cross-referenced as UMIACS-TR-95-14) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Computational Capabilities of Recurrent NARX Neural Networks. Hava T. Siegelmann. Bill G. Horne. C. Lee Giles. March 1995.
Recently, fully connected recurrent neural networks have been proven to be computationally rich --- at least as powerful as Turing machines. This work focuses on another network which is popular in control applications and has been found to be very effective at learning a variety of problems. These networks are based upon Nonlinear AutoRegressive models with eXogenous Inputs (NARX models), and are therefore called {\em NARX networks}. As opposed to other recurrent networks, NARX networks have a limited feedback which comes only from the output neuron rather than from hidden states. They are formalized by \[ y(t) = \Psi \left( \rule[-1ex]{0em}{3ex} u(t-n_u), \ldots, u(t-1), u(t), y(t-n_y), \ldots, y(t-1) \right), \] where $u(t)$ and $y(t)$ represent input and output of the network at time $t$, $n_u$ and $n_y$ are the input and output order, and the function $\Psi$ is the mapping performed by a Multilayer Perceptron. We constructively prove that the NARX networks with a finite number of parameters are computationally as strong as fully connected recurrent networks and thus Turing machines. We conclude that in theory one can use the NARX models, rather than conventional recurrent networks without any computational loss even though their feedback is limited. Furthermore, these results raise the issue of what amount of feedback or recurrence is necessary for any network to be Turing equivalent and what restrictions on feedback limit computational power. (Also cross-referenced as UMIACS-TR-95-12) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
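The NARX recurrence above can be sketched in a few lines. This is a minimal illustration, not the construction used in the paper: the names `narx_run` and `make_mlp`, the zero initial history, the single tanh hidden layer, and any weights passed in are assumptions. The only feedback path is the delayed output y, as in the formula.

```python
import math
from typing import Callable, List, Sequence

Psi = Callable[[Sequence[float]], float]


def narx_run(psi: Psi, inputs: Sequence[float], n_u: int, n_y: int) -> List[float]:
    """Iterate y(t) = psi(u(t-n_u), ..., u(t), y(t-n_y), ..., y(t-1)).

    Values before the start of the sequence are assumed to be 0.
    """
    u_hist = [0.0] * n_u  # u(t-n_u) ... u(t-1)
    y_hist = [0.0] * n_y  # y(t-n_y) ... y(t-1)
    outputs: List[float] = []
    for u_t in inputs:
        y_t = psi(u_hist + [u_t] + y_hist)
        outputs.append(y_t)
        # Slide both delay lines forward by one time step.
        u_hist = (u_hist + [u_t])[1:]
        y_hist = (y_hist + [y_t])[1:]
    return outputs


def make_mlp(hidden_weights: Sequence[Sequence[float]],
             output_weights: Sequence[float]) -> Psi:
    """A one-hidden-layer perceptron with tanh units and a linear output."""
    def psi(x: Sequence[float]) -> float:
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in hidden_weights]
        return sum(w * h for w, h in zip(output_weights, hidden))
    return psi
```

With n_u = n_y = 1, each call to `psi` receives three values: u(t-1), u(t), and y(t-1). Any callable can stand in for the perceptron, which makes the recurrence easy to check by hand.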
Updating Disjunctive Databases via Model Trees. John Grant. Jaroslaw Gryz. Jack Minker. February 1995.
In this paper we study the problem of updating disjunctive databases, which contain indefinite data given as positive disjunctive clauses. We give correct algorithms for the insertion of a clause into and the deletion of a clause from such databases. Although the algorithms presented here are oriented towards model trees, they apply to any representation of minimal models. (Also cross-referenced as UMIACS-TR-95-11) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Kelvin S. Bryant. Jon Mauney. A Parametric View of Retargetable Register Allocation. January 1995.
We discuss the problems involved in building a retargetable register allocator for use in an optimizing compiler. While the popular "register coloring" method is machine-independent, the allocator as a whole must implement numerous machine-dependent decisions. We present the kinds of information that must be parameterized in order to include register allocation in a retargetable compiler back-end, and discuss a sample solution. Dept. of Computer Science, Univ. of Maryland,
Allocation and Scheduling of Real-Time Periodic Tasks with. Sheng-Tzong Cheng. Ashok K. Agrawala. January 1995.
The allocation problem has always been one of the fundamental issues in building applications in distributed computing systems (DCS). For real-time applications on DCS, the allocation problem should directly address the issues of task and communication scheduling. In this context, the allocation of tasks has to fully utilize the available processors and the scheduling of tasks has to meet the specified timing constraints. Clearly, the execution of tasks under the allocation and schedule has to satisfy the precedence, resource, and other synchronization constraints among them. Recently, timing requirements have emerged in real-time systems in which relative timing constraints are imposed on the consecutive executions of each task and inter-task temporal relationships are specified across task periods. In this paper we consider the allocation and scheduling problem of periodic tasks with such timing requirements. Given a set of periodic tasks, we consider the least common multiple (LCM) of the task periods. Each task is extended to several instances within the LCM. The scheduling window for each task instance is derived to satisfy the timing constraints. We develop a simulated annealing algorithm as the overall control algorithm. An example problem, a sanitized version of the Boeing 777 Aircraft Information Management System, is solved by the algorithm. Experimental results show that the algorithm solves the problem in a reasonable amount of time. (Also cross-referenced as UMIACS-TR-95-6) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
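The step of extending each periodic task to its instances within the LCM of the periods can be sketched as follows. The names `hyperperiod` and `task_instances`, the integer periods, and the assumption that instance k of a task is released at time k times its period are illustrative choices, not details from the report. The derivation of scheduling windows and the simulated annealing search are not shown.

```python
from functools import reduce
from math import gcd
from typing import Dict, Iterable, List


def hyperperiod(periods: Iterable[int]) -> int:
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def task_instances(periods: Dict[str, int]) -> Dict[str, List[int]]:
    """Map each task name to the release times of its instances
    within one LCM of the periods (assumed release at k * period)."""
    lcm = hyperperiod(periods.values())
    return {name: list(range(0, lcm, period))
            for name, period in periods.items()}
```

For two hypothetical tasks with periods 4 and 6, the LCM is 12, so the first task contributes three instances and the second contributes two. A schedule found for one LCM interval can then be repeated.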
Tool Support for Collaborative Software Prototyping. Elliot A. Shefrin. James M. Purtilo. Computer Science Department, Univ. of Maryland, December 1994.
Prototyping is a means by which requirements for software projects can be defined and refined before they are committed to firm specifications for the finished software product. By this process, costly and time-consuming errors in specification can be avoided or minimized. Reconfiguration is the concept of altering the program code, bindings between program modules, or logical or physical distribution of software components while allowing the continuing execution of the software being changed. Combining these two notions suggests the potential for a development environment where requirements can be quickly and dynamically evolved. This paper discusses reconfiguration-based prototyping (RBP), that is, the simultaneous consideration of requirements, software behavior and user feedback within a running system in order to derive a clear specification of an intended product. Tools enabling RBP can coordinate the efforts of developers, users and subject matter specialists alike as they work towards consensus on an application's specification by means of a prototype. The authors describe the scope of the modifications that can be effected by an integration of prototyping and reconfiguration protocols, and they then demonstrate that the technology exists to create such an environment. They conclude by describing a software development environment based on RBP. (Also cross-referenced as UMIACS-TR-95-5) University of Maryland Institute for Advanced Computer Studies,
Samir Khuller. Approximation Algorithms for Finding Highly Connected Subgraphs. January 1995.
(Also cross-referenced as UMIACS-TR-95-4) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Manufacturing-Operation Planning Versus AI Planning. Dana S. Nau. Satyandra K. Gupta. William C. Regli. January 1995.
Although AI planning techniques can potentially be useful in several manufacturing domains, this potential remains largely unrealized. Many of the issues important to manufacturing engineers have not seemed interesting to AI researchers -- but, in order to adapt AI planning techniques to manufacturing, it is important to address these issues in a realistic and robust manner. Furthermore, by investigating these issues, AI researchers may be able to discover principles that are relevant for AI planning in general. As an example, in this paper we describe the techniques for manufacturing-operation planning used in IMACS (Interactive Manufacturability Analysis and Critiquing System). We compare and contrast them with the techniques used in classical AI planning systems, and point out that some of the techniques used in IMACS may also be useful in other kinds of planning problems. (Also cross-referenced as UMIACS-TR-95-3) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Peter Tino. Bill G. Horne. C. Lee Giles. Finite State Machines. January 1995.
We present two approaches to the analysis of the relationship between a recurrent neural network (RNN) and the finite state machine \( {\cal M} \) the network is able to exactly mimic. First, the network is treated as a state machine and the relationship between the RNN and \( {\cal M} \) is established in the context of algebraic theory of automata. In the second approach, the RNN is viewed as a set of discrete-time dynamical systems associated with input symbols of \( {\cal M} \). In particular, issues concerning network representation of loops and cycles in the state transition diagram of \( {\cal M} \) are shown to provide a basis for the interpretation of learning process from the point of view of bifurcation analysis. The circumstances under which a loop corresponding to an input symbol \( x \) is represented by an attractive fixed point of the underlying dynamical system associated with \( x \) are investigated. For the case of two recurrent neurons, under some assumptions on weight values, bifurcations can be understood in the geometrical context of intersection of increasing and decreasing parts of curves defining fixed points. The most typical bifurcation responsible for the creation of a new fixed point is the saddle node bifurcation. (Also cross-referenced as UMIACS-TR-95-1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Measuring the Impact of Reuse on Quality and Productivity in. Walcelio L. Melo. Lionel Briand. Victor R. Basili. January 1995.
This paper presents the results of a study conducted at the University of Maryland in which we assessed the impact of reuse on quality and productivity in OO systems. Reuse is assumed to be a very effective strategy for software industry to build high-quality software. However, there is currently very little empirical information about what we can expect from reuse in terms of productivity and quality gains. This also applies to OO development which is supposed to facilitate reuse. Our experiment is one step towards a better understanding of the benefits of reuse in an OO framework, considering currently available technology. Data was collected, for four months, on the development of eight medium-size management information systems with equivalent requirements. All eight projects were developed using the Waterfall Software Engineering Life Cycle Model, an Object-Oriented (OO) design method and the C++ programming language. This study indicates significant benefits from reuse in terms of reduced defect density and rework as well as increased productivity. (Also cross-referenced as UMIACS-TR-95-2) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Automatic Analysis of Consistency Between Implementations and Requirements. Marsha Chechik. John Gannon. July 1995.
Formal methods like model checking can be used to demonstrate that safety properties of embedded systems are enforced by the system's requirements. Unfortunately, proving these properties provides no guarantee that they will be preserved in an implementation of the system. We have developed a tool, called Analyzer, which helps discover instances of inconsistency and incompleteness in implementations with respect to requirements. Analyzer uses requirements information to automatically generate properties which ensure that required state transitions appear in a model of an implementation. A model is created through abstract interpretation of an implementation annotated with assertions about values of state variables which appear in requirements. Analyzer determines if the model satisfies both automatically-generated and user-specified safety properties. This paper presents a description of our implementation of Analyzer and our experience in applying it to a small but realistic problem. (Also cross-referenced as UMIACS-TR-94-137) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Santhana Krishnamachari. Rama Chellappa. Multiresolution Gauss Markov Random Field Models. December 1994.
This paper presents multiresolution models for Gauss Markov random fields (GMRF) with applications to texture segmentation. Coarser resolution sample fields are obtained by either subsampling or local averaging the sample field at the fine resolution. Although Markovianity is lost under such resolution transformation, coarse resolution non-Markov random fields can be effectively approximated by Markov fields. We present two techniques to estimate the GMRF parameters at coarser resolutions from the fine resolution parameters, one by minimizing the Kullback-Leibler distance and another based on local conditional distribution invariance. We show the validity of the estimators by comparing the power spectral densities of the Markov approximation and the exact non-Markov measures. We also allude to the fact that different measures (different GMRF parameters) on the fine resolution can result in the same probability measure after subsampling and show the results for the first and second order cases. We apply this multiresolution model to texture segmentation. Different texture regions in an image are modeled by GMRFs and the associated parameters are assumed to be known. Parameters at lower resolutions are estimated from the fine resolution parameters. The coarsest resolution data is first segmented and the segmentation results are propagated upwards to the finer resolution. We use iterated conditional mode (ICM) minimization at all resolutions. A confidence measure is attached to the segmentation result at each pixel and passed on to the higher resolution. At each resolution, ICM is restricted only to pixels with low confidence measure. Our experiments with synthetic, Brodatz texture and real satellite images show that the multiresolution technique results in a better segmentation and requires less computation than the single resolution algorithm. (Also cross-referenced as UMIACS-TR-94-136) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Scheduling of Periodic Tasks with Relative Timing Constraints. Sheng-Tzong Cheng. Ashok K. Agrawala. December 1994.
The problem of non-preemptive scheduling of a set of periodic tasks on a single processor has traditionally been studied by considering the ready time and deadline of each task. Consequently, in a feasible schedule, one instance of each task in each period starts execution after its ready time and completes execution before its deadline. Recently, real-time systems have emerged whose timing requirements impose relative timing constraints on the consecutive executions of each task. In this paper, we consider the problem of scheduling periodic tasks with relative timing constraints imposed on two consecutive executions of a task. We analyze the timing constraints and derive the scheduling window for each task instance. Based on the scheduling window, we present a time-based approach to scheduling a task instance. The task instances are scheduled one by one according to priorities assigned by the algorithms proposed in this paper. We conduct experiments to compare the schedulability of the algorithms. (Also cross-referenced as UMIACS-TR-94-135) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Similarity Searching in Large Image Databases. Euripides G.M. Petrakis. Christos Faloutsos. December 1994.
We propose a method to handle approximate searching by image content in large image databases. Image content is represented by attributed relational graphs holding features of objects and relationships between objects. The method relies on the assumption that a fixed number of ``labeled'' or ``expected'' objects (e.g., ``heart'', ``lungs'' etc.) are common in all images of a given application domain in addition to a variable number of ``unexpected'' or ``unlabeled'' objects (e.g., ``tumor'', ``hematoma'' etc.). The method can answer queries by example such as ``{\em find all X-rays that are similar to Smith's X-ray}''. The stored images are mapped to points in a multidimensional space and are indexed using state-of-the-art database methods (R-trees). The proposed method has several desirable properties: (a) Database search is approximate so that all images up to a pre-specified degree of similarity (tolerance) are retrieved, (b) it has no ``false dismissals'' (i.e., all images qualifying query selection criteria are retrieved) and (c) it scales-up well as the database grows. We implemented the method and ran experiments on a database of synthetic (but realistic) medical images. The experiments showed that our method significantly outperforms sequential scanning by up to an order of magnitude. (Also cross-referenced as UMIACS-TR-94-134) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
David A. Bader. Joseph Ja'Ja'. Parallel Algorithms for Image Histogramming and Connected Components. December 1994.
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. Our connected components algorithm uses a novel approach for parallel merging which performs drastically limited updating during iterative steps, and concludes with a total consistency update at the final step. The algorithms have been coded in Split-C and run on a variety of platforms. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for these two primitives, even when compared with machine specific implementations. More efficient implementations of Split-C will likely result in even faster execution times. (Also cross-referenced as UMIACS-TR-94-133.) Department of Electrical Engineering, and,
FastMap: A Fast Algorithm for Indexing, Data-Mining and. Christos Faloutsos. King-Ip (David) Lin. January 1995.
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for. We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS) [Tor52]; although unsuitable for indexing, we use it as a yardstick for our method. Then, we propose a much faster algorithm to solve the problem at hand, while in addition it allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS, (being linear, as opposed to quadratic, on the database size N), while it manages to preserve distances and the overall structure of the data-set. (Also cross-referenced as UMIACS-TR-94-132) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
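To make the idea of mapping objects to points from distances alone more concrete, here is a minimal Python sketch. It projects every object onto a line through two far-apart "pivot" objects using the cosine law, then repeats on the residual distances for each further coordinate. The abstract does not give these details, so the pivot heuristic, the function name `fastmap`, and the helper `d2` are assumptions made for illustration, not the authors' implementation.

```python
import math


def fastmap(objects, dist, k):
    """Map each object to a point in k-d space using only pairwise distances.

    `dist(a, b)` is any user-supplied distance function (an assumption of
    this sketch); the objects themselves are never inspected.
    """
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates
        # have been accounted for.
        val = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            val -= (coords[i][c] - coords[j][c]) ** 2
        return max(val, 0.0)  # clamp tiny negative round-off

    for col in range(k):
        # Heuristic pivot choice: start at object 0, jump to the farthest
        # object, then jump once more. Each pass costs O(n) distance calls.
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # every residual distance is zero; nothing left to map
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
    for p, c in zip(pts, fastmap(pts, math.dist, 2)):
        print(p, "->", c)
```

Each coordinate needs a constant number of passes over the objects, which is consistent with the linear dependence on the database size N claimed in the abstract. For points that really do live in a 2-d Euclidean space, two coordinates are enough to reproduce all pairwise distances.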
High Performance Spatial Indexing for Parallel I/O and Centralized. Ibrahim Kamel. December 1994.
Recently, spatial databases have attracted increasing interest in the database field. Because of the volume of the data with which they deal, the performance of spatial database systems is important. The R-tree is an efficient spatial access method. It is a generalization of the B-tree in multidimensional space. This thesis investigates how to improve the performance of R-trees. We consider both parallel I/O and centralized architectures. For a parallel I/O environment we propose an R-tree design for a server with one CPU and multiple disks. On this architecture, the nodes of the R-tree are distributed between the different disks with cross-disk pointers ('Multiplexed' R-tree). When a new node is created we have to decide on which disk it will be stored. We propose and examine several criteria for choosing a disk for a new node. The most successful one, termed 'Proximity Index' or PI, estimates the similarity of the new node to other R-tree nodes already on a disk and chooses the disk with the least degree of similarity. For a centralized environment, we propose a new packing technique for R-trees for static databases. We use space-filling curves, and specifically the Hilbert curve, to achieve better ordering of rectangles and eventually to achieve better packing. For dynamic databases we introduce the Hilbert R-tree, in which every node has a well defined set of sibling nodes; we can thus use the concept of local rotation [47]. By adjusting the split policy, the Hilbert R-tree can achieve a degree of space utilization as high as is desired. (Also cross-referenced as UMIACS-TR-94-131) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
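The packing idea for static databases can be sketched as follows: sort the rectangles by the position of their centers along a Hilbert curve, then fill leaf nodes in that order. The Python code below is an illustrative sketch only. The function names `hilbert_index` and `hilbert_pack`, the integer grid of side `n` (a power of two), and the choice of the rectangle center as the sort key are assumptions, and the thesis may differ in all of these details.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on an n x n grid.

    `n` must be a power of two and 0 <= x, y < n. This is the standard
    bit-by-bit conversion from cell coordinates to curve distance.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_pack(rects, n, capacity):
    """Group rectangles (x0, y0, x1, y1) into leaves of at most `capacity`.

    Rectangles are ordered by the Hilbert index of their (integer) center,
    so rectangles that are close in space tend to share a leaf.
    """
    def key(r):
        x0, y0, x1, y1 = r
        return hilbert_index(n, (x0 + x1) // 2, (y0 + y1) // 2)

    ordered = sorted(rects, key=key)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


if __name__ == "__main__":
    unit_cells = [(x, y, x + 1, y + 1) for x in range(4) for y in range(4)]
    for leaf in hilbert_pack(unit_cells, 4, 4):
        print(leaf)
```

Because consecutive positions on the Hilbert curve are always neighboring cells, leaves filled in this order tend to cover compact regions, which is the intuition behind the better packing mentioned in the abstract.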
Analysis of the n-dimensional quadtree decomposition for arbitrary. Christos Faloutsos. H.V. Jagadish. Yannis Manolopoulos. December 1994.
We give a closed-form expression for the average number of $n$-dimensional quadtree nodes (`pieces' or `blocks') required by an $n$-dimensional hyper-rectangle aligned with the axes. Our formula includes as special cases the formulae of previous efforts for 2-dimensional spaces \cite{Faloutsos92Analytical}. It also agrees with theoretical and empirical results that the number of blocks depends on the hyper-surface of the hyper-rectangle and not on its hyper-volume. The practical use of the derived formula is that it allows the estimation of the space requirements of the $n$-dimensional quadtree decomposition. Quadtrees are used extensively in 2-dimensional spaces (geographic information systems and spatial databases in general), as well as in higher dimensionality spaces (as oct-trees for 3-dimensional spaces, e.g. in graphics, robotics and 3-dimensional medical images [Arya et al., 1994]). Our formula permits the estimation of the space requirements for data hyper-rectangles when stored in an index structure like a ($n$-dimensional) quadtree, as well as the estimation of the search time for query hyper-rectangles. A theoretical contribution of the paper is the observation that the number of blocks is a piece-wise linear function of the sides of the hyper-rectangle. (Also cross-referenced as UMIACS-TR-94-130) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
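The quantity being analyzed can be computed directly for a single hyper-rectangle, which is useful for checking any closed-form estimate empirically. The Python sketch below counts the maximal quadtree blocks that exactly cover an axis-aligned box on an integer grid. The function name `count_blocks`, the half-open integer coordinates, and the power-of-two universe size are assumptions of this sketch; it does not reproduce the paper's formula.

```python
from itertools import product


def count_blocks(lo, hi, origin, size):
    """Number of quadtree blocks needed to cover the box [lo, hi) exactly.

    `lo` and `hi` are integer corner tuples of the query box (half-open),
    `origin` is the lower corner of the current quadtree cell and `size`
    its side length (a power of two). Works for any number of dimensions.
    """
    n = len(lo)
    # Cell and box are disjoint: no block needed here.
    if any(hi[d] <= origin[d] or lo[d] >= origin[d] + size for d in range(n)):
        return 0
    # Cell lies entirely inside the box: it is one block.
    if all(lo[d] <= origin[d] and origin[d] + size <= hi[d] for d in range(n)):
        return 1
    # Partial overlap: split the cell into 2**n children and recurse.
    # (A unit cell that overlaps an integer box is always fully covered,
    # so the recursion never reaches size 0.)
    half = size // 2
    total = 0
    for offsets in product((0, half), repeat=n):
        child = tuple(origin[d] + offsets[d] for d in range(n))
        total += count_blocks(lo, hi, child, half)
    return total


if __name__ == "__main__":
    print(count_blocks((0, 0), (4, 4), (0, 0), 4))
    print(count_blocks((1, 1), (3, 3), (0, 0), 4))
    print(count_blocks((1, 1, 1), (3, 3, 3), (0, 0, 0), 4))
```

Averaging this count over all placements of a box with fixed side lengths gives an empirical value that can be compared against an analytical estimate.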
Project for Developing Computer Science Agenda(s) for High-Performance Computing: An Organizer's Summary. Uzi Vishkin. November 1994.
Designing a coherent agenda for the implementation of the High Performance Computing (HPC) program is a nontrivial technical challenge. Many computer science and engineering researchers in the area of HPC, who are affiliated with U.S. institutions, have been invited to contribute their agendas. We have made a considerable effort to give many in that research community the opportunity to write a position paper. This explains why we view the project as placing a mirror in front of the community, and hope that the mirror indeed reflects many of the opinions on the topic. The current paper is an organizer's summary and represents his reading of the position papers. This summary is his sole responsibility. It is respectfully submitted to the NSF. (Also cross-referenced as UMIACS-TR-94-129) Institute for Advanced Computer Studies, Univ. of Maryland, Dept. of Computer Science, Univ. of Maryland,
Manufacturing Feature Instances: Which Ones to Recognize?. Satyandra K. Gupta. William C. Regli. Dana S. Nau. November 1994.
Manufacturing features and feature-based representations have become an integral part of research on manufacturing systems, largely due to their ability to model correspondences between design information and manufacturing operations. However, several research challenges still must be addressed in order to place feature technologies into a solid scientific and mathematical framework. One challenge is the issue of alternatives in feature-based planning. Even after one has decided upon an abstract set of features to use for representing manufacturing operations, the set of feature instances used to represent a complex part is by no means unique. For a complex part, many (sometimes infinitely many) different manufacturing operations can potentially be used to manufacture various portions of the part, and thus many different feature instances can be used to represent these portions of the part. Some of these feature instances will appear in useful manufacturing plans, and others will not. If the latter feature instances can be discarded at the outset, this will reduce the number of alternative manufacturing plans to be examined in order to find a useful one. Thus, what is required is a systematic means of specifying which feature instances are of interest. This paper addresses the issue of alternatives by introducing the notion of primary feature instances, which we contend are sufficient to generate all manufacturing plans of interest. To substantiate our argument, we describe how various instances in the primary feature set can be used to produce the desired plans. Furthermore, we discuss how this formulation overcomes computational difficulties faced by previous work, and present some complexity results for this approach in the domain of machined parts. (Also cross-referenced as UMIACS-TR-94-127) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Feature Recognition for Interactive Applications:. William C. Regli. Satyandra K. Gupta. Dana S. Nau. May 1995.
The availability of low-cost computational power is a driving force behind the growing sophistication of CAD software. Tools designed to reduce time-consuming build-test-redesign iterations are essential for increasing engineering quality and productivity. However, automation of the design process poses many difficult computational problems. As more downstream engineering activities are being considered during the design phase, guaranteeing reasonable response times within design systems becomes problematic. Design is an interactive process and speed is a critical factor in systems that enable designers to explore and experiment with alternative ideas during the design phase. Achieving interactivity requires an increasingly sophisticated allocation of computational resources in order to perform realistic design analyses and generate feedback in real time. This paper presents our initial efforts to develop techniques to apply distributed algorithms to the problem of recognizing machining features from solid models. Existing work on recognition of features has focused exclusively on serial computer architectures. Our objective is to show that distributed algorithms can be employed on realistic parts with large numbers of features and many geometric and topological entities to obtain significant improvements in computation time using existing hardware and software tools. Migrating solid modeling applications toward a distributed computing framework enables interconnection of many of the autonomous and geographically diverse software tools used in the modern manufacturing enterprise. (Also cross-referenced as UMIACS-TR-94-126.1) Institute for Systems Research, University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Nonlinear Array Dependence Analysis. William Pugh. David Wonnacott. November 1994.
Standard array data dependence techniques can only reason about linear constraints. There has also been work on analyzing some dependences involving polynomial constraints. Analyzing array data dependences in real-world programs requires handling many ``unanalyzable'' terms: subscript arrays, run-time tests, function calls. The standard approach to analyzing such programs has been to omit and ignore any constraints that cannot be reasoned about. This is unsound when reasoning about value-based dependences and whether privatization is legal. Also, this prevents us from determining the conditions that must be true to disprove the dependence. These conditions could be checked by a run-time test or verified by a programmer or aggressive, demand-driven interprocedural analysis. We describe a solution to these problems. Our solution makes our system sound and more accurate for analyzing value-based dependences and derives conditions that can be used to disprove dependences. We also give some preliminary results from applying our techniques to programs from the Perfect benchmark suite. (Also cross-referenced as UMIACS-TR-94-123) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Experiences with Constraint-based Array Dependence Analysis. William Pugh. David Wonnacott. November 1994.
Array data dependence analysis provides important information for optimization of scientific programs. Array dependence testing can be viewed as constraint analysis, although traditionally general-purpose constraint manipulation algorithms have been thought to be too slow for dependence analysis. We have explored the use of exact constraint analysis, based on Fourier's method, for array data dependence analysis. We have found these techniques can be used without a great impact on total compile time. Furthermore, the use of general-purpose algorithms has allowed us to address problems beyond traditional dependence analysis. In this paper, we summarize some of the constraint manipulation techniques we use for dependence analysis, and discuss some of the reasons for our performance results. (Also cross-referenced as UMIACS-TR-94-122) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Laurent Amsaleg. Michael J. Franklin. Olivier Gruber. Efficient Incremental Garbage Collection for Workstation/Server. November 1994.
We describe an efficient server-based algorithm for garbage collecting object-oriented databases in a workstation/server environment. The algorithm is incremental and runs concurrently with client transactions; however, it does not hold any locks on data and does not require callbacks to clients. It is fault tolerant, but performs very little logging. The algorithm has been designed to be integrated into existing OODB systems, and therefore it works with standard implementation techniques such as two-phase locking and write-ahead-logging. In addition, it supports client-server performance optimizations such as client caching and flexible management of client buffers. We describe an implementation of the algorithm in the EXODUS storage manager and present results from an initial performance study of the implementation. These results demonstrate that the introduction of the garbage collector adds minimal overhead to client operations. (Also cross-referenced as UMIACS-TR-94-121) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Swarup Acharya. Rafael Alon. Michael J. Franklin. Stanley Zdonik. Broadcast Disks: Data Management for Asymmetric Communication Environments. October 1994.
This paper proposes the use of repetitive broadcast as a way of augmenting the memory hierarchy of clients in an asymmetric communication environment. We describe a new technique called "Broadcast Disks" for structuring the broadcast in a way that provides improved performance for non-uniformly accessed data. The Broadcast Disk superimposes multiple disks spinning at different speeds on a single broadcast channel, in effect creating an arbitrarily fine-grained memory hierarchy. In addition to proposing and defining the mechanism, a main result of this work is that exploiting the potential of the broadcast structure requires a reevaluation of basic cache management policies. We examine several "pure" cache management policies and develop and measure implementable approximations to these policies. These results and others are presented in a set of simulation studies that substantiates the basic idea and develops some of the intuitions required to design a particular broadcast program. (Also cross-referenced as UMIACS-TR-94-120) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Property-based Software Engineering Measurement. Lionel Briand. Sandro Morasca. Victor R. Basili. October 1995.
Little theory exists in the field of software system measurement. Concepts such as complexity, coupling, cohesion or even size are very often subject to interpretation and appear to have inconsistent definitions in the literature. As a consequence, there is little guidance provided to the analyst attempting to define proper measures for specific problems. Many controversies in the literature are simply misunderstandings and stem from the fact that some people talk about different measurement concepts under the same label (complexity is the most common case). There is a need to define unambiguously the most important measurement concepts used in the measurement of software products. One way of doing so is to define precisely what mathematical properties characterize these concepts, regardless of the specific software artifacts to which these concepts are applied. Such a mathematical framework could generate a consensus in the software engineering community and provide a means for better communication among researchers, better guidelines for analysts, and better evaluation methods for commercial static analyzers for practitioners. In this paper, we propose a mathematical framework which is generic, because it is not specific to any particular software artifact, and rigorous, because it is based on precise mathematical concepts. This framework defines several important measurement concepts (size, length, complexity, cohesion, coupling). It does not intend to be complete or fully objective; other frameworks could have been proposed and different choices could have been made. However, we believe that the formalisms and properties we introduce are convenient and intuitive. In addition, we have reviewed the literature on this subject and compared it with our work. This framework contributes constructively to a firmer theoretical ground of software measurement. (Also cross-referenced as UMIACS-TR-94-119) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
David Shulman. Tomas Brodsky. Nonlinear Scalespace via Hierarchical Statistical Modeling. October 1994.
Nonlinear scalespace should be based on a hierarchical statistical model of the image intensity function. This model should contain an explicit representation of the multiscale structure of edges and corners. Using this model we can have a non-ad-hoc basis for computing the parameters we need to determine how much smoothing we should do at points that appear to be edge points. We also have a basis for computing the apparent error in our scalespace calculations. Hierarchical statistical modeling is a technique that can be applied to other problems in low-level vision, but in this introductory paper we just present the application of our scalespace theory to image smoothing. (Also cross-referenced as CAR-TR-742) Department of Computer Science, University of Maryland, Center for Automation Research, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
Adlai Waksman. Azriel Rosenfeld. Sparse, Opaque Three-Dimensional Texture, 2b: Photometry. October 1994.
This paper deals with 3D textures composed of approximately planar texels distributed over a volume of space ("leafy" textures). It studies the gray level histograms of images of such textures under illumination by a compact light source. Simple models can be used to describe the variation of such histograms with light source direction. In fact, the variation of real plant histograms with light source direction resembles that of synthetic histograms generated using a Phong-type reflectance model and a uniform texel orientation model, and ignoring transmittance, interreflection, and shadows. (Also cross-referenced as CAR-TR-740) Department of Computer Science, University of Maryland, Center for Automation Research, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
Richard Gerber. Languages and Tools for Real-Time Systems: Problems, Solutions. October 1994.
This report summarizes two talks I gave at the ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, which took place on June 21, 1994, in Orlando, Florida. The workshop was held in concert with the ACM SIGPLAN Conference on Programming Languages Design and Implementation. The first talk ("Statements about Real-Time: Truth or Bull?") was given in the early morning. At the behest of the workshop's organizers, its primary function was to seed the ongoing discourse and provoke some debate. Besides asking controversial questions, and positing opinions, the talk also identified several fertile research areas that might interest PLDI attendees. The second talk ("Languages and Transformations: Some Solutions") was more technical, and it reviewed our research on program optimizations for real-time domains. However, I tried as much as possible to revisit the research problems raised in the morning talk, and present some possible approaches to them. The following paragraphs contain the text from my viewgraphs, laced with some commentary. Since so much work has been done in real-time systems - and even more in programming languages - my references are by necessity incomplete. (Also cross-referenced as UMIACS-TR-94-117) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ibrahim Matta. A. Udaya Shankar. Z-Iteration: Efficient Estimation of Instantaneous Measures. July 15, 1995.
Multiple-class multiple-resource (MCMR) systems, where a class of customers requires a set of resources, are common. These systems are often analyzed under steady-state conditions. We describe a simple numerical-analytical method, referred to as Z-iteration, to estimate instantaneous (and steady-state) probability measures of time-dependent systems. The key idea is to approximate the relationship between certain instantaneous measures by the relationship between their steady-state counterparts, and use this approximation to solve dynamic flow equations. We show the generality of the Z-iteration by applying it to an integrated communication network, a parallel database server, and a distributed batch system. Validations against exact numerical solutions and discrete-event simulations show the accuracy and computational advantages of the Z-iteration. (Also cross-referenced as UMIACS-TR-94-116.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Cengiz Alaettinoglu. Ibrahim Matta. A. Udaya Shankar. A Scalable Virtual Circuit Routing Scheme for ATM Networks. October 1994.
High-speed networks, such as ATM networks, are expected to support diverse quality-of-service (QoS) requirements, including real-time QoS. Real-time QoS is required by many applications such as voice and video. To support such service, routing protocols based on the Virtual Circuit (VC) model have been proposed. However, these protocols do not scale well to large networks in terms of storage and communication overhead. In this paper, we present a scalable VC routing protocol. It is based on the recently proposed viewserver hierarchy, where each viewserver maintains a partial view of the network. By querying these viewservers, a source can obtain a merged view that contains a path to the destination. The source then sends a request packet over this path to set up a real-time VC through resource reservations. The request is blocked if the setup fails. We compare our protocol to a simple approach using simulation. Under this simple approach, a source maintains a full view of the network. In addition to the savings in storage, our results indicate that our protocol performs close to or better than the simple approach in terms of VC carried load and blocking probability over a wide range of real-time workload. (Also cross-referenced as UMIACS-TR-94-115) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Robert Freimer. Samir Khuller. Joe Mitchell. Christine Piatko. Kathleen Romanik. Diane Souvaine. October 1994.
Localizing an object with finger probes. We consider the problem of identifying one of a set of polygonal models in the plane using point probes and finger probes. In particular, we give strategies for using a minimum number of finger probes to determine a finite number of possible locations of an unknown interior point in one of the models. A finger probe takes as input an interior point $p$ of a polygon $P$ and a direction $\theta$, and it outputs the first point of intersection of a ray emanating from $p$ in direction $\theta$ with the boundary of $P$. We show that without a priori knowledge of what the models look like, no finite number of finger probes will suffice. When the models are given in advance, we give both batch and dynamic probing strategies for solving the problem. We consider both the case where the models are aligned rectilinear polygons and the case where the models are simple polygons. (Also cross-referenced as UMIACS-TR-94-114) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Tae-Hyung Kim. James M. Purtilo. Configuration-level optimization of RPC-based distribution programs. October 1994.
Many strategies for improving performance of distributed programs can be described abstractly in terms of an application's overall configuration. But previously those techniques would need to be implemented manually, and the resulting programs, though yielding good performance, are more expensive to build and much less easy to reuse. This paper describes research towards an automatic system for introducing performance improvement techniques based upon an application's configuration description. (Also cross-referenced as UMIACS-TR-94-113) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Lionel Briand. Walcelio L. Melo. Carolyn B. Seaman. Victor R. Basili. Characterizing and Assessing a Large-Scale Software Maintenance. November 1994.
One important component of a software process is the organizational context in which the process is enacted. This component is often missing or incomplete in current process modeling approaches. One technique for modeling this perspective is the Actor-Dependency (AD) Model. This paper reports on a case study which used this approach to analyze and assess a large software maintenance organization. Our goal was to identify the approach's strengths and weaknesses while providing practical recommendations for improvement. The AD model was found to be very useful in capturing the important properties of the organizational context of the maintenance process, and aided in the understanding of the flaws found in this process. However, a number of opportunities for extending and improving the AD model were identified. Among others, there is a need to incorporate quantitative information to complement the qualitative model. (Also cross-referenced as UMIACS-TR-94-112) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Guy Edjlali. Gagan Agrawal. Alan Sussman. Joel Saltz. Data Parallel Programming in an Adaptive Environment. September 1994.
For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing data parallel programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers to generate code for an adaptive environment. We also present performance results for a multiblock Navier-Stokes solver run on a network of workstations using PVM for message passing. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computations. (Also cross-referenced as UMIACS-TR-94-109) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Yi-Sheng Yao. Rama Chellappa. Estimation of Unstabilized Components in Vehicular Motion. September 1994.
This paper presents a kinetic-model based algorithm for estimating some unstabilized components in vehicular motion. In addition to smooth movement, there are unstabilized components such as bounce, pitch and roll in vehicular motion. To reliably accomplish other tasks like tracking and obstacle avoidance using visual inputs, it is essential to consider these disturbances. A two-wheel vehicle model available in the literature is used for this purpose. It takes into account the bouncing and pitching components. The dynamics of these unstabilized components are formulated using standard equations of motion. Assuming that depth information is known for some landmarks in the scene (e.g., obtained from a laser range finder) and additional information from inertial sensors such as accelerometers is available, a feature-based approach is proposed to estimate the unstabilized components. Simulation results for both deterministic and stochastic terrain profiles are presented. The robustness of the filter with respect to various parameter mismatches is also addressed. (Also cross-referenced as CAR-TR-735) Department of Computer Science, University of Maryland, Center for Automation Research, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
Parke Godfrey. Minimization in Cooperative Response to Failing Database Queries. September 1994.
When a query fails, it is more cooperative to identify the cause of failure, rather than just to report the empty answer set. If there is not a cause for the query's failure, it is worthwhile to report the part of the query which failed. To identify a minimal failing subquery (MFS) of the query is the best way to do this. (This MFS is not unique; there may be many of them.) Likewise, to identify a maximal succeeding subquery (MSS) can help a user to recast a new query that leads to a non-empty answer set. Database systems do not provide the functionality of these types of cooperative responses. This may be, in part, because algorithmic approaches to finding the MFSs and the MSSs to a failing query are not obvious. The search space of subqueries is large. Despite work on MFSs in the past, the algorithmic complexity of these identification problems had remained uncharted. This paper shows the complexity profile of MFS and MSS identification. It is shown that there exists a simple algorithm for finding a MFS or a MSS by asking N subsequent queries, in which N is the length of the query. To find more MFSs (or MSSs) can be hard. It is shown that to find on the order of N MFSs (or MSSs) is NP-hard. To find K MFSs (or MSSs), for a fixed K, remains polynomial. An optimal algorithm for enumerating MFSs and MSSs, ISHMAEL, is developed and presented. The algorithm has ideal performance in enumeration, finding the first answers quickly, and decaying toward intractability in a predictable manner as more answers are found. The complexity results and the algorithmic approaches given in this paper should allow for the construction of MFS and MSS facilities for database systems. These results are relevant to a number of problems outside of databases too, and may find further application. (Also cross-referenced as UMIACS-TR-94-108) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
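The linear-scan idea mentioned in the abstract above (one MFS found with N subqueries) can be sketched as follows. This is a minimal illustration, not the ISHMAEL algorithm from the report: it assumes a query is a conjunction of conditions, that adding conditions to a failing query keeps it failing, and it uses a hypothetical in-memory table (`ROWS`) and a hypothetical `fails` predicate in place of a real database.

```python
from typing import Callable, List, Sequence, Tuple

Condition = Tuple[str, object]  # hypothetical (attribute, required value) pair

# Hypothetical toy table standing in for a database relation.
ROWS = [
    {"city": "Paris", "stars": 3, "pool": True},
    {"city": "Rome", "stars": 5, "pool": False},
]

QUERY_LOG: List[List[Condition]] = []  # records every subquery that is asked


def fails(conjuncts: Sequence[Condition]) -> bool:
    """A conjunctive query fails when no row satisfies all of its conditions."""
    QUERY_LOG.append(list(conjuncts))
    return not any(all(row[a] == v for a, v in conjuncts) for row in ROWS)


def find_mfs(conjuncts: Sequence[Condition],
             fails: Callable[[Sequence[Condition]], bool]) -> List[Condition]:
    """Return one minimal failing subquery of a query that is known to fail.

    Each condition is tentatively dropped once; it stays dropped only if the
    remaining conditions still fail. This asks exactly len(conjuncts) subqueries.
    """
    current = list(conjuncts)
    for cond in list(conjuncts):
        trial = [c for c in current if c != cond]
        if fails(trial):
            current = trial  # cond is not needed to explain the failure
    return current


query = [("city", "Paris"), ("stars", 5), ("pool", True)]
mfs = find_mfs(query, fails)
print(mfs)
```

In this toy table the failure is explained by asking for five stars together with a pool; either condition alone succeeds. Enumerating further MFSs is the harder problem the report addresses and is not attempted here.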
Yuan-Jye Jason Wu. A Parallel Implementation of the Block-GTH algorithm. September 2, 1994.
The GTH algorithm is a very accurate direct method for finding the stationary distribution of a finite-state, discrete time, irreducible Markov chain. O'Leary and Wu developed the block-GTH algorithm and successfully demonstrated the efficiency of the algorithm on vector pipeline machines and on workstations with cache memory. In this paper, we discuss the parallel implementation of the block-GTH algorithm and show effective performance on the CM-5.
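For orientation, the basic (serial, unblocked) GTH elimination can be sketched as below. This is only the textbook Grassmann-Taksar-Heyman procedure written under the assumption of a row-stochastic, irreducible transition matrix given as a list of lists; it is not the block-GTH variant or the CM-5 implementation discussed in the report, and the function name is illustrative.

```python
from typing import List


def gth_stationary(P: List[List[float]]) -> List[float]:
    """Stationary distribution of an irreducible chain via plain GTH elimination.

    P is a row-stochastic transition matrix. The input is copied, not modified.
    The elimination uses no subtractions, which is the source of GTH's accuracy.
    """
    n = len(P)
    A = [row[:] for row in P]

    # Eliminate states n-1, n-2, ..., 1 in turn.
    for k in range(n - 1, 0, -1):
        s = sum(A[k][j] for j in range(k))  # probability of leaving k downward
        for i in range(k):
            A[i][k] /= s
        for i in range(k):
            for j in range(k):
                A[i][j] += A[i][k] * A[k][j]

    # Back-substitution: unnormalized stationary weights.
    pi = [0.0] * n
    pi[0] = 1.0
    for k in range(1, n):
        pi[k] = sum(pi[i] * A[i][k] for i in range(k))

    total = sum(pi)
    return [x / total for x in pi]


P2 = [[0.9, 0.1],
      [0.5, 0.5]]
print(gth_stationary(P2))
```

For the two-state example the stationary distribution is (5/6, 1/6), which can be checked by hand from the balance equation 0.1 * pi_0 = 0.5 * pi_1.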
Lionel Briand. Sandro Morasca. Victor R. Basili. A Goal-Driven Definition Process for Product Metrics Based on Properties. September 1994.
Defining product metrics requires a rigorous and disciplined approach because useful metrics depend, to a very large extent, on one's goals and assumptions about the studied software process. Unlike in more mature scientific fields, it appears difficult to devise a "universal" set of metrics in software engineering. In this paper, we propose a metric definition process which is based on both the experimental goals of measurement, expressed via the GQM paradigm, and the study of the metrics' mathematical properties. This process integrates several research contributions from the literature into a consistent and practical process. This process is intended to be a starting point for discussion about a widely accepted, practical product metric definition process in the software engineering community. (Also cross-referenced as UMIACS-TR-94-106) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Urs von Matt. G. W. Stewart. September 1994.
Rounding Errors in Solving Block Hessenberg Systems. A rounding error analysis is presented for a divide-and-conquer algorithm to solve linear systems with block Hessenberg matrices. Conditions are derived under which the algorithm computes a backward stable solution. The algorithm is shown to be stable for diagonally dominant matrices and for M-matrices. (Also cross-referenced as UMIACS-TR-94-105) Department of Computer Science and,
David Carr. Ninad Jog. Harsha Kumar. Marko Teittinen. Christopher Ahlberg. September 1994.
Using Interaction Object Graphs to Specify and Develop Graphical Widgets. This document describes five widgets that have been developed at the Human-Computer Interaction Laboratory of the University of Maryland. These widgets are a range selection slider, a two-level alpha-slider, a secure switch, a tree viewer and a treemap viewer. The last two use the same tree representation and can be used as alternate visualizations of the same hierarchy. In addition, a system for widget specification is introduced and each widget is specified using this system. (Also cross-referenced as CAR-TR-734) (Also cross-referenced as ISR-TR-94-69) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Thomas Marlowe. William Pugh. Ted Baker. Azer Bestavros. Ron Cytron. Victor Fay Wolfe. Proceedings of the ACM SIGPLAN Workshop on Language, Compiler, and Tool. August 1994.
Traditionally, optimizing compilers apply source to source transformations. This technical report contains the proceedings of the ACM SIGPLAN Workshop on Language, Compiler, and Tool Support for Real-Time Systems, held in conjunction with PLDI '94 (ACM SIGPLAN Conference on Programming Language Design and Implementation) and LFP '94 (Lisp and Functional Programming). This workshop explores the interface between two dynamic areas of computer science and engineering: programming languages and real-time systems. Directions in both fundamental and applied research in real-time computing have been changing over the last several years, in response to the need for large, flexible, powerful, and robust systems. There is a growing perception that previous approaches have been pitched at inappropriate levels for these new applications: neither low-level coding without high-level design, nor high-level specification/verification without guarantees on translation quality are satisfactory for large complex systems. Several researchers in real-time systems see language and compiler techniques as a major part of the solution; at the same time, language researchers are beginning to explore real-time applications and environments. While hard temporal constraints complicate the adaptation, the entire range of language techniques can be brought to bear on real-time systems. (Also cross-referenced as UMIACS-TR-94-104) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Cornelia Fermuller. Yiannis Aloimonos. On the Geometry of Visual Correspondence. (Also cross-referenced as CAR-TR-732) July 1994.
Computer Vision Laboratory, Image displacement fields (optical flow fields, stereo disparity fields, normal flow fields) due to rigid motion possess a global geometric structure which is independent of the scene in view. Motion vectors of certain lengths and directions are constrained to lie on the imaging surface at particular loci whose location and form depend solely on the 3D motion parameters. If optical flow fields or stereo disparity fields are considered, then equal vectors are shown to lie on conic sections. Similarly, for normal motion fields, equal vectors lie within regions whose boundaries also constitute conics. By studying various properties of these curves and regions and their relationships, a characterization of the structure of rigid motion fields is given. The goal of this paper is to introduce a concept underlying the global structure of image displacement fields. This concept gives rise to various constraints that could form the basis of algorithms for the recovery of visual information from multiple views. Department of Computer Science, University of Maryland, Center for Automation Research,
Rama Chellappa. C.L. Wilson. S. Sirohey. C.S. Barnes. Human and Machine Recognition of Faces: A Survey. August 1994.
The goal of this paper is to present a critical survey of the literature on human and machine recognition of faces. Machine recognition of faces has several applications ranging from static matching of controlled photographs as in mugshot matching and credit card verification to surveillance video images. These applications have different constraints in terms of the complexity of their processing requirements and thus present a wide range of technical challenges. Over the last twenty years researchers in psychophysics, neural sciences and engineering, image processing, analysis and computer vision have investigated a number of issues related to face recognition by humans and machines. The ongoing research activities have been given renewed emphasis over the last five years. The existing techniques and systems have been tested on different sets of images of varying complexities. But very little synergism exists between studies in psychophysics and engineering literature. Most importantly, there exist no evaluation or benchmarking studies using large databases with the image quality that arises in law enforcement/commercial applications. In this paper, we first present different applications of face recognition in the law enforcement and commercial sectors. Special constraints that are present in these applications are pointed out. This is followed by a brief overview of the literature on face recognition in the psychophysics community. We then present a detailed overview of more than twenty years of research done in the engineering community. Techniques for segmentation/location of the face, feature extraction and recognition are reviewed. Global transform and feature based methods using statistical, structural and neural classifiers are summarized. A brief summary of recognition using face profiles and range image data is also given.
Real-time recognition from video images acquired in a cluttered scene such as an airport is probably the most challenging face recognition problem. As not much has been reported on this problem, we discuss several existing technologies in the image understanding literature that could potentially impact this problem. Given the numerous theories and techniques that are applicable to face recognition, it is clear that evaluation and benchmarking of these algorithms is crucial. We discuss relevant issues such as data collection, performance metrics and evaluation of systems and techniques. Finally, a summary and conclusions are given. The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs. (Also cross-referenced as CAR-TR-731) Department of Computer Science, University of Maryland, Center for Automation Research,
Stable Encoding of Large Finite-State Automata in Recurrent Neural. Christian W. Omlin. C. Lee Giles. December 1994.
We propose an algorithm for encoding deterministic finite-state automata (DFAs) in second-order recurrent neural networks with sigmoidal discriminant function and we prove that the languages accepted by the constructed network and the DFA are identical. The desired finite-state network dynamics is achieved by programming a small subset of all weights. A worst case analysis reveals a relationship between the weight strength and the maximum allowed network size which guarantees finite-state behavior of the constructed network. We illustrate the method by encoding random DFAs with 10, 100, and 1,000 states. While the theory predicts that the weight strength scales with the DFA size, we find the weight strength to be almost constant for all the experiments. These results can be explained by noting that the generated DFAs represent average cases. We empirically demonstrate the existence of extreme DFAs for which the weight strength scales with DFA size. (Also cross-referenced as UMIACS-TR-94-101) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Alois Ferscha. Satish K. Tripathi. August 1994.
Parallel and Distributed Simulation of Discrete Event Systems. The achievements attained in accelerating the simulation of the dynamics of complex discrete event systems using parallel or distributed multiprocessing environments are comprehensively presented. While parallel discrete event simulation (DES) governs the evolution of the system over simulated time in an iterative SIMD way, distributed DES tries to spatially decompose the event structure underlying the system, and executes event occurrences in spatial subregions by logical processes (LPs) usually assigned to different (physical) processing elements. Synchronization protocols are necessary in this approach to avoid timing inconsistencies and to guarantee the preservation of event causalities across LPs. Included in the survey are discussions on the sources and levels of parallelism, synchronous vs. asynchronous simulation and principles of LP simulation. In the context of conservative LP simulation (Chandy/Misra/Bryant) deadlock avoidance and deadlock detection/recovery strategies, Conservative Time Windows and the Carrier Nullmessage protocol are presented. Related to optimistic LP simulation (Time Warp), Optimistic Time Windows, memory management, GVT computation, probabilistic optimism control and adaptive schemes are investigated. (Also cross-referenced as UMIACS-TR-94-100) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
D.M. Gavrila. Larry S. Davis. Fast Correlation Matching in Large (Edge) Image Databases. August 1994.
Correlation-based matching methods are known to be very expensive when used on large image databases. In this paper, we will examine ways of speeding up correlation matching by phase-coded filtering. Phase-coded filtering is a technique to combine multiple patterns in one filter by assigning complex weights of unit magnitude to the individual patterns and summing them up in a composite filter. Several of the proposed composite filters are based on this idea, such as the Circular Harmonic Component (CHC) filters and the Linear Phase Coefficient Composite (LPCC) filters. We will consider the LPCC(1) filter in isolation and examine ways to improve its performance by assigning the complex weights to the individual patterns in a non-random manner so as to maximize the SNR of the filter w.r.t. the individual patterns. In experiments on a database of 100 to 1000 edge images from the aerial-domain we examine the trade-off between the speed-up (the number of patterns combined in a filter) and unreliability (the number of resulting false matches) of the composite filter. Results indicate that for binary patterns with point densities of about 0.05 we can safely combine more than 20 patterns in the optimized LPCC(1) filter, which represents a speed-up of an order of magnitude over the brute force approach of matching the individual patterns. (Also cross-referenced as CAR-TR-730) Department of Computer Science, University of Maryland, Center for Automation Research, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at: http://www.cfar.umd.edu:80/ftp/TRs/CVL-Reports-1994/TR3334-Gavrila.ps.Z
Integrating DFM with CAD through Design Critiquing. Satyandra K. Gupta. William C. Regli. Dana S. Nau. July 1994.
The increasing focus on design for manufacturability (DFM) in research in concurrent engineering and engineering design is expanding the scope of traditional design activities in order to identify and eliminate manufacturing problems during the design stage. Manufacturing a product generally involves many different kinds of manufacturing activities, each having different characteristics. A design that is good for one kind of activity may not be good for another, for example, a design that is easy to assemble may not be easy to machine. One obstacle to DFM is the difficulty involved in building a single system that can handle the various manufacturing domains relevant to a design. In this paper we propose an architecture for integrating CAD with DFM. As the designer creates a design multiple critiquing systems analyze its manufacturability with respect to different manufacturing domains such as machining, fixturing, assembly, and inspection. Using this analysis, each critiquing system offers advice about potential ways of improving the design and an integration module mediates conflicts among the different critiquing systems in order to provide feedback to improve the overall design. We anticipate that this approach can be used to build a multi-domain environment that will allow designers to create higher-quality products that can be more economically manufactured. This will reduce the need for redesign and reduce product cost and lead time. (Also cross-referenced as UMIACS-TR-94-96) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
C. Lee Giles. Bill G. Horne. T. Lin. August 1994.
Learning a Class of Large Finite State Machines with a Recurrent. One of the issues in any learning model is how it scales with problem size. Neural networks have not been immune to scaling issues. We show that a dynamically-driven discrete-time recurrent network (DRNN) can learn rather large grammatical inference problems when the strings of a finite memory machine (FMM) are encoded as temporal sequences. FMMs are a subclass of finite state machines which have a finite memory or a finite order of inputs and outputs. The DRNN that learns the FMM is a neural network that maps directly from the sequential machine implementation of the FMM. It has feedback only from the output and not from any hidden units; an example is the recurrent network of Narendra and Parthasarathy. (FMMs that have zero order in the feedback of outputs are called definite memory machines and are analogous to Time-delay or Finite Impulse Response neural networks.) Due to their topology these DRNNs are at least as powerful as any sequential machine implementation of a FMM and should be capable of representing any FMM. We choose to learn ``particular FMMs.'' Specifically, these FMMs have a large number of states (simulations are for $256$ and $512$ state FMMs) but have minimal order, relatively small depth and little logic when the FMM is implemented as a sequential machine. Simulations for the number of training examples versus generalization performance and FMM extraction size show that the number of training samples necessary for perfect generalization is less than that necessary to completely characterize the FMM to be learned. This is in a sense a best case learning problem since any arbitrarily chosen FMM with a minimal number of states would have much more order and string depth and most likely require more logic in its sequential machine implementation. (Also cross-referenced as UMIACS-TR-94-94) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Adam A. Porter. Lawrence G. Votta, Jr.. Victor R. Basili. Comparing Detection Methods for Software Requirements Inspections: A. April 1995.
(Also cross-referenced as UMIACS-TR-94-93) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Samir Khuller. Balaji Raghavachari. Azriel Rosenfeld. July 28, 1994.
Localization in Graphs. Navigation can be studied in a graph-structured framework in which the navigating agent (which we shall assume to be a point robot) moves from node to node of a ``graph space''. The robot can locate itself by the presence of distinctively labeled ``landmark'' nodes in the graph space. For a robot navigating in Euclidean space, visual detection of a distinctive landmark provides information about the direction to the landmark, and allows the robot to determine its position by triangulation. On a graph, however, there is neither the concept of direction nor that of visibility. Instead, we shall assume that a robot navigating on a graph can sense the distances to a set of landmarks. Evidently, if the robot knows its distances to a sufficiently large set of landmarks, its position on the graph is uniquely determined. This suggests the following problem: given a graph, what is the smallest number of landmarks needed, and where should they be located, so that the distances to the landmarks uniquely determine the robot's position on the graph? This is actually a classical problem about metric spaces. A minimum set of landmarks which uniquely determine the robot's position is called a ``metric basis'', and the minimum number of landmarks is called the ``metric dimension'' of the graph. In this paper we present some results about this problem. Our main {\em new\/} result is that the metric dimension can be approximated in polynomial time within a factor of $O(\log n)$; we also establish some properties of graphs with metric dimension 2. (Also cross-referenced as UMIACS-TR-94-92) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
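The landmark definition in the abstract above can be made concrete with a small sketch. It assumes a connected, unweighted, undirected graph stored as an adjacency dictionary, and it finds a metric basis by exhaustive search over landmark sets, which takes exponential time; it is not the polynomial-time $O(\log n)$ approximation from the report. All function names are illustrative.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Graph = Dict[Hashable, List[Hashable]]


def bfs_distances(adj: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Hop distances from source to every node of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def resolves(adj: Graph, landmarks: Sequence[Hashable]) -> bool:
    """True if the distance vectors to the landmarks are distinct for all nodes."""
    tables = [bfs_distances(adj, lm) for lm in landmarks]
    seen = set()
    for v in adj:
        vector = tuple(t[v] for t in tables)
        if vector in seen:
            return False  # two nodes are indistinguishable from these landmarks
        seen.add(vector)
    return True


def metric_basis_bruteforce(adj: Graph) -> Tuple[Hashable, ...]:
    """Smallest landmark set that resolves the graph, by exhaustive search."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for candidate in combinations(nodes, k):
            if resolves(adj, candidate):
                return candidate
    raise ValueError("graph must be non-empty")


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(metric_basis_bruteforce(path), metric_basis_bruteforce(cycle))
```

On the 4-node path a single endpoint suffices, since distances 0, 1, 2, 3 are all different. On the 4-cycle no single landmark works because two nodes are always equidistant from it, so two landmarks are needed.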
Don Perlis. July 1994.
An Error-Theory of Consciousness. I argue that consciousness is an aspect of an agent's intelligence, hence of its ability to deal adaptively with the world. In particular, it allows for the possibility of noting and correcting the agent's own errors. This in turn requires a robust self model as part of its world model, as well as the capability to come to see that world model as residing in its belief base (part of its self model), while then representing the actual world as possibly different, i.e., forming a new world model. This suggests particular computational mechanisms by which consciousness occurs, ones that conceivably could be discovered by neuroscientists, as well as built into artificial systems that may need such capabilities. Consciousness, then, would not be an epiphenomenon at all, but rather a key part of the functional architecture of suitably intelligent agents, hence amenable to study as much as any other architectural feature. (Also cross-referenced as UMIACS-TR-94-91) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Compiling Real-Time Programs with Timing Constraint Refinement and. Richard Gerber. Seongsoo Hong. July 1994.
We present a programming language called TCEL (Time-Constrained Event Language), whose semantics is based on time-constrained relationships between observable events. Such a semantics infers only those timing constraints necessary to achieve real-time correctness, without over-constraining the system. Moreover, an optimizing compiler can exploit this looser semantics to help tune the code, so that its worst-case execution time is consistent with its real-time requirements. In this paper we describe such a transformation system, which works in two phases. First the TCEL source code is translated into an intermediate representation. Then an instruction-scheduling algorithm rearranges selected unobservable operations, and synthesizes tasks guaranteed to respect the original event-based constraints. (Also cross-referenced as UMIACS-TR-94-90) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Kam Jim. C. Lee Giles. Bill G. Horne. May 1994.
Synaptic Noise in Dynamically-driven Recurrent Neural Networks:. There has been much interest in applying noise to feedforward neural networks in order to observe its effect on network performance. We extend these results by introducing and analyzing various methods of injecting synaptic noise into dynamically-driven recurrent networks during training. By analyzing and comparing the effects of these noise models on the error function, we found that applying a controlled amount of noise during training can improve convergence time and generalization performance. In addition, we analyze the effects of various noise parameters (additive vs. multiplicative, cumulative vs. non-cumulative, per time step vs. per sequence) and predict that best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training when the training error is large and should improve convergence on error surfaces with local minima. Synaptic noise also enhances the error function by favoring internal representations where state nodes are operating in the saturated regions of the sigmoid discriminant function, thus improving generalization to longer sequences. We substantiate these predictions by performing extensive simulations on learning the dual parity grammar from grammatical strings encoded as temporal sequences with a second-order fully recurrent neural network. (Also cross-referenced as UMIACS-TR-94-89) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Azriel Rosenfeld. "Geometric Properties" of Sets of Lines. (Also cross-referenced as CAR-TR-724) July 1994.
Computer Vision Laboratory, When we regard the plane as a set of points, we can define various geometric properties of subsets of the plane: connectedness, convexity, area, diameter, etc. It is well known that the plane can also be regarded as a set of lines. This note considers methods of defining sets (or fuzzy sets) of lines in the plane, and of defining (analogs of) "geometric properties" for such sets. Department of Computer Science, University of Maryland, Center for Automation Research,
Wayne Kelly. William Pugh. Evan Rosser. Code Generation for Multiple Mappings. December 1994.
There has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the points in these new iteration spaces in the appropriate order. This problem has been fairly well-studied in the case where all statements use the same mapping. We have developed an algorithm for the less well-studied case where each statement uses a potentially different mapping. Unlike many other approaches, our algorithm can also generate code from mappings corresponding to loop blocking. We address the important trade-off between reducing control overhead and duplicating code. (Also cross-referenced as UMIACS-TR-94-87.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Kasim S. Candan. V.S. Subrahmanian. July 1994.
An Algebra and Calculus for Multidatabases with Integrity Constraints. Litwin et al. have developed a language called MSQL for querying multidatabases. Subsequently, Grant, Litwin, Roussopoulos and Sellis have developed a calculus and algebra associated with MSQL that facilitates querying and interoperation in a multidatabase environment. In this paper, we build upon their framework by assuming that a set of integrity constraints must be satisfied. Even though each individual database in a multidatabase may satisfy the integrity constraints, the entire multidatabase itself may not satisfy the constraints. We propose three new data retrieval notions based on whether the constraint semantics is ``naive'', ``skeptical'' or makes ``choices.'' We propose a semantics for these operations, and develop an algebra and calculus based on these operators. We prove that the algebra can be embedded within the calculus -- however, the calculus is strictly more powerful than the algebra. We study various algebraic properties linking the newly defined operators together and show how these algebraic properties can be used for query optimization. Dept. of Computer Science, Univ. of Maryland, (Also cross-referenced as UMIACS-TR-94-86) University of Maryland Institute for Advanced Computer Studies,
Vadim Maslov. Global Value Propagation Through Value Flow Graph and Its Use in. July 1994.
As recent studies show, state-of-the-art parallelizing compilers produce no noticeable speedup for 9 out of 12 PERFECT benchmark codes, while the speedup that was reached by manually applying certain automatable techniques ranges from 10 to 50. In this paper we introduce the {\em Global Value Propagation} algorithm that unifies several of these techniques. Global propagation is performed using program abstraction called Value Flow Graph (VFG). VFG is an acyclic graph in which vertices and arcs are parametrically specified using F-relations. The distinctive features of our propagation algorithm are: (1) It propagates not only values carried by scalar variables, but also values carried by individual array elements. (2) We do not have to transform a program in order to use propagation results in program analysis. In this paper we focus on use of the VFG and global value propagation in array dataflow analysis. F-relations are used to represent values produced by uninterpreted function symbols that appear in dependence problems for non-affine program fragments. Global value propagation helps us to discover that some of these functions are in fact affine. (Also cross-referenced as UMIACS-TR-94-80) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. Gauss, Statistics, and Gaussian Elimination. July 1994.
This report gives a historical survey of Gauss's work on the solution of linear systems. (Also cross-referenced as UMIACS-TR-94-78) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. On Markov Chains with Sluggish Transients. June 1994.
In this note it is shown how to construct a Markov chain whose subdominant eigenvalue does not predict the decay of its transient. (Also cross-referenced as UMIACS-TR-94-77) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Cornelia Fermuller. Yiannis Aloimonos. Vision and Action. (Also cross-referenced as CAR-TR-722) June 1994.
Computer Vision Laboratory, Our work on Active Vision has recently focused on the computational modelling of navigational tasks, where our investigations were guided by the idea of approaching vision for behavioral systems in the form of modules that are directly related to perceptual tasks. These studies led us to branch in various directions and inquire into the problems that have to be addressed in order to obtain an overall understanding of perceptual systems. In this paper we present our views about the architecture of vision systems, about how to tackle the design and analysis of perceptual systems, and promising future research directions. Our suggested approach for understanding behavioral vision to realize the relationship of perception and action builds on two earlier approaches, the Medusa philosophy [13] and the Synthetic approach [15]. The resulting framework calls for synthesizing an artificial vision system by studying vision competences of increasing complexity and at the same time pursuing the integration of the perceptual components with action and learning modules. We expect that Computer Vision research in the future will progress in tight collaboration with many other disciplines that are concerned with empirical approaches to vision, i.e. the understanding of biological vision. Throughout the paper we describe biological findings that motivate computational arguments which we believe will influence studies of Computer Vision in the near future. Department of Computer Science, University of Maryland, Center for Automation Research,
Mark Rosenblum. Yaser Yacoob. Larry S. Davis. Human Emotion Recognition from Motion Using a Radial Basis Function. (Also cross-referenced as CAR-TR-721) June 1994.
Computer Vision Laboratory, In this paper a radial basis function network architecture is developed that learns the correlation between facial feature motion patterns and human emotions. We describe a hierarchical approach which at the highest level identifies emotions, at the mid level determines motions of facial features, and at the low level recovers motion directions. Individual emotion networks were trained to recognize the "smile" and "surprise" emotions. Each network was trained by viewing a set of sequences of one emotion for many subjects. The trained neural network was then tested for retention, extrapolation and rejection ability. Success rates were about 88% for retention, 73% for extrapolation, and 79% for rejection. Department of Computer Science, University of Maryland, Center for Automation Research,
Howard C. Elman. June 1994.
Multigrid and Krylov Subspace Methods for the Discrete Stokes Equations. Discretization of the Stokes equations produces a symmetric indefinite system of linear equations. For stable discretizations, a variety of numerical methods have been proposed that have rates of convergence independent of the mesh size used in the discretization. In this paper, we compare the performance of four such methods: variants of the Uzawa, preconditioned conjugate gradient, preconditioned conjugate residual, and multigrid methods, for solving several two-dimensional model problems. The results indicate that where it is applicable, multigrid with smoothing based on incomplete factorization is more efficient than the other methods, but typically by no more than a factor of two. The conjugate residual method has the advantages of being both independent of iteration parameters and widely applicable. (Also cross-referenced as UMIACS-TR-94-76) Dept. of Computer Science, Univ. of Maryland,
Defining and Validating High-Level Design Metrics. Lionel Briand. Sandro Morasca. Victor R. Basili. June 1994.
The availability of significant metrics in the early phases of the software development process allows for a better management of the later phases, and a more effective quality assessment when software quality can still be easily affected by preventive or corrective actions. In this paper, we introduce and compare four strategies for defining high-level design metrics. They are based on different sets of assumptions (about the design process) related to a well defined experimental goal they help reach: identify error-prone software parts. In particular, we define ratio-scale metrics for cohesion and coupling that show interesting properties. An in-depth experimental validation, conducted on large scale projects demonstrates the usefulness of the metrics we define. (Also cross-referenced as UMIACS-TR-94-75) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Cengiz Alaettinoglu. A. Udaya Shankar. June 1994.
Hierarchical Inter-Domain Routing Protocol. Traditional inter-domain routing protocols based on superdomains maintain either ``strong'' or ``weak'' ToS and policy constraints for each visible superdomain. With strong constraints, a valid path may not be found even though one exists. With weak constraints, an invalid domain-level path may be treated as a valid path. We present an inter-domain routing protocol based on superdomains, which always finds a valid path if one exists. Both strong and weak constraints are maintained for each visible superdomain. If the strong constraints of the superdomains on a path are satisfied, then the path is valid. If only the weak constraints are satisfied for some superdomains on the path, the source uses a query protocol to obtain a more detailed ``internal'' view of these superdomains, and searches again for a valid path. Our protocol handles topology changes, including node/link failures that partition superdomains. Evaluation results indicate our protocol scales well to large internetworks. (Also cross-referenced as UMIACS-TR-94-73) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Wayne Kelly. William Pugh. Finding Legal Reordering Transformations using Mappings. June 1994.
Traditionally, optimizing compilers attempt to improve the performance of programs by applying source-to-source transformations, such as loop interchange, loop skewing and loop distribution. Each of these transformations has its own special legality checks and transformation rules, which make it hard to analyze or predict the effects of compositions of these transformations. To overcome these problems we have developed a framework for unifying iteration reordering transformations. The framework is based on the idea that all reordering transformations can be represented as a mapping from the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. An optimizing compiler would use our framework by finding a mapping that both corresponds to a legal transformation and produces efficient code. We present the mapping selection problem as a search problem by decomposing it into a sequence of smaller choices. We then characterize the set of all legal mappings by defining an implicit search tree. (Also cross-referenced as UMIACS-TR-94-71) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
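The idea of treating a reordering transformation as a mapping on the iteration space, and then asking whether the mapping is legal, can be illustrated with the classic distance-vector test. This is a simplified sketch and not the report's framework: it assumes one linear mapping `T` shared by all statements and constant dependence distance vectors. The names `apply_mapping`, `lex_positive` and `is_legal` are hypothetical.

```python
def apply_mapping(T, vec):
    """Apply the linear mapping T (a list of rows) to an iteration or distance vector."""
    return tuple(sum(T[r][c] * vec[c] for c in range(len(vec))) for r in range(len(T)))


def lex_positive(vec):
    """True if the first nonzero component of vec is positive."""
    for v in vec:
        if v > 0:
            return True
        if v < 0:
            return False
    return False  # the all-zero vector is not lexicographically positive


def is_legal(T, distance_vectors):
    """A mapping is legal here if every dependence still points forward
    in the new iteration order, i.e. each mapped distance vector is
    lexicographically positive."""
    return all(lex_positive(apply_mapping(T, d)) for d in distance_vectors)


# Two familiar transformations of a doubly nested loop, written as mappings.
INTERCHANGE = [[0, 1], [1, 0]]        # (i, j) -> (j, i)
SKEW_THEN_INTERCHANGE = [[1, 1], [1, 0]]  # (i, j) -> (i + j, i)
```

For a dependence with distance `(1, -1)`, `is_legal(INTERCHANGE, [(1, -1)])` is false because the mapped distance `(-1, 1)` runs backward, while `SKEW_THEN_INTERCHANGE` maps the same distance to `(0, 1)` and is accepted. Expressing both as matrices is what lets a single test cover compositions of transformations.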
G. W. Stewart. Implementing an Algorithm for Solving Block Hessenberg Systems. July 1994.
This paper describes the implementation of a recursive descent method for solving block Hessenberg systems. Although the algorithm is conceptually simple, its implementation in C (a natural choice of language given the recursive nature of the algorithm and its data) is nontrivial. Particularly important is the balance between ease of use, computational efficiency, and flexibility. (Also cross-referenced as UMIACS-TR-94-70) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Toshiyuki Asahi. David Turo. Ben Shneiderman. June 1994.
Using Treemaps to visualize the Analytic Hierarchy Process. Treemaps, a visualization method for large hierarchical data spaces, are used to augment the capabilities of the Analytic Hierarchy Process (AHP) for decision-making. Two direct manipulation tools, presented metaphorically as a "pump" and a "hook," were developed and applied to the treemap to support AHP sensitivity analysis. Users can change the importance of criteria dynamically on the two-dimensional treemap and immediately see the impact on the outcome of the decision. This fluid process dramatically speeds up exploration and provides a better understanding of the relative impact of the component criteria. A usability study with 6 subjects using a prototype AHP application showed that treemap representation was acceptable from a visualization and data operation standpoint. (Also cross-referenced as ISR-TR-94-45) (Also cross-referenced as CAR-TR-719) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
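As a rough illustration of how a treemap turns a weighted hierarchy of criteria into screen areas, the sketch below uses the basic slice-and-dice layout, which alternates the split direction at each level. The report does not say which layout variant its prototype uses, so this choice is an assumption, and the tree encoding, the `CRITERIA` example and the function names are hypothetical.

```python
def weight(node):
    """Weight of a node: its own value for a leaf, the sum of its children otherwise."""
    _, payload = node
    if isinstance(payload, (int, float)):
        return payload
    return sum(weight(child) for child in payload)


def layout(node, x, y, w, h, vertical=True, out=None):
    """Slice-and-dice treemap: return {leaf_name: (x, y, width, height)}.

    A node is (name, value) for a leaf or (name, [children]) otherwise.
    Each level splits its rectangle among its children in proportion to
    their weights, alternating between vertical and horizontal cuts.
    """
    if out is None:
        out = {}
    name, payload = node
    if isinstance(payload, (int, float)):
        out[name] = (x, y, w, h)
        return out
    total = weight(node)
    offset = 0.0
    for child in payload:
        frac = weight(child) / total
        if vertical:
            layout(child, x + offset * w, y, w * frac, h, False, out)
        else:
            layout(child, x, y + offset * h, w, h * frac, True, out)
        offset += frac
    return out


# Hypothetical decision hierarchy with criterion weights summing to 1.
CRITERIA = ("decision", [
    ("cost", [("purchase", 0.3), ("upkeep", 0.1)]),
    ("quality", 0.4),
    ("delivery", 0.2),
])
```

Calling `layout(CRITERIA, 0.0, 0.0, 100.0, 100.0)` gives each leaf criterion a rectangle whose area is proportional to its weight, so changing a weight and recomputing the layout shows its relative impact directly.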
D.M. Gavrila. R-tree Index Optimization. June 1994.
The optimization of spatial indexing is an increasingly important issue considering the fact that spatial databases, in such diverse areas as geographical, CAD/CAM and image applications, are growing rapidly in size and often contain on the order of millions of items or more. This necessitates the storage of the index on disk, which has the potential of slowing down the access time significantly. In this paper, we discuss ways of minimizing the disk access frequency by grouping together data items which are close to one another in the spatial domain ("packing"). The data structure which we seek to optimize here is the R-tree for a given set of data objects. Existing methods of building an R-tree index based on space-filling curves (Peano, Hilbert) are computationally cheap, but they do not preserve spatial locality well, in particular when dealing with higher-dimensional data of non-zero extent. On the other hand, existing methods of packing based on all dimensions of the data, such as the several proposed dynamic R-tree insertion algorithms, do not take advantage of the fact that all the data objects are known beforehand. Furthermore, they are essentially serial in nature. In this paper, we regard packing as an optimization problem and propose an iterative method of finding a close-to-optimal solution to the packing of a given set of spatial objects in D dimensions. The method achieves a high degree of parallelism by constructing the R-tree bottom-up. In experiments on data of various dimensionalities and distributions, we have found that the proposed method can significantly improve on the packing performance of the R* insertion algorithm and the Hilbert curve. It is shown that the improvements increase with the skewness of the data and, in some cases, can even amount to an order of magnitude in terms of decreased response time.
(Also cross-referenced as CAR-TR-718) Department of Computer Science, University of Maryland, Center for Automation Research, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu:80/ftp/TRs/CVL-Reports-1994/TR3292-Gavrila.ps
David John Musliner. James Hendler. Ashok K. Agrawala. Edmund H. Durfee. Jay K. Strosnider. C. J. Paul. June 1994.
The Challenges of Real-Time AI. The research agendas of two major areas of computer science are converging: Artificial Intelligence (AI) methods are moving towards more realistic domains requiring real-time responses, and real-time systems are moving towards more complex applications requiring intelligent behavior. Together, they meet at the crossroads of interest in "real-time intelligent control," or "real-time AI." This subfield is still being defined by the common interests of researchers from both real-time and AI systems. As a result, the precise goals for various real-time AI systems are still in flux. This paper describes an organizing conceptual structure for current real-time AI research, clarifying the different meanings this term has acquired for various researchers. Having identified the various goals of real-time AI research, we then specify some of the necessary steps towards reaching those goals. This in turn enables us to identify promising areas for future research in both AI and real-time systems techniques. Dept. of Computer Science, Univ. of Maryland, (Also cross-referenced as UMIACS-TR-94-69) University of Maryland Institute for Advanced Computer Studies,
Harsha Kumar. Catherine Plaisant. Marko Teittinen. Ben Shneiderman. June 1994.
Visual Information Management for Network Configuration. Current network management systems rely heavily on forms in their user interfaces. The interfaces reflect the intricacies of the network hardware components but provide little support for guiding users through tasks. There is a scarcity of useful graphical visualizations and decision-support tools. We applied a task-oriented approach to design and implemented the user interface for a prototype network configuration management system. Our user interface provides multiple overviews of the network (with potentially thousands of nodes) and the relevant configuration tasks (queries and updates). We propose a unified interface for exploration, querying, data entry and verification. Compact color-coded treemaps with dynamic queries allowing user-controlled filtering and animation of the data display proved well-suited for representing the multiple containment hierarchies in networks. Our Tree-browser applied the conventional node-link visualization of trees to show hardware containment hierarchies. Improvements to conventional scrollbar-browsers included tightly coupled overviews and detailed views. This visual interface, implemented with Galaxy and the University of Maryland Widget Library(TM), has received enthusiastic feedback from the network management community. This application-specific paper has design paradigms that should be useful to designers of varied systems. (Also cross-referenced as: CAR-TR-716) (Also cross-referenced as: ISR-TR-94-45) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Vinit Jain. and Ben Shneiderman. revised Sept. 1993.
Data structures for Dynamic Queries: An analytical and experimental evaluation. Dynamic Queries is a querying technique for doing range search on multi-key data sets. It is a direct manipulation mechanism where the query is formulated using graphical widgets and the results are displayed graphically, preferably within 100 milliseconds. This paper evaluates four data structures, the multilist, the grid file, the k-d tree and the quad tree, used to organize data in high speed storage for dynamic queries. The effect of factors like size, distribution and dimensionality of data on the storage overhead and the speed of search is explored. Analytical models for estimating the storage and the search overheads are presented, and verified to be correct by empirical data. Results indicate that multilists are suitable for small (few thousand points) data sets irrespective of the data distribution. For large data sets the grid files are excellent for uniformly distributed data, and trees are good for skewed data distributions. There was no significant difference in performance between the tree structures. (Also cross-referenced as CAR-TR-715) (Also cross-referenced as ISR-TR-94-47) (Also cross-referenced as CS-TR-3133) (Also cross-referenced as CAR-TR-685) (Also cross-referenced as ISR-TR-93-73) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
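To make the range-search setting concrete, here is a minimal sketch of a two-key range query answered from a fixed uniform grid and checked against a linear scan. It is a deliberately simplified stand-in for the grid file evaluated in the report (a real grid file adapts its partition boundaries and keeps a directory), and the names `GridIndex`, `range_query` and `brute_force` are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Fixed uniform grid over 2-key points; each cell holds a bucket of points."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for p in points:
            self.cells[self._cell(p)].append(p)

    def _cell(self, p):
        """Grid coordinates of the cell containing point p."""
        return (int(p[0] // self.cell_size), int(p[1] // self.cell_size))

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        """Return all points with x_lo <= x <= x_hi and y_lo <= y <= y_hi,
        visiting only the cells that overlap the query rectangle."""
        cx_lo, cy_lo = self._cell((x_lo, y_lo))
        cx_hi, cy_hi = self._cell((x_hi, y_hi))
        hits = []
        for cx in range(cx_lo, cx_hi + 1):
            for cy in range(cy_lo, cy_hi + 1):
                for p in self.cells.get((cx, cy), ()):
                    if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi:
                        hits.append(p)
        return hits


def brute_force(points, x_lo, x_hi, y_lo, y_hi):
    """Reference answer: scan every point."""
    return [p for p in points if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]
```

With evenly spread points the grid touches only a few small buckets per query, which fits the abstract's finding that grid files suit uniformly distributed data. With heavily skewed data most points fall into a few cells and the query degrades toward the linear scan.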
Ninad Jog. Ben Shneiderman. May 1994.
Interactive Smooth Zooming in a Starfield Information Visualization. This paper discusses the design and implementation of interactive smooth zooming of a starfield display. A starfield display is a two dimensional scatterplot of a multidimensional database where every item from the database is represented as a small colored glyph whose position is determined by its ranking along ordinal attributes of the items laid out on the axes. One way of navigating this visual information is by using a zooming tool to incrementally zoom in on the items by varying the attribute range on either axis independently - such zooming causes the glyphs to move continuously and to grow or shrink. To get a feeling of flying through the data, users should be able to track the motion of each glyph without getting distracted by flicker or large jumps - conditions that necessitate high display refresh rates and closely spaced glyphs on successive frames. Although the use of high-speed hardware can achieve the required visual effect for small databases, the twin software bottlenecks of rapidly accessing display items and constructing a new display image fundamentally retard the refresh rate. Our work explores several methods to overcome these bottlenecks, presents a taxonomy of various zooming methods and introduces a new widget, the zoom bar, that facilitates zooming. (Also cross-referenced as CAR-TR-714) (Also cross-referenced as ISR-TR-94-46) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Ben Shneiderman. Catherine Plaisant. May 1994.
The Future of Graphic User Interfaces: Personal Role Managers. Personal computer users typically manage hundreds of directories and thousands of files with hierarchically structured file managers, plus archaic cluttered-desktop window managers, and iconic representations of applications. These users must deal with the annoying overhead of window housekeeping and the greater burden of mapping their organizational roles onto the unnecessarily rigid hierarchy. An alternate approach is presented, Personal Role Manager (PRM), to structure the screen layout and the interface tools to better match the multiple roles that individuals have in an organization. Each role has a vision statement, schedule, hierarchy of tasks, set of people, and collection of documents. (Also cross-referenced as ISR-TR-94-48) (Also cross-referenced as CAR-TR-713) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Howard C. Elman. David J. Silvester. June 1994.
Fast Nonsymmetric Iterations and Preconditioning for Navier-Stokes Equations. Discretization and linearization of the steady-state Navier-Stokes equations gives rise to a nonsymmetric indefinite linear system of equations. In this paper, we introduce preconditioning techniques for such systems with the property that the eigenvalues of the preconditioned matrices are bounded independently of the mesh size used in the discretization. We confirm and supplement these analytic results with a series of numerical experiments indicating that Krylov subspace iterative methods for nonsymmetric systems display rates of convergence that are independent of the mesh parameter. In addition, we show that preconditioning costs can be kept small by using iterative methods for some intermediate steps performed by the preconditioner. (Also cross-referenced as UMIACS-TR-94-66) Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant. David Carr. Ben Shneiderman. April 1994.
Image Browsers: Taxonomy, Guidelines, and Informal Specifications. Image browsing is necessary in numerous applications. Designers have merely used two one-dimensional scroll bars or they have made ad hoc designs for a two-dimensional scroll bar. However, the complexity of two-dimensional browsing suggests that more careful analysis, design, and evaluation might lead to significant improvements. We present a task taxonomy for image browsing, suggest design features and guidelines, assess existing strategies, and introduce an informal specification technique to describe the browsers. (Also cross-referenced as CAR-TR-712) (Also cross-referenced as ISR-TR-94-47) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Michael Miller. Don Perlis. May 1994.
What experts deny, novices must understand. We consider the problem of representing the denial of default information. We show that such denials are important parts of commonsense reasoning. Moreover, their representation is not a simple matter of negating traditional representations of default information. We have found a solution by separating default information into use and trend portions. This approach may also afford a more compact way to represent defaults in general. (Also cross-referenced as UMIACS-TR-94-64) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Jennifer Elgot-Drapkin. Diana Gordon. Sarit Kraus. Michael Miller. Madhura Nirkhe. Don Perlis. May 1994.
Calibrating, Counting, Grounding, Grouping. Even an ``elementary'' intelligence for control of the physical world will require very many kinds of knowledge and ability. Among these are ones related to perception, action, and reasoning about ``near space'': that region comprising one's body and the portion of space within reach of one's effectors; chief among these are individuation and categorization of objects. These in turn are made useful in part by the additional capacities to estimate category size, change one's beliefs about categories, and form new categories or revise old categories. In this position paper we point out some issues in knowledge representation that can arise with respect to the above capacities, and suggest that the framework of ``active logics'' (see below) may be marshaled toward solutions. We will conduct our discussion in terms of learning to understand in a semantically explicit way one's own sensori-motor system and its interactions with near-space objects. (Also cross-referenced as UMIACS-TR-94-63) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Don Perlis. May 1994.
Logic for a lifetime. There has been an explosion of formal work in commonsense reasoning in the past fifteen years, but almost no significant connection with work in building commonsense reasoning systems (cognitive or otherwise). We explore the reasons, and especially the ideal formal assumption of omniscience, reviewing and extending arguments that this is irreparably out of line with the needs of any real reasoning agent. On the other hand, this exploration reveals some desiderata that might still be given useful formal treatment, but with a somewhat altered set of aims from what has motivated most formal work. The discussion is motivated by several examples of commonsense reasoning, involving change of belief in addition to the more usual arguments concerning resource limitations. Key to the entire discussion is the notion that real reasoners do not usually have the luxury of isolated problems with well-defined beginnings and endings, but rather must deal with evolving and ongoing problems and situations. (Also cross-referenced as UMIACS-TR-94-62) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Urs von Matt. January 1994.
Kassandra: The Automatic Grading System. An automatic grading system is presented for grading assignments in scientific computing. A student can interactively use this system to check the correctness of his program assignments. The grade for a correct solution is automatically recorded. This paper also considers the security problems with such an automatic grading system. (Also cross-referenced as UMIACS-TR-94-59) Institute for Advanced Computer Studies,
Guaranteeing End-to-End Timing Constraints by Calibrating. Richard Gerber. Seongsoo Hong. Manas Saksena. May 1994.
This paper presents a comprehensive design methodology for guaranteeing end-to-end requirements of real-time systems. Applications are structured as a set of process components connected by asynchronous channels, in which the endpoints are the system's external inputs and outputs. Timing constraints are then postulated between these inputs and outputs; they express properties such as end-to-end propagation delay, temporal input-sampling correlation, and allowable separation times between updated output values. The automated design method works as follows: First the end-to-end requirements are transformed into a set of intermediate rate constraints on the tasks, and new tasks are created to correlate related inputs. The intermediate constraints are then solved by an optimization algorithm, whose objective is to minimize CPU utilization. If the algorithm fails, a restructuring tool attempts to eliminate bottlenecks by transforming the application, which is then re-submitted into the assignment algorithm. The final result is a schedulable set of fully periodic tasks, which collaboratively maintain the end-to-end constraints. (Also cross-referenced as UMIACS-TR-94-58) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
David Carr. Catherine Plaisant. Hiroaki Hasegawa. May 1994.
The Design of a Telepathology Workstation: Exploring Remote Images. Dynamic telepathology uses a tele-operated microscope to allow pathologists to view samples at a remote location. However, time delays introduced by remote operation have made use of a commercial dynamic telepathology system difficult and frustrating. This paper describes experiments to evaluate and redesign the user interface. We also make recommendations for further automation to support the pathology process and increase the usefulness of the system. Copyright, 1994, by David Carr, Catherine Plaisant, and Hiroaki Hasegawa. All rights reserved. (Also cross-referenced as CAR-TR-708) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Run-time and Compile-time Support for Adaptive Irregular Problems. Shamik D. Sharma. Ravi Ponnusamy. Bongki Moon. Yuan-Shin Hwang. Raja Das. Joel Saltz. May 1994.
In adaptive irregular problems the data arrays are accessed via indirection arrays, and data access patterns change during computation. Implementing such problems on distributed memory machines requires support for dynamic data partitioning, efficient preprocessing and fast data migration. This research presents efficient runtime primitives for such problems. This new set of primitives is part of the CHAOS library. It subsumes the previous PARTI library which targeted only static irregular problems. To demonstrate the efficacy of the runtime support, two real adaptive irregular applications have been parallelized using CHAOS primitives: a molecular dynamics code (CHARMM) and a particle-in-cell code (DSMC). The paper also proposes extensions to Fortran D which can allow compilers to generate more efficient code for adaptive problems. These language extensions have been implemented in the Syracuse Fortran 90D/HPF prototype compiler. The performance of the compiler parallelized codes is compared with the hand parallelized versions. (Also cross-referenced as UMIACS-TR-94-55) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. On Graded QR Decompositions of Products of Matrices. May 1994.
This paper is concerned with the singular values and vectors of a product $M_{m}=A_{1}A_{2}\cdots A_{m}$ of matrices of order $n$. The chief difficulty with computing them directly from $M_{m}$ is that with increasing $m$ the ratio of the small to the large singular values of $M_{m}$ may fall below the rounding unit, so that the former are computed inaccurately. The solution proposed here is to compute recursively the factorization $M_{m} = QRP\trp$, where $Q$ is orthogonal, $R$ is a graded upper triangular matrix, and $P\trp$ is a permutation. (Also cross-referenced as UMIACS-TR-94-53) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Bradford G. Nickerson. April, 1994.
Skip List Data Structures for Multidimensional Data. May 1994.
This report presents four new data structures for multidimensional data. All of these data structures are based on the deterministic skip list. Explanations are provided for the 2-d search skip list and three different versions of the k-d skip list. These structures support fast insertion and deletion. The third version of the k-d skip list and the 2-d search skip list require only O(n) space. The 2-d search skip list allows semi-infinite range searches of type ([L1:H1],[L2:infinity]), or of type ([L1:H1],[-infinity:H2]) in time O(t + log n). The third version of the k-d skip list seems well-suited for range search using parallel processing. Algorithms for building, insertion, deletion and range search for all four data structures are given, along with proofs of worst case complexity for these operations. Complete C code for range search, insertion and deletion in the 2-d search skip list is also presented. (Also cross-referenced as UMIACS-TR-94-52) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Sarit Kraus. April 1994.
Contracting Tasks in Multi-Agent Environments. Agents may contract some of their tasks to other agents even when they do not share a common goal. An agent may try to contract some of the tasks that it cannot perform by itself, or that may be performed more efficiently by other agents. One self-motivated agent may convince another self-motivated agent to help it with its task, by promises of rewards, even if the agents are not assumed to be benevolent. We propose techniques that provide efficient ways to reach contracting in varied situations: the agents have full information about the environment and each other or subcontracting when the agents do not know the exact state of the world. We consider situations of repeated encounters, cases of asymmetric information, situations where the agents lack information about each other, and cases where an agent subcontracts a task to a group of agents. Situations where there is competition among possible contracted agents or possible contracting agents are also considered. In all situations we would like the contracted agent to carry out the task efficiently without the need of close supervision by the contracting agent. (Also cross-referenced as UMIACS-TR-94-44) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Chia-Mei Chen. Satish K. Tripathi. Multiprocessor Priority Ceiling Based Protocols. April 7, 1994.
We study resource synchronization in multiprocessor hard real-time systems. Specifically, we propose a multiprocessor resource control protocol which allows a job to simultaneously lock multiple global resources, removing a restriction from previous protocols. Allowing nested critical sections may permit a finer granularity of synchronization, increasing parallelism and throughput. All the protocols discussed belong to the class of priority inheritance protocols and rely in some fashion on priority ceilings for global semaphores. We consider both static and dynamic priorities, building upon the multiprocessor priority ceiling protocol (MPCP) proposed by Rajkumar et al. and the dynamic priority ceiling protocol (DPCP) proposed by Chen and Lin. The extended protocols prevent deadlock and transitive blocking. We derive bounds for worst case blocking time, and describe sufficient conditions to guarantee that m sets of periodic tasks can be scheduled on an m multiprocessor system. Performance comparisons of these protocols with MPCP show that the proposed protocols increase schedulability. (Also cross-referenced as UMIACS-TR-94-42) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
William Pugh. David Wonnacott. Static Analysis of Upper and Lower Bounds on Dependences and Parallelism. March 1994.
Existing compilers often fail to parallelize sequential code, even when a program can be manually transformed into parallel form by a sequence of well-understood transformations (as is the case for many of the Perfect Club Benchmark programs). These failures can occur for several reasons: the code transformations implemented in the compiler may not be sufficient to produce parallel code, the compiler may not find the proper sequence of transformations, or the compiler may not be able to prove that one of the necessary transformations is legal. When a compiler fails to extract sufficient parallelism from a program, the programmer may try to extract additional parallelism. Unfortunately, the programmer is typically left to search for parallelism without significant assistance. The compiler generally does not give feedback about which parts of the program might contain additional parallelism, or about the types of transformations that might be needed to realize this parallelism. Standard program transformations and dependence abstractions cannot be used to provide this feedback. In this paper, we propose a two-step approach for the search for parallelism in sequential programs: We first construct several sets of constraints that describe, for each statement, which iterations of that statement can be executed concurrently. By constructing constraints that correspond to different assumptions about which dependences might be eliminated through additional analysis, transformations and user assertions, we can determine whether we can expose parallelism by eliminating dependences. In the second step of our search for parallelism, we examine these constraint sets to identify the kinds of transformations that are needed to exploit scalable parallelism. Our tests will identify conditional parallelism and parallelism that can be exposed by combinations of transformations that reorder the iteration space (such as loop interchange and loop peeling).
This approach lets us distinguish inherently sequential code from code that contains unexploited parallelism. It also produces information about the kinds of transformations that will be needed to parallelize the code, without worrying about the order of application of the transformations. Furthermore, when our dependence test is inexact, we can identify which unresolved dependences inhibit parallelism by comparing the effects of assuming dependence or independence. We are currently exploring the use of this information in programmer-assisted parallelization. (Also cross-referenced as UMIACS-TR-94-40) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Madhura Nirkhe. Sarit Kraus. Don Perlis. March 1994.
Thinking takes time: a modal active-logic for reasoning in time. Most common sense reasoning formalisms do not account for the passage of time as the reasoning occurs, and hence are inadequate from the point of view of modeling an agent's {\em ongoing} process of reasoning. We present a modal active-logic that treats time as a valuable resource that is consumed in each step of the agent's reasoning. We provide a sound and complete characterization for this logic and examine how it addresses the problem of logical omniscience. (Also cross-referenced as UMIACS-TR-94-39) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Michael Miller. Don Perlis. March 1994.
Presentations and this and that: logic in action. The tie between linguistic entities (e.g., words) and their meanings (e.g., objects in the world) is one that a reasoning agent had better know about and be able to alter when occasion demands. This has a number of important commonsense uses. The formal point, though, is that a new treatment is called for so that rational behavior via a logic can measure up to the constraint that it be able to change usage, employ new words, change meanings of old words, and so on. Here we do not offer a new logic per se; rather we borrow an existing one (step logic) and apply it to the specific issue of language change. (Also cross-referenced as UMIACS-TR-94-36) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Amalgamating Knowledge Bases, III - Algorithms, Data Structures, and. Sibel Adali. V.S. Subrahmanian. March 1994.
Integrating knowledge from multiple sources is an important aspect of automated reasoning systems. In the first part of this series of papers, we presented a uniform declarative framework, based on annotated logics, for amalgamating multiple knowledge bases when these knowledge bases (possibly) contain inconsistencies, uncertainties, and non-monotonic modes of negation. We showed that annotated logics may be used, with some modifications, to mediate between different knowledge bases. The multiple knowledge bases are amalgamated by embedding the individual knowledge bases into a lattice. In this paper, we briefly describe an SLD-resolution based proof procedure that is sound and complete w.r.t. our declarative semantics. We will then develop an OLDT-resolution based query processing procedure, MULTI-OLDT, that satisfies two important properties: (1) efficient reuse of previous computations is achieved by maintaining a table -- we describe the structure of this table and show that table operations can be efficiently executed, and (2) approximate, interruptable query answering is achieved, i.e. it is possible to obtain an ``intermediate, approximate'' answer from the query processing procedure by interrupting it at any point in time during its execution. The design of the MULTI-OLDT procedure will include the development of run-time algorithms to incrementally and efficiently update the table. (Also cross-referenced as UMIACS-TR-94-35) Department of Computer Science, Univ. of Maryland,
Kutluhan Erol. James Hendler. Dana S. Nau. March 1994.
Complexity Results for HTN Planning. (Also cross-referenced as ISR-TR-95-10) Most practical work on AI planning systems during the last fifteen years has been based on hierarchical task network (HTN) decomposition, but until now, there has been very little analytical work on the properties of HTN planners. This paper describes how the complexity of HTN planning varies with various conditions on the task networks, and how it compares to STRIPS-style planning. (Also cross-referenced as UMIACS-TR-94-32) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Kutluhan Erol. James Hendler. Dana S. Nau. March 1994.
Semantics for HTN Planning. (Also cross-referenced as ISR-TR-95-9) One big obstacle to understanding the nature of hierarchical task network (HTN) planning has been the lack of a clear theoretical framework. In particular, no one has yet presented a clear and concise HTN algorithm that is sound and complete. In this paper, we present a formal syntax and semantics for HTN planning. Based on this syntax and semantics, we are able to define an algorithm for HTN planning and prove it sound and complete. We also develop several definitions of expressivity for planning languages and prove that HTN Planning is strictly more expressive than STRIPS-style planning according to those definitions. (Also cross-referenced as UMIACS-TR-94-31) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. On the Stability of Sequential Updates and Downdates. March 1994.
The updating and downdating of QR decompositions has important applications in a number of areas. There is essentially one standard updating algorithm, based on plane rotations, which is backwards stable. Three downdating algorithms have been treated in the literature: the LINPACK algorithm, the method of hyperbolic transformations, and Chambers' algorithm. Although none of these algorithms is backwards stable, the first and third satisfy a relational stability condition. In this paper, it is shown that relational stability extends to a sequence of updates and downdates. In consequence, other things being equal, if the final decomposition in the sequence is well conditioned, it will be accurately computed, even though intermediate decompositions may be almost completely inaccurate. These results are also applied to the two-sided orthogonal decompositions, such as the URV decomposition. (Also cross-referenced as UMIACS-TR-94-30) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ibrahim Matta. A. Udaya Shankar. May 1995.
Fast Time-Dependent Evaluation of Integrated Services Networks. We present a numerical-analytical method to evaluate integrated services networks with adaptive routing, scheduling and admission controls. We apply our method to connection-oriented networks supporting different types of real-time connections. The network dynamics is described by difference equations which can be solved for both transient and steady-state performances. Results indicate that our method is computationally much cheaper than discrete-event simulation, and yields accurate performance measures. We compare the performance of different routing schemes on the NSFNET backbone topology with a weighted fair-queueing link scheduling discipline and admission control based on bandwidth reservation. We show that the routing scheme that routes connections on paths which are both under-utilized and short (in number of hops) gives higher network throughput. (Also cross-referenced as UMIACS-TR-94-28) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
William Pugh. Counting Solutions to Presburger Formulas: How and Why. Dept. of Computer Science, Univ. of Maryland, April 1993.
We describe methods that are able to count the number of integer solutions to selected free variables of a Presburger formula, or sum a polynomial over all integer solutions of selected free variables of a Presburger formula. This answer is given symbolically, in terms of symbolic constants (the remaining free variables in the Presburger formula). For example, we can create a Presburger formula whose solutions correspond to the iterations of a loop. By counting these, we obtain an estimate of the execution time of the loop. In more complicated applications, we can create Presburger formulas whose solutions correspond to the distinct memory locations or cache lines touched by a loop, the flops executed by a loop, or the array elements that need to be communicated at a particular point in a distributed computation. By counting the number of solutions, we can evaluate the computation/memory balance of a computation, determine if a loop is load balanced and evaluate message traffic and allocate message buffers. (Also cross-referenced as UMIACS-TR-94-27) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
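The loop-counting use case above can be illustrated with a minimal sketch. It assumes a hypothetical triangular loop nest (1 <= i <= j <= n) chosen only for illustration; it does not reproduce the report's method, it merely compares a brute-force count of the integer solutions with the closed form in the symbolic constant n that a symbolic counter would be expected to return.

```python
def count_iterations_brute(n):
    """Enumerate the integer points (i, j) with 1 <= i <= j <= n."""
    return sum(1 for i in range(1, n + 1) for j in range(i, n + 1))


def count_iterations_symbolic(n):
    """Closed form in the symbolic constant n: n(n+1)/2 when n >= 1, else 0."""
    return n * (n + 1) // 2 if n >= 1 else 0


if __name__ == "__main__":
    # The two counts agree for every n checked, including the empty case n = 0.
    for n in range(0, 8):
        print(n, count_iterations_brute(n), count_iterations_symbolic(n))
```

The guard `n >= 1` matters: a symbolic answer is generally piecewise, because the solution set is empty for some values of the symbolic constants.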
Bonnie J. Dorr. March 1994.
Development of Cross-Linguistic Syntactic and Semantic Parameters for Parsing and Generation. This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. The translation approach adopted here is interlingual, i.e., a single underlying representation called Lexical Conceptual Structure (LCS) is used for both Korean and English. The primary focus of this investigation concerns the notion of `parameterization', i.e., a mechanism that accounts for both syntactic and lexical-semantic distinctions between Korean and English. We present our assumptions about the syntactic structure of Korean-type languages vs. English-type languages and describe our investigation of syntactic parameterization for distinguishing between these two types of languages. We also present the details of the LCS structure and describe how this representation is parameterized so that it accommodates both languages. We address critical issues concerning interlingual machine translation such as locative postpositions and the dividing line between the interlingua and the knowledge representation. Difficulties in translation and transliteration of Korean are discussed and complex morphological properties of Korean are presented. Finally, we describe recent work on lexical acquisition and conclude with a discussion about two hypotheses concerning semantic classification that are currently being tested. (Also cross-referenced as UMIACS-TR-94-26) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Don Perlis. March 18, 1994.
Consciousness and complexity: the cognitive quest. March 1994.
Some implications of the view that mind is a suitably complex kind of process are investigated in various contexts. The underlying theme is that the behavior of complex systems cannot be adequately judged by that of simple systems. I first present a personal exploration of the mechanistic account of mind in terms of non-technical considerations; then I present and criticize some ideas of Kripke, Nagel, and Jackson that challenge the mechanistic view. Next I turn to a brief synopsis of some of Dennett's recent ideas. Finally I offer some critical comments on Dennett's views and suggest possible modifications. (Also cross-referenced as UMIACS-TR-94-25) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
C. Lee Giles. Mark W. Goudreau. February 1994.
Routing in Optical Multistage Interconnection Networks: a Neural Network. NEC Research Institute, Princeton, NJ and, There has been much interest in using optics to implement computer interconnection networks. However, there has been little discussion of any routing methodologies besides those already used in electronics. In this paper, a neural network routing methodology is proposed that can generate control bits for an optical multistage interconnection network (OMIN). Though we present no optical implementation of this methodology, we illustrate its control for an optical interconnection network. These OMINs may be used as communication media for shared memory, distributed computing systems. The routing methodology makes use of an Artificial Neural Network (ANN) that functions as a parallel computer for generating the routes. The neural network routing scheme may be applied to electrical as well as optical interconnection networks. However, since the ANN can be implemented using optics, this routing approach is especially appealing for an optical computing environment. The parallel nature of the ANN computation may make this routing scheme faster than conventional routing approaches, especially for OMINs that are irregular. Furthermore, the neural network routing scheme is fault-tolerant. Results are shown for generating routes in a 16 x 16, 3 stage OMIN. (Also cross-referenced as UMIACS-TR-94-21.)
Mark W. Goudreau. C. Lee Giles. February 1994.
Using Recurrent Neural Networks to Learn the Structure of. Department of Computer Science,, A modified Recurrent Neural Network (RNN) is used to learn a Self-Routing Interconnection Network (SRIN) from a set of routing examples. The RNN is modified so that it has several distinct initial states. This is equivalent to a single RNN learning multiple different synchronous sequential machines. We define such a sequential machine structure as augmented and show that a SRIN is essentially an Augmented Synchronous Sequential Machine (ASSM). As an example, we learn a small six-switch SRIN. After training we extract the network's internal representation of the ASSM and corresponding SRIN. (Also cross-referenced as UMIACS-TR-94-20.)
(Also cross-referenced as CAR-TR-703) February 1994.
Recognition by Functional Parts. Ehud Rivlin. Sven J. Dickinson. Azriel Rosenfeld. Department of Computer Science, University of Maryland, Center for Automation Research, We present an approach to function-based object recognition that reasons about the functionality of an object's intuitive parts. We extend the popular "recognition by parts" shape recognition framework to support "recognition by functional parts", by combining a set of functional primitives and their relations with a set of abstract volumetric shape primitives and their relations. Previous approaches have relied on more global object features, often ignoring the problem of object segmentation and thereby restricting themselves to range images of unoccluded scenes. We show how these shape primitives and relations can be easily recovered from superquadric ellipsoids which, in turn, can be recovered from either range or intensity images of occluded scenes. Furthermore, the proposed framework supports both unexpected (bottom-up) object recognition and expected (top-down) object recognition. We demonstrate the approach on a simple domain by recognizing a restricted class of hand-tools from 2-D images.
(Also cross-referenced as CAR-TR-702) February 1994.
The Quadric Reference Surface: Applications in Registering Views of. Amnon Shashua. Sebastian Toelg. The theoretical component of this work involves the following question: Given any two views of some unknown textured opaque quadric surface in 3D, is there a finite number of corresponding points across the two views that uniquely determine all other correspondences coming from points on the quadric? A constructive answer to this question is then used to propose a transformation, which we call a nominal quadratic transformation, that can be used in practice to facilitate the process of achieving full point-to-point correspondence between two grey-level images of the same (arbitrary) object. Department of Computer Science, University of Maryland, Center for Automation Research,
David M. Mount. Ruth Silverman. Minimum Enclosures with Specified Angles. February 1994.
Given a convex polygon P, an m-envelope is a convex m-sided polygon that contains P. Given any convex polygon P, and any sequence of m > 3 angles $A = (\alpha_1, \alpha_2, \ldots, \alpha_m)$, we consider the problem of computing the minimum area m-envelope for P whose counterclockwise sequence of exterior angles is given by A. We show that such envelopes can be computed in O(nm log m) time. The main result on which the correctness of the algorithm rests is a flushness condition stating that for any locally minimum enclosure with specified angles, one of its sides must be collinear with one of the sides of P. (Also cross-referenced as CAR-TR-701) Department of Computer Science, University of Maryland, Center for Automation Research,
Hemant Singh. Rama Chellappa. An Improved Shape from Shading Algorithm. (Also cross-referenced as CAR-TR-700) February 1994.
We propose an improved shape from shading (SFS) algorithm which is an extension of the recently published algorithm by Zheng and Chellappa [13]. A markedly more accurate estimate of the azimuth of the illumination source is presented. Depth reconstruction has been improved upon by using a new set of boundary conditions and adapting a more sophisticated technique for hierarchical implementation of the SFS algorithm. Errors at the boundaries of images and in rotation of the reconstructed images have been corrected. Typical results on synthetic and real images are presented. Department of Computer Science, University of Maryland, Center for Automation Research,
Compiler Support for Real-Time Programs. Richard Gerber. Seongsoo Hong. January 1994.
We present a compiler-based approach to automatically assist in constructing real-time systems. In this approach, source programs are written in TCEL (or Time Constrained Event Language) which possesses high-level timing constructs, and whose semantics characterizes time-constrained relationships between observable events. A TCEL program infers only those timing constraints necessary to achieve real-time correctness, without over-constraining the system. We exploit this looser semantics to help transform programs to automatically achieve schedulability. In this article we present two such transformations. The first is trace-scheduling, which we use to achieve consistency between a program's worst-case execution time and its real-time requirements. The second is program-slicing, which we use to automatically tune application programs driven by rate-monotonic scheduling. (Also cross-referenced as UMIACS-TR-94-15) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Samir Khuller. Balaji Raghavachari. Neal Young. January 1994.
On Strongly Connected Digraphs with Bounded Cycle Length. The MEG (minimum equivalent graph) problem is "Given a directed graph, find a smallest subset of the edges that maintains all reachability relations between nodes." We consider the complexity of this problem as a function of the maximum cycle length C in the graph. If C =2, the problem is trivial. Recently it was shown that even with the restriction C = 5, the problem is NP-hard. It was conjectured that the problem is solvable in polynomial time if C =3. In this paper we prove the conjecture, showing that the problem is equivalent to maximum bipartite matching. The strong dependence of the complexity on the cycle length is in marked contrast to the relation of complexity and cycle length in undirected graphs. Undirected graphs with bounded cycle length have bounded tree width, allowing polynomial-time algorithms for many problems that are NP-hard in general. A consequence of our result is an improved approximation algorithm for the MEG problem in general graphs. The improved algorithm has a performance guarantee of about 1.61; the best previous algorithm has a performance guarantee of about 1.64. (Also cross-referenced as UMIACS-TR-94-10) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
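To make the MEG problem statement above concrete, here is a minimal brute-force sketch. It is exponential in the number of edges and is not the matching-based algorithm of the report; the function names and the four-edge example graph (maximum cycle length 3) are assumptions chosen for illustration.

```python
from itertools import combinations


def reachability(nodes, edges):
    """Return all ordered pairs (u, v), u != v, joined by a directed path."""
    succ = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)
    pairs = set()
    for s in nodes:
        seen, stack = set(), [s]
        while stack:  # depth-first search from s
            u = stack.pop()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        pairs.update((s, v) for v in seen if v != s)
    return pairs


def minimum_equivalent_graph(nodes, edges):
    """Try edge subsets by increasing size; return the first that keeps reachability."""
    target = reachability(nodes, edges)
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if reachability(nodes, subset) == target:
                return list(subset)


if __name__ == "__main__":
    nodes = [0, 1, 2]
    edges = [(0, 1), (1, 2), (2, 0), (0, 2)]  # hypothetical example input
    print(minimum_equivalent_graph(nodes, edges))
```

In this example the edge (0, 2) is redundant, since the 3-cycle alone already preserves every reachability relation.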
Urs von Matt. September 1994.
The Orthogonal QD-Algorithm. The orthogonal qd-algorithm is presented to compute the singular value decomposition of a bidiagonal matrix. This algorithm represents a modification of Rutishauser's qd-algorithm, and it is capable of determining all the singular values to high relative precision. A generalization of the Givens transformation is also introduced, which has applications besides the orthogonal qd-algorithm. The shift strategy of the orthogonal qd-algorithm is based on Laguerre's method, which is used to compute a lower bound for the smallest singular value of the bidiagonal matrix. Special attention is devoted to the numerically stable evaluation of this shift. (Also cross-referenced as UMIACS-TR-94-9.1) Institute for Advanced Computer Studies,,
Christine R. Hofmeister. Dynamic Reconfiguration of Distributed Applications. January 1994.
Applications requiring concurrency or access to specialized hardware are naturally written as distributed applications, where each software component (module) can execute on a different machine, and modules interact via bindings. In order to make changes to very long-running applications or those that must be continuously available, we must dynamically change the application. Dynamic reconfiguration of a distributed application is the act of changing the configuration of the application as it executes. Examples of configuration changes are replacing a module, moving a module to another machine, and adding or removing modules from the application. The most challenging aspect of dynamic reconfiguration is that an application in execution has state information, both within the modules and within the communication channels between modules. This state information may need to be transferred from the old configuration to the new in order to reach an application state compatible with the new configuration. Thus, in addition to requiring a mechanism for changing the configuration during execution, dynamic reconfiguration requires that modules be able to divulge and install state information, and requires a mechanism for coordinating the communication during reconfiguration. Prior to this work, all systems supporting some form of dynamic reconfiguration have given the application programmer no support nor even guidelines for capturing and restoring an application's state information. We have developed a machine-independent method for installing this functionality in the application, given a set of reconfiguration points designated by the programmer. This new technique has been implemented as part of the general framework we have developed to support dynamic reconfiguration of distributed applications. These reconfiguration capabilities were implemented on top of existing operating systems and compilers, requiring no modifications to either.
They support dynamic reconfiguration for applications composed of mixed languages, communicating via message passing, running on a heterogeneous distributed platform. (Also cross-referenced as UMIACS-TR-94-8) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Joseph Ja'Ja'. Kwan Woo Ryu. January 1994.
The Block Distributed Memory Model. Institute for Advanced Computer Studies, and, We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching. We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near optimal communication complexity while simultaneously guaranteeing an optimal speed-up in computational complexity. (Also cross-referenced as UMIACS-TR-94-5.)
Samir Khuller. Balaji Raghavachari. Neal Young. January 1994.
Low Degree Spanning Trees of Small Weight. Given n points in the plane, the degree-K spanning tree problem asks for a spanning tree of minimum weight in which the degree of each vertex is at most K. This paper addresses the problem of computing low-weight degree-K spanning trees for K>2. It is shown that for an arbitrary collection of n points in the plane, there exists a spanning tree of degree three whose weight is at most 1.5 times the weight of a minimum spanning tree. It is shown that there exists a spanning tree of degree four whose weight is at most 1.25 times the weight of a minimum spanning tree. These results solve open problems posed by Papadimitriou and Vazirani. Moreover, if a minimum spanning tree is given as part of the input, the trees can be computed in O(n) time. The results are generalized to points in higher dimensions. It is shown that for any $d \ge 3$, an arbitrary collection of points in $\Re^d$ contains a spanning tree of degree three, whose weight is at most 5/3 times the weight of a minimum spanning tree. This is the first paper that achieves factors better than two for these problems. (Also cross-referenced as UMIACS-TR-94-1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
David M. Mount. Ruth Silverman. Angela Y. Wu. On the Area of Overlap of Translated Polygons. (Also cross-referenced as CAR-TR-699) January 1994.
Given two simple polygons P and Q in the plane and a translation vector t ∈ R^2, the area-of-overlap function of P and Q is the function Ar(t) = Area(P ∩ (t + Q)), where t + Q denotes Q translated by t. This function has a number of applications in areas such as motion planning and object recognition. We present a number of mathematical results regarding this function. We also provide efficient algorithms for computing a representation of this function, and for tracing contour curves of constant area of overlap. Department of Computer Science, University of Maryland, Center for Automation Research,
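The definition Ar(t) = Area(P ∩ (t + Q)) can be illustrated with a minimal sketch. It assumes, purely for illustration, that P and Q are axis-aligned rectangles given as (xmin, ymin, xmax, ymax); the report treats general simple polygons, and the function name overlap_area is hypothetical.

```python
def overlap_area(p, q, t):
    """Area(P ∩ (t + Q)) for axis-aligned rectangles (an illustrative special case).

    p, q: rectangles as (xmin, ymin, xmax, ymax); t: translation (tx, ty).
    """
    tx, ty = t
    # Translate Q by t.
    qx0, qy0, qx1, qy1 = q[0] + tx, q[1] + ty, q[2] + tx, q[3] + ty
    # Width and height of the intersection; negative means no overlap on that axis.
    w = min(p[2], qx1) - max(p[0], qx0)
    h = min(p[3], qy1) - max(p[1], qy0)
    return max(w, 0) * max(h, 0)


P = (0, 0, 2, 2)
Q = (0, 0, 2, 2)
full = overlap_area(P, Q, (0, 0))     # Q coincides with P
partial = overlap_area(P, Q, (1, 0))  # Q shifted right by half of P's width
none = overlap_area(P, Q, (3, 0))     # Q shifted completely past P
```

Evaluating overlap_area over a grid of translations gives a sampled picture of the function whose contour curves the abstract mentions.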
(Also cross-referenced as CAR-TR-698) Image Analysis and Computer Vision: 1993. Azriel Rosenfeld. Center for Automation Research, Department of Computer Science, University of Maryland, January 1994.
This paper presents a bibliography of nearly 1300 references related to computer vision and image analysis, arranged by subject matter. The topics covered include computational techniques; feature detection and segmentation; image analysis; two-dimensional shape; pattern; color and texture; matching and stereo; three-dimensional recovery and analysis; three-dimensional shape; and motion. A few references are also given on related topics, such as geometry, graphics, coding and processing, sensors and optical processing, visual perception, neural nets, pattern recognition, and artificial intelligence, as well as on applications. The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
Alok Aggarwal. Amotz Bar-Noy. Samir Khuller. Dina Kravets. Baruch Schieber. December 1993.
Efficient Minimum Cost Matching and Transportation Using. We present efficient algorithms for finding a minimum cost perfect matching, and for solving the transportation problem in bipartite graphs, G=(\Red\cup \Blue, \Red\times \Blue), where |\Red|=n, |\Blue|=m, n\le m, and the cost function obeys the quadrangle inequality. First, we assume that all the \red\ points and all the \blue\ points lie on a curve that is homeomorphic to either a line or a circle and the cost function is given by the Euclidean distance along the curve. We present a linear time algorithm for the matching problem that is simpler than the algorithm of \cite{kl75}. We generalize our method to solve the corresponding transportation problem in O((m+n) \log (m+n)) time, improving on the best previously known algorithm of \cite{kl75}. Next, we present an O(n\log m)-time algorithm for minimum cost matching when the cost array is a bitonic Monge array. An example of this is when the \red\ points lie on one straight line and the \blue\ points lie on another straight line. Finally, we provide a weakly polynomial algorithm for the transportation problem in which the associated cost array is a bitonic Monge array. Our algorithm for this problem runs in O(m \log(\sum_{j=1}^m \sj_j)) time, where \di_i is the demand at the ith sink, \sj_j is the supply available at the jth source, and \sum_{i=1}^n \di_i \le \sum_{j=1}^m \sj_j. (Also cross-referenced as UMIACS-TR-93-140) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Beyond Uniformity and Independence:. February 1994.
Christos Faloutsos. Ibrahim Kamel. We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA's Infrared-Ultraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, non-integer fractal dimension. Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the R-trees. We provide the {\em first} analysis of R-trees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5\%, and it rarely exceeds 10\%. We believe that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for {\em any} spatial access method, as well as better estimates for query optimization on multi-attribute queries. NOTE - Appeared in PODS 1994. Christos Faloutsos and Ibrahim Kamel. "Beyond Uniformity and Independence: Analysis of R-Trees Using the Concept of Fractal Dimension", Proc. ACM SIGACT-SIGMOD-SIGART PODS. Minneapolis, MN (May 1994), pp. 4-13. (Also cross-referenced as UMIACS-TR-93-130) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
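As a rough illustration of a measurable, non-integer dimension of a point set, the sketch below estimates a box-counting dimension: it counts occupied grid cells at several cell sizes and fits the slope of log(count) against log(1/size). Box counting is one common estimator and is an assumption here; the abstract does not say which definition or fitting procedure the report uses, and all names are hypothetical.

```python
import math


def box_count(points, cell):
    """Number of distinct grid cells of side `cell` that contain at least one point."""
    return len({(math.floor(x / cell), math.floor(y / cell)) for x, y in points})


def box_counting_dimension(points, cells):
    """Least-squares slope of log N(r) versus log(1/r) over the given cell sides r."""
    xs = [math.log(1.0 / r) for r in cells]
    ys = [math.log(box_count(points, r)) for r in cells]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


cells = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
# Points along a diagonal line segment: expected slope close to 1.
line = [(i / 1024, i / 1024) for i in range(1024)]
# Points filling the unit square on a regular grid: expected slope close to 2.
square = [(i / 32, j / 32) for i in range(32) for j in range(32)]

line_dim = box_counting_dimension(line, cells)
square_dim = box_counting_dimension(square, cells)
```

Real data sets such as the road intersections mentioned above would typically fall between these two extremes, which is the skew the abstract proposes to quantify.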
Chungmin Melvin Chen. Nick Roussopoulos. December 1993.
Adaptive Selectivity Estimation Using Query Feedback. We propose a novel approach for estimating the record selectivities of database queries. The real attribute value distribution is adaptively approximated by a curve-fitting function using a query feedback mechanism. This approach has the advantages of requiring no extra database access overhead for gathering statistics and of being able to continuously adapt the value distribution through queries and updates. Experimental results show that the estimation accuracy of this approach is comparable to traditional methods based on statistics gathering. (Also cross-referenced as UMIACS-TR-93-138) Dept. of Computer Science, Univ. of Maryland,
William Pugh. David Wonnacott. An Exact Method for Analysis of Value-based Array Data Dependences. December 1993.
Standard array data dependence testing algorithms give information about the aliasing of array references. If statement 1 writes a[5], and statement 2 later reads a[5], standard techniques describe this as a flow dependence, even if there was an intervening write. We call a dependence between two references to the same memory location a memory-based dependence. In contrast, if there are no intervening writes, the references touch the same value and we call the dependence a value-based dependence. There has been a surge of recent work on value-based array data dependence analysis (also referred to as computation of array data-flow dependence information). In this paper, we describe a technique that is exact over programs without control flow (other than loops) and non-linear references. We compare our proposal with the technique proposed by Paul Feautrier, which is the other technique that is complete over the same domain as ours. We also compare our work with that of Tu and Padua, a representative approximate scheme for array privatization. (Also cross-referenced as UMIACS-TR-93-137) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
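The distinction between memory-based and value-based flow dependences can be made concrete with a small sketch. It enumerates dependences over an explicit access trace, which is only an illustration of the definitions: the report describes a symbolic compile-time analysis, not trace enumeration, and the trace format and function name are hypothetical.

```python
def flow_dependences(trace):
    """Return (memory_based, value_based) flow dependences for an access trace.

    trace: list of (label, kind, location) with kind "W" (write) or "R" (read).
    Memory-based: every earlier write to the same location reaches the read.
    Value-based: only the most recent write, i.e. no intervening write.
    """
    memory_based, value_based = [], []
    writes = {}  # location -> labels of writes seen so far, in program order
    for label, kind, loc in trace:
        if kind == "W":
            writes.setdefault(loc, []).append(label)
        elif kind == "R":
            earlier = writes.get(loc, [])
            memory_based.extend((w, label) for w in earlier)
            if earlier:
                value_based.append((earlier[-1], label))
    return memory_based, value_based


# s1 writes a[5], s2 overwrites a[5], s3 reads a[5].
trace = [
    ("s1", "W", ("a", 5)),
    ("s2", "W", ("a", 5)),
    ("s3", "R", ("a", 5)),
]
memory_based, value_based = flow_dependences(trace)
```

Here the memory-based view reports a dependence from s1 to s3 even though s2 overwrites the value first, while the value-based view keeps only the dependence from s2 to s3.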
Wayne Kelly. William Pugh. A Framework for Unifying Reordering Transformations. April 1993.
We present a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, skewing, tiling, index set splitting and statement reordering. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. The framework is designed to provide a uniform way to represent and reason about transformations. As part of the framework, we provide algorithms to assist in the building and use of schedules. In particular, we provide algorithms to test the legality of schedules, to align schedules and to generate optimized code for schedules. (Also cross-referenced as UMIACS-TR-93-134) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
William Pugh. Definitions of Dependence Distance. April 1993.
Data dependence distance is widely used to characterize data dependences in advanced optimizing compilers. The standard definition of dependence distance assumes that loops are normalized (have constant lower bounds and a step of 1); there is not a commonly accepted definition for unnormalized loops. We have identified several potential definitions, all of which give the same answer for normalized loops. There are a number of subtleties involved in choosing between these definitions, and no one definition is suitable for all applications. (Also cross-referenced as UMIACS-TR-93-133) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
William Pugh. David Wonnacott. Eliminating False Data Dependences using the Omega Test. December 1992.
Array data dependence analysis methods currently in use generate false dependences that can prevent useful program transformations. These false dependences arise because the questions asked are conservative approximations to the questions we really should be asking. Unfortunately, the questions we really should be asking go beyond integer programming and require decision procedures for a subclass of Presburger formulas. In this paper, we describe how to extend the Omega test so that it can answer these queries and allow us to eliminate these false data dependences. We have implemented the techniques described here and believe they are suitable for use in production compilers. (An earlier version of this paper appeared at the ACM SIGPLAN PLDI'92 conference). (Also cross-referenced as UMIACS-TR-93-132) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Christos Faloutsos. M. Ranganathan. Yannis Manolopoulos. December 1993.
Fast Subsequence Matching in Time-Series Databases. We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree \cite{Beckmann90R}. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times. Appeared in ACM SIGMOD 1994, pp 419-429. Given "Best Paper award" (Also cross-referenced as UMIACS-TR-93-131) Dept. of Computer Science, Univ. of Maryland,
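The pipeline of sliding window, feature trail, and bounding rectangles can be sketched as follows. Two simplifications are assumptions of this sketch and not the report's method: the features are the window mean and range (the abstract does not name the features), and the trail is cut into fixed-length pieces rather than by the adaptive division algorithm the authors propose. All function names are hypothetical.

```python
def window_features(seq, w):
    """Trail in feature space: (mean, max - min) for each length-w sliding window."""
    trail = []
    for i in range(len(seq) - w + 1):
        win = seq[i:i + w]
        trail.append((sum(win) / w, max(win) - min(win)))
    return trail


def trail_mbrs(trail, chunk):
    """Cut the trail into consecutive pieces of `chunk` points and bound each piece.

    Each minimum bounding rectangle is returned as (lows, highs) per dimension.
    """
    boxes = []
    dims = len(trail[0])
    for start in range(0, len(trail), chunk):
        part = trail[start:start + chunk]
        lows = tuple(min(p[d] for p in part) for d in range(dims))
        highs = tuple(max(p[d] for p in part) for d in range(dims))
        boxes.append((lows, highs))
    return boxes


seq = [1, 2, 3, 4, 5, 6]
trail = window_features(seq, 2)
boxes = trail_mbrs(trail, 2)
```

The resulting rectangles are what would be handed to a spatial access method; a query window maps to a point in the same feature space and is checked against the rectangles it falls in or near.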
Ying Wang. Rama Chellappa. Qinfen Zheng. CFAR Detection of Targets in Fully Polarimetric SAR Images. (Also cross-referenced as CAR-TR-696) November 1993.
Traditional constant false alarm rate (CFAR) detection algorithms produce a lot of false targets when applied to single-look, high-resolution, fully polarimetric synthetic aperture radar (SAR) images, due to the presence of speckle. We propose a two-stage CFAR detector followed by conditional dilation for detecting point and extended targets in polarimetric SAR images. In the first stage, possible targets are detected and false targets due to the speckle are removed by using global statistical parameters. In the second stage, the local statistical parameters are used to detect targets in regions adjacent to targets detected in the first stage. Conditional dilation is then performed to recover target pixels lost in second stage CFAR detection. The performance of a CFAR detector will be degraded if an incorrect statistical model is adopted and the data are correlated. A goodness-of-fit test is performed to decide the appropriate distribution and the effects of decorrelation of the data are considered. Good experimental results are obtained when our method is applied to single-look, high-resolution, fully polarimetric SAR images acquired from MIT Lincoln Laboratory. Department of Computer Science, University of Maryland, Center for Automation Research,
Saad Ahmed Sirohey. Human Face Segmentation and Identification. (Also cross-referenced as CAR-TR-695) November 1993.
Computer Vision Laboratory, This thesis considers segmentation and identification of human faces from grey scale images with clutter. The segmentation developed utilizes the elliptical structure of the human head. It uses the information present in the edge map of the image and through some preprocessing separates the head from the background clutter. An ellipse is then fitted to mark the boundary between the head region and the background. The identification procedure finds feature points in the segmented face through a Gabor wavelet decomposition and performs graph matching. The segmentation and identification algorithms were tested on a database of 48 images of 16 persons with encouraging results. Department of Computer Science, University of Maryland, Center for Automation Research,
Azriel Rosenfeld. Fuzzy Plane Geometry: Triangles. (Also cross-referenced as CAR-TR-694) November 1993.
A fuzzy triangle T (with a discrete-valued membership function) can be regarded as a nest of parallel-sided triangles Ti with successively higher membership values. Such a nest is determined by its max projections on any two of its "sides". The area (perimeter) of T is a weighted sum of the areas (perimeters) of the Ti's. The side lengths and altitudes of T can also be defined as weighted sums obtained from projections; using these definitions, the perimeter of T is the sum of the side lengths, and the side lengths are related to the vertex angles by the Law of Sines, but there is no simple relationship between the area of T and the products of the side lengths and altitudes. Department of Computer Science, University of Maryland, Center for Automation Research,
Sandor Z. Der. Rama Chellappa. Probe Based Recognition of Targets in Infrared Images. (Also cross-referenced as CAR-TR-693) November 1993.
A probe based approach is used to recognize objects in a cluttered background using an infrared imager. A probe is a simple mathematical function which operates locally on image grey levels and produces an output that is more directly usable by an algorithm. A directional probe image is calculated by taking the difference in grey levels between pixels a set distance apart in a given direction, centered on the probe image pixel. These probe images contain the information necessary for use by an object recognition algorithm in a readily usable, and mathematically describable, form. A parametric statistical image background model which describes the probe images is introduced. The parameters of the probe image model can be readily estimated from the image. Knowledge of these parameters, together with target signatures obtained from Computer Aided Design (CAD) models, allows the likelihood ratio for a given object pose hypothesis versus the background null hypothesis to be written. The generalized likelihood ratio test is used to accept one of the object poses or to choose the null hypothesis. Results of the method applied to a large set of terrain model board images are presented. Department of Computer Science, University of Maryland, Center for Automation Research,
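The directional probe image described above can be sketched in a few lines. The sketch assumes the two sampled pixels sit at offsets (+dr, +dc) and (-dr, -dc) from the probe pixel, and that probe pixels whose samples fall outside the image are left undefined (None); both conventions, and the function name, are assumptions and not details given in the abstract.

```python
def directional_probe(image, dr, dc):
    """Difference of grey levels at (r + dr, c + dc) and (r - dr, c - dc).

    image: list of equal-length rows of grey levels.
    Returns an image of the same shape; entries are None where either sample
    would fall outside the image.
    """
    rows, cols = len(image), len(image[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r1, c1 = r + dr, c + dc
            r2, c2 = r - dr, c - dc
            inside = 0 <= r1 < rows and 0 <= c1 < cols and 0 <= r2 < rows and 0 <= c2 < cols
            if inside:
                out[r][c] = image[r1][c1] - image[r2][c2]
    return out


# A vertical step edge: dark on the left, bright on the right.
image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
probe = directional_probe(image, 0, 1)  # horizontal probe, samples two pixels apart
```

The horizontal probe responds strongly at the two columns that straddle the edge, which is the kind of locally computed output the abstract says is more directly usable by a recognition algorithm.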
(Also cross-referenced as CAR-TR-692) November 1993.
Estimation of Vehicle Dynamics from Monocular Noisy Images. Yi-Sheng Yao. Rama Chellappa. Center for Automation Research, University of Maryland Institute for Advanced Computer Studies, Department of Computer Science, University of Maryland, This paper presents a new model-based egomotion estimation algorithm for an autonomous vehicle navigating through rough terrain. Due to the uneven terrain, the vehicle undergoes bouncing, pitch and roll motion. To reliably accomplish other tasks such as tracking and obstacle avoidance using visual inputs, it is essential to consider these disturbances. In this paper, two vehicle models available in the literature are used for egomotion estimation. The Half Vehicle Model (HVM) takes into account the bouncing and pitch motion of the vehicle, and the Full Vehicle Model (FVM) also considers the roll motion. The dynamics of the vehicle are formulated using standard equations of motion. Assuming that depth information is known for some landmarks in the scene (e.g., obtained from a laser range finder), a feature-based approach is proposed to estimate vehicle motion parameters such as the vertical movement of the center of mass and the instantaneous angular velocity. An Iterated Extended Kalman Filter (IEKF) is used for recursive parameter estimation. Simulation results for both known and unknown terrain are presented.
(Also cross-referenced as CAR-TR-691) October 1993.
Evaluation of Pattern Classifiers for Fingerprint and OCR Applications. J.L. Blue. G.T. Candela. P.J. Grother. Rama Chellappa. C.L. Wilson. Computer Vision Laboratory, Center for Automation Research, Department of Computer Science, University of Maryland, In this paper we evaluate the classification accuracy of four statistical and three neural network classifiers for two image based pattern classification problems. These are fingerprint classification and optical character recognition (OCR) for isolated handprinted digits. The evaluation results reported here should be useful for designers of practical systems for these two important commercial applications. For the OCR problem, the Karhunen-Loeve (K-L) transform of the images is used to generate the input feature set. Similarly for the fingerprint problem, the K-L transform of the ridge directions is used to generate the input feature set. The statistical classifiers used were Euclidean minimum distance, quadratic minimum distance, normal, and k-nearest neighbor. The neural network classifiers used were multilayer perceptron, radial basis function, and probabilistic. The OCR data consisted of 7,480 digit images for training and 23,140 digit images for testing. The fingerprint data consisted of 9,000 training and 2,000 testing images. In addition to evaluation for accuracy, the multilayer perceptron and radial basis function networks were evaluated for size and generalization capability. For the evaluated datasets the best accuracy obtained for either problem was provided by the probabilistic neural network, where the minimum classification error was 2.5% for OCR and 7.2% for fingerprints.
Chungmin Melvin Chen. Nick Roussopoulos. October 1993.
The Implementation and Performance Evaluation of the ADMS Query Optimizer:. In this paper, we describe the design and evaluation of the ADMS optimizer. Capitalizing on a structure called Logical Access Path Schema to model the derivation relationship among cached query results, the optimizer is able to perform query matching coincidently with the optimization and generate more efficient query plans using cached results. The optimizer also features data caching and pointer caching, different cache replacement strategies, and different cache update strategies. An extensive set of experiments was conducted, and the results showed that pointer caching and dynamic cache update strategies substantially speed up query computations and, thus, increase query throughput under situations with fair query correlation and update load. The requirement of the cache space is relatively small and the extra computation overhead introduced by the caching and matching mechanism is more than offset by the time saved in query processing. (Also cross-referenced as UMIACS-TR-93-106) Dept. of Computer Science, Univ. of Maryland,
Satyandra K. Gupta. Dana S. Nau. A Systematic Approach for Analyzing the Manufacturability of Machined. October 1993.
The ability to quickly introduce new quality products is a decisive factor in capturing market share. Because of pressing demands to reduce lead time, analyzing the manufacturability of the proposed design has become an important step in the design stage. This paper presents an approach for analyzing the manufacturability of machined parts. Evaluating the manufacturability of a proposed design involves determining whether or not it is manufacturable with a given set of manufacturing operations and, if so, then finding the associated manufacturing efficiency. Since there can be several different ways to manufacture a proposed design, this requires us to consider different ways to manufacture it, in order to determine which one best meets the design and manufacturing objectives. The first step in our approach is to identify all machining operations which can potentially be used to create the given design. Using these operations, we generate different operation plans for machining the part. Each time we generate a new operation plan, we examine whether it can produce the desired shape and tolerances, and calculate its manufacturability rating. If no operation plan can be found that is capable of producing the design, then the given design is considered unmachinable; otherwise, the manufacturability rating for the design is the rating of the best operation plan. We anticipate that by providing feedback about possible problems with the design, this work will help in speeding up the evaluation of new product designs in order to decide how or whether to manufacture them. Such a capability will be useful in responding quickly to changing demands and opportunities in the marketplace. (Also cross-referenced as UMIACS-TR-93-105) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
CIRCA: The Cooperative Intelligent Real-Time Control Architecture. David John Musliner. October 1993.
The Cooperative Intelligent Real-time Control Architecture (CIRCA) is a novel architecture for intelligent real-time control that can guarantee to meet hard deadlines while still using unpredictable, unrestricted AI methods. CIRCA includes a real-time subsystem used to execute reactive control plans that are guaranteed to meet the domain's real-time deadlines, keeping the system safe. At the same time, CIRCA's AI subsystem performs higher-level reasoning about the domain and the system's goals and capabilities, to develop future reactive control plans. CIRCA thus aims to be intelligent about real-time: rather than requiring the system's AI methods to meet deadlines, CIRCA isolates its reasoning about which time-critical reactions to guarantee from the actual execution of the selected reactions. The formal basis for CIRCA's performance guarantees is a state-based world model of agent/environment interactions. Borrowing approaches from real-time systems research, the world model provides the information required to make real-time performance guarantees, but avoids unnecessary complexity. Using the world model, the AI subsystem develops reactive control plans that restrict the world to a limited set of safe and desirable states, by guaranteeing the timely performance of actions to preempt transitions that lead out of the set of states. By executing such "safe" and "stable" plans, CIRCA's real-time subsystem is able to keep the system safe (in the world as modeled) for an indeterminate amount of time, while the parallel AI subsystem develops the next appropriate plan. We have applied a prototype CIRCA implementation to a simulated Puma robot arm performing multiple tasks with real-time deadlines, such as packing parts off a conveyor belt and responding to asynchronous interrupts.
Our experimental results show how the system can guarantee to accomplish these tasks under a given set of domain conditions (e.g., conveyor belt speed) and resource limitations (e.g., robot arm speed). Furthermore, because CIRCA reasons explicitly about its own predictable, guaranteed behaviors, the system can recognize when its resources are insufficient for the desired behaviors (e.g., parts are arriving too quickly to be packed carefully), and can then make principled modifications to its performance (e.g., temporarily stacking parts on a table) to maintain system safety. (Also cross-referenced as UMIACS-TR-93-104) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Reconfiguration of Hierarchical Tuple-Spaces: Experiments with. Gilberto Matos. James M. Purtilo. October 1993.
Department of Computer Science, Univ. of Maryland, University of Maryland Institute for Advanced Computer Studies, A hierarchical tuple-space model is proposed for dealing with issues of complexity faced by programmers who build and manage programs in distributed networks. We present our research on a Linda-style approach to both configuration and reconfiguration. After presenting the model used in our work, we describe an experimental implementation of a programming system based upon the model. (Also cross-referenced as UMIACS-TR-93-100)
Cengiz Alaettinoglu. A. Udaya Shankar. March 1995.
The Viewserver Hierarchy for Inter-Domain Routing: Protocols and Evaluation. We present an inter-domain routing protocol based on a new hierarchy, referred to as the viewserver hierarchy. The protocol satisfies policy and ToS constraints, adapts to dynamic topology changes including failures that partition domains, and scales well to a large number of domains without losing detail (unlike the usual scaling technique of aggregating domains into superdomains). Domain-level views are maintained by special nodes called viewservers. Each viewserver maintains a view of a surrounding precinct. Viewservers are organized hierarchically. To obtain domain-level source routes, the views of one or more viewservers are merged (up to a maximum of twice the levels in the hierarchy). We also present a model for evaluating inter-domain routing protocols, and apply this model to compare our viewserver hierarchy against the simple approach where each node maintains a domain-level view of the entire internetwork. Our results indicate that the viewserver hierarchy finds many short valid paths and reduces the memory requirement by two orders of magnitude. (Also cross-referenced as UMIACS-TR-93-98.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
David Doermann. Ehud Rivlin. Isaac Weiss. October 1993.
Logo Recognition. The problem of logo recognition is of great interest in the document domain, especially for document databases. By recognizing the logo we obtain semantic information about the document which may be useful in deciding whether or not to analyze the textual components. Given a logo-like region from a document image and a {\em logo} database, we would like to determine if the region corresponds to a logo in the database. Similarly, if we are given a logo-like region and a {\em document} database, we wish to determine if there are any documents in the database of similar origin. Both problems require indexing into a possibly large model space. In this paper, we present a multi-level approach to logo recognition which uses text and contour features to prune the database and similarity invariants to obtain a more refined match. We outline our methods for page segmentation, feature extraction and indexing and demonstrate our approach on a database of approximately sixty logos. Dept. of Computer Science, Univ. of Maryland, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
David Carr. September 1993.
Specification of Interface Interaction Objects. User Interface Management Systems have significantly reduced the effort required to build a user interface. However, current systems assume a set of standard "widgets" and make no provisions for defining new ones. This forces the user interface designers to either do without or laboriously build new widgets with code. The Interface Objects Graph is presented as a method for specifying and communicating the design of interaction objects or widgets. Two sample specifications are presented, one for a secure switch and the other for a two dimensional graphical browser. (Also cross-referenced as CAR-TR-687) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Samir Khuller. Balaji Raghavachari. Neal Young. September 1993.
Maintaining Directed Reachability with Few Edges. The MEG (minimum equivalent graph) problem is the following: ``Given a directed graph, find a smallest subset of the edges that maintains all reachability relations between nodes.'' This problem is NP-hard; this paper gives an approximation algorithm achieving a performance guarantee of about 1.64 in polynomial time. The algorithm achieves a performance guarantee of 1.75 in the time required for transitive closure. The heart of the MEG problem is the minimum SCSS (strongly connected spanning subgraph) problem --- the MEG problem restricted to strongly connected digraphs. For the minimum SCSS problem, the paper gives a practical, nearly linear-time implementation achieving a performance guarantee of 1.75. The algorithm and its analysis are based on the simple idea of contracting long cycles. The analysis applies directly to 2-Exchange, a general ``local improvement'' algorithm, showing that its performance guarantee is 1.75. (Also cross-referenced as UMIACS-TR-93-87) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Vinit Jain. Ben Shneiderman. September 1993.
Data Structures for Dynamic Queries: An Analytical and Experimental Evaluation. Dynamic Queries is a querying technique for doing range search on multi-key data sets. It is a direct manipulation mechanism where the query is formulated using graphical widgets and the results are displayed graphically, preferably within 100 milliseconds. This paper evaluates four data structures, the multilist, the grid file, k-d tree and the quad tree, used to organize data in high speed storage for dynamic queries. The effect of factors like size, distribution and dimensionality of data on the storage overhead and the speed of search is explored. Analytical models for estimating the storage and search overheads are presented, and verified to be correct by empirical data. Results indicate that multilists are suitable for small (few thousand points) data sets irrespective of the data distribution. For large data sets the grid files are excellent for uniformly distributed data, and trees are good for skewed data distributions. There was no significant difference in performance between the tree structures. (Also cross-referenced as CAR-TR-685) (Also cross-referenced as ISR-TR-93-73) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Christopher Ahlberg. Ben Shneiderman. September 1993.
The Alphaslider: A Compact and Rapid Selector. Research has suggested that rapid, serial, visual presentation of text (RSVP) may be an effective way to scan and search through lists of text strings in search of words, names, etc. The Alphaslider widget employs RSVP as a method for rapidly scanning and searching lists or menus in a graphical user interface environment. The Alphaslider only uses an area less than 7 x 2.5 cm². The tiny size of the Alphaslider allows it to be placed on a credit card, on a control panel for a VCR, or as a widget in a direct manipulation based database interface. An experiment was conducted with four Alphaslider designs which showed that novice Alphaslider users could locate one item in a list of 10,000 film titles in 24 seconds on average, an expert user in about 13 seconds. (Also cross-referenced as CAR-TR-684) (Also cross-referenced as ISR-TR-93-72) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Christopher Ahlberg. Ben Shneiderman. September 1993.
Visual Information Seeking: Tight Coupling of Dynamic Query Filters. This paper offers new principles for visual information seeking (VIS). A key concept is to support browsing, which is distinguished from familiar query composition and information retrieval because of its emphasis on rapid filtering to reduce result sets, progressive refinement of search parameters, continuous reformulation of goals, and visual scanning to identify results. VIS principles developed include: dynamic query filters (query parameters are rapidly adjusted with sliders, buttons, maps, etc.), starfield displays (two-dimensional scatterplots to structure result sets and zooming to reduce clutter), and tight coupling (interrelating query components to preserve display invariants and support progressive refinement combined with an emphasis on using search output to foster search input). A FilmFinder prototype using a movie database demonstrates these principles in a VIS environment. (Also cross-referenced as CAR-TR-638) (Also cross-referenced as ISR-TR-93-71) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
(Also cross-referenced as CAR-TR-682) September 1993.
Delineating Buildings by Grouping Lines. Santhana Krishnamachari. Rama Chellappa. Computer Vision Laboratory, Center for Automation Research, Department of Computer Science, University of Maryland, An energy function based approach is presented to detect rectangular shapes in images. The proposed edge-based approach involves extracting straight lines from an edge map of the image. Then a Markov Random Field (MRF) is built on these lines, i.e., a suitable neighborhood and an energy function are specified based on the relative orientation and spatial location of the lines. This energy function can be construed as a measure of the conditional probability of observing the lines given the rectangular shapes (the positions and number of which are unknown) in the image. Minimizing the energy function is equivalent to selecting the maximum likelihood estimate of the rectangular shapes in the image from the observed lines. Simulated examples are presented to demonstrate the robustness of the proposed method. This approach, supplemented with some qualitative information about shadows and gradients, is used to detect rectangular buildings in real aerial images. Due to poor quality of the real images, only partial shapes are extracted in some cases. A modified deformable contour (snakes) based approach is then presented for completion of the partial shapes. The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
Amalgamating Knowledge Bases, II - Distributed Mediators. Sibel Adali. V.S. Subrahmanian. August 1993.
Integrating knowledge from multiple sources is an important aspect of automated reasoning systems. In previous work, we presented a uniform declarative and operational framework, based on annotated logics, for amalgamating multiple knowledge bases and data structures (e.g. relational, object-oriented, spatial, and temporal structures) when these knowledge bases (possibly) contain inconsistencies, uncertainties, and non-monotonic modes of negation. We showed that annotated logics may be used, with some modifications, to mediate between different knowledge bases. The multiple knowledge bases are amalgamated by embedding the individual knowledge bases into a lattice. In this paper, we describe how, given a network of sites where the different databases reside, it is possible to define a distributed semantics for amalgamated knowledge bases. More importantly, we study how the mediator may be distributed across multiple sites so that when certain conditions are satisfied, network failures do not affect the end results of queries that a user may pose. We specify different ways of distributing the mediator to protect against different types of network link failures and develop alternative soundness and completeness results. (Also cross-referenced as UMIACS-TR-93-81) Dept. of Computer Science, Univ. of Maryland,
David A. Bader. Joseph Ja'Ja'. Rama Chellappa. August 1993.
Scalable Data Parallel Algorithms for Texture Synthesis and. Department of Electrical Engineering, and, This paper introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov Random Field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. Use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented here enables machine independent scalable algorithms for a number of problems in image processing and analysis. (Also cross-referenced as UMIACS-TR-93-80.)
A Framework for Dynamic Reconfiguration of Distributed Programs. Christine R. Hofmeister. James M. Purtilo. August 1993.
University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Current techniques for a software engineer to change a computer program are limited to static activities; once the application begins executing, there are few reliable ways to reconfigure it. We have developed a general framework for reconfiguring application software dynamically. A sound method for managing changes in a running program allows developers to perform maintenance activities without loss of the overall system's service. The same methods also support some forms of load balancing in a distributed system, and research in software fault tolerance. Our goal has been to create an environment for organizing and effecting software reconfiguration activities dynamically. First we present the overall framework within which reconfiguration is possible, then we describe our formal approach for programmers to capture the state of a process abstractly. Next, we describe our implementation of this method within an environment for experimenting with program reconfiguration. We conclude with a summary of the key research problems that we are continuing to pursue in this area. (Also cross-referenced as UMIACS-TR-93-78)
G.Z. Sun. C. Lee Giles. H.H. Chen. Y.C. Lee. August 1993.
The Neural Network Pushdown Automaton: Model, Stack. NEC Research Institute, Princeton, NJ and, In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack memory - in effect creating a neural network pushdown automaton (NNPDA). This paper discusses in detail this NNPDA - its construction, how it can be trained and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automaton (PDA). Simulations show that in learning deterministic context-free grammars - the balanced parenthesis language, $1^n0^n$, and the deterministic Palindrome - the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings. (Also cross-referenced as UMIACS-TR-93-77.)
Ken Salem. Space-Efficient Hot Spot Estimation. August 1993.
This paper is concerned with the problem of identifying names which occur frequently in an ordered list of names. Such names are called hot spots. Hot spots can be identified easily by counting the occurrences of each name and then selecting those with large counts. However, this simple solution requires space proportional to the number of names that occur in the list. In this paper, we present and evaluate two hot spot estimation techniques. These techniques guess the frequently occurring names, while using less space than the simple solution. We have implemented and tested both techniques using several types of input traces. Our experiments show that very accurate guesses can be made using much less space than the simple solution would require. (Also cross-referenced as UMIACS-TR-93-74) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
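The trade-off described in the abstract above can be illustrated with a minimal sketch. The first function is the simple counting solution the abstract mentions. The second is the Misra-Gries frequent-items summary, a well-known bounded-space technique that is swapped in here purely for illustration; it is an assumption and not necessarily either of the two techniques evaluated in the report. The function names and the parameters `threshold` and `k` are hypothetical.

```python
from collections import Counter


def exact_hot_spots(names, threshold):
    """Simple solution: one counter per distinct name (space grows with the list)."""
    counts = Counter(names)
    return {name for name, count in counts.items() if count >= threshold}


def misra_gries(names, k):
    """Bounded-space guess (illustrative assumption, not the report's method).

    Keeps at most k - 1 counters. Every name that occurs more than
    len(names) / k times is guaranteed to be in the returned set, but the
    set may also contain names that are not hot spots.
    """
    counters = {}
    for name in names:
        if name in counters:
            counters[name] += 1
        elif len(counters) < k - 1:
            counters[name] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


if __name__ == "__main__":
    trace = ["a"] * 50 + ["b"] * 30 + ["c%d" % i for i in range(20)]
    print(exact_hot_spots(trace, 25))
    print(misra_gries(trace, 4))
```

The bounded version never holds more than `k - 1` names at once, which is the kind of space saving the abstract refers to, at the cost of possibly reporting some names that are not actually frequent.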
Samir Khuller. August 1993.
Design and Analysis of Algorithms: Course Notes. These are my lecture notes from CMSC 651: Design and Analysis of Algorithms, a one semester course that I taught at University of Maryland in the Spring of 1993. The course covers core material in algorithm design, and also helps students prepare for research in the field of algorithms. The reader will find an unusual emphasis on graph theoretic algorithms, and for that I am to blame. The choice of topics was mine, and is biased by my personal taste. The material for the first few weeks was taken primarily from the (now not so new) textbook on Algorithms by Cormen, Leiserson and Rivest. A few papers were also covered, that I personally feel give some very important and useful techniques that should be in the toolbox of every algorithms researcher. (Also cross-referenced as UMIACS-TR-93-72) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
A Parallel Inexact Newton Method Using a Krylov Multisplitting Algorithm. Chiou-Ming Huang. Dianne P. O'Leary. July 1993.
Abstract. We present a parallel variant of the inexact Newton algorithm that uses the Krylov multisplitting algorithm (KMS) to compute the approximate Newton direction. The algorithm can be used for solving unconstrained optimization problems or systems of nonlinear equations. The KMS algorithm is a more efficient parallel implementation of Krylov subspace methods (GMRES, Arnoldi, etc.) with multisplitting preconditioners. The work of the KMS algorithm is divided into the multisplitting tasks and a direction forming task. There is a great deal of parallelism within each task and the number of synchronization points between the tasks is greatly reduced. We study the local and global convergence properties of the algorithm and present results of numerical examples on a sequential computer. (Also cross-referenced as UMIACS-TR-93-71) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Vadim Maslov. Lazy Array Data-Flow Dependence Analysis. July 1993.
Automatic parallelization of real FORTRAN programs does not yet live up to users' expectations, and dependence analysis algorithms which either produce too many false dependences or are too slow contribute significantly to this. In this paper we introduce a data-flow dependence analysis algorithm which exactly computes value-based dependence relations for program fragments in which all subscripts, loop bounds and IF conditions are affine. Our algorithm also computes good affine approximations of dependence relations for non-affine program fragments. We do not know of any other algorithm which can compute better approximations. Our algorithm is also efficient, because it is lazy. When searching for write statements that supply values used by a given read statement, it starts with statements which are lexicographically close to the read statement in iteration space. Then if some of the read statement instances are not ``satisfied'' with these close writes, the algorithm broadens its search scope by looking into more distant writes. The search scope keeps broadening until all read instances are satisfied or no write candidates are left. We timed our algorithm on several benchmark programs and the timing results suggest that our algorithm is fast enough to be used in commercial compilers; it usually takes 5 to 15 percent of f77 -O2 compilation time to analyze a program. Most programs in the 100-line range take less than 1 second to analyze on a SUN SparcStation IPX. (Also cross-referenced as UMIACS-TR-93-69) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Vadim Maslov. William Pugh. Simplifying Polynomial Constraints Over Integers to Make Dependence. February 1994.
Why do existing parallelizing compilers and environments fail to parallelize many realistic FORTRAN programs? One of the reasons is that these programs contain a number of linearized array references, such as {\tt A(M*N*i+N*j+k)} or {\tt A(i*(i+1)/2+j)}. Performing exact dependence analysis for these references requires testing polynomial constraints for integer solutions. Most existing dependence analysis systems, however, restrict themselves to solving affine constraints only, so they have to make worst-case assumptions whenever they encounter a polynomial constraint. In this paper we introduce an algorithm which exactly and efficiently solves a class of polynomial constraints which arise in dependence testing. Another important application of our algorithm is to generate code for loop transformation known as symbolic blocking (tiling). (Also cross-referenced as UMIACS-TR-93-68.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
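To make the linearized reference {\tt A(i*(i+1)/2+j)} concrete, the following sketch enumerates a small iteration space and looks for two distinct iterations that touch the same array element. This brute-force search only illustrates the question a dependence test must answer (does the polynomial constraint have an integer solution within the loop bounds?); it is not the algorithm introduced in the report, and the names `tri_index` and `find_collision` are hypothetical.

```python
def tri_index(i, j):
    """Linearized subscript i*(i+1)/2 + j from the abstract."""
    return i * (i + 1) // 2 + j


def find_collision(iterations):
    """Return two distinct (i, j) pairs mapping to the same element, or None.

    Brute-force enumeration for illustration only; a real dependence
    analyzer must decide this symbolically, without enumerating iterations.
    """
    seen = {}
    for i, j in iterations:
        k = tri_index(i, j)
        if k in seen and seen[k] != (i, j):
            return seen[k], (i, j)
        seen[k] = (i, j)
    return None


if __name__ == "__main__":
    # Triangular bounds 0 <= j <= i: every iteration gets its own element.
    triangular = [(i, j) for i in range(30) for j in range(i + 1)]
    print(find_collision(triangular))
    # Square bounds 0 <= i, j < 3: distinct iterations overlap.
    square = [(i, j) for i in range(3) for j in range(3)]
    print(find_collision(square))
```

Under the triangular bounds no two iterations collide, so a worst-case assumption of dependence would be needlessly conservative; under the square bounds a collision exists, so the dependence is real.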
Wayne Kelly. William Pugh. Determining Schedules based on Performance Estimation. Dept. of Computer Science, Univ. of Maryland, April 1993.
In previous work, we presented a framework for unifying iteration reordering transformations such as loop interchange, loop distribution, loop skewing and statement reordering. The framework provides a uniform way to represent and reason about transformations. However, it does not provide a way to decide which transformation(s) should be applied to a given program. This paper describes a way to make such decisions within the context of the framework. The framework is based on the idea that a transformation can be represented as a schedule that maps the original iteration space to a new iteration space. We show how we can estimate the performance of a program by considering only the schedule from which it was produced. We also show how to produce an upper bound on performance given only a partially specified schedule. Our ability to estimate performance directly from schedules and to do so even for partially specified schedules allows us to efficiently find schedules which will produce good code. (Also cross-referenced as UMIACS-TR-93-67) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
John R. Callahan. Software Packaging. June 1993.
Many computer programs cannot be easily integrated because their components are distributed and heterogeneous, i.e., they are implemented in diverse programming languages, use different data representation formats, or their runtime environments are incompatible. In many cases, programs are integrated by modifying their components or interposing mechanisms that handle communication and conversion tasks. For example, remote procedure call (RPC) helps integrate heterogeneous, distributed programs. When configuring such programs, however, mechanisms like RPC must be used explicitly by software developers in order to integrate collections of diverse components. Each collection may require a unique integration solution. This thesis describes a process called software packaging that automatically determines how to integrate a diverse collection of computer programs based on the types of components involved and the capabilities of available translators and adapters in an environment. Whereas previous efforts focused solely on integration mechanisms, software packaging provides a context that relates such mechanisms to software integration processes. We demonstrate the value of this approach by reducing the cost of configuring applications whose components are distributed and implemented in different programming languages. Our software packaging tool subsumes traditional integration tools like UNIX MAKE by providing a rule-based approach to software integration that is independent of execution environments. (Also cross-referenced as UMIACS-TR-93-56) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Kent L. Norman. Patricia Wright. August 1993.
HyperTools for HyperTexts: Supporting Readers of Electronic Documents. The most important factor determining the usability of electronic documents (e.g. hypertexts) is neither the set of links within the material nor the structure of the database but the availability of "hypertools", defined as a vast range of electronic tools to support a diversity of reading activities. To illustrate this point, an analysis is undertaken of reading done for the purpose of using the information within a document to assist in tasks involving planning, decision making, and problem solving. Secondly, many readers start with the goals of finding, comparing, and evaluating information. Tools can help them realize these goals by supporting the activities of searching, collecting, and manipulating information. Other tools help people explore task requirements, enable them to preplan details of their interaction with the text, enhance their use of other tools, and optimize their screen-based working environment. It is argued that the support available for people working with electronic texts will not only offer many of the functions available to readers of printed text, but electronic tools will also offer functionality that has no close counterpart in printed media. Consequently, hypertools will change the way readers do familiar tasks and facilitate tasks which are exceedingly difficult to accomplish when working with information on paper. (Also cross-referenced as CAR-TR-675) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Chungmin Melvin Chen. Nick Roussopoulos. June 1993.
Adaptive Database Buffer Allocation Using Query Feedback. In this paper, we propose the concept of using query execution feedback for improving database buffer management. A query feedback model which adaptively quantifies the page fault characteristics of all query access patterns including sequential, looping and most importantly random, is defined. Based on this model, a load control and a marginal gain ratio buffer allocation scheme are developed. Simulation experiments show that the proposed method is consistently better than the previous methods and in most cases, it significantly outperforms all other methods for random access reference patterns. (Also cross-referenced as UMIACS-TR-93-49) Dept. of Computer Science, Univ. of Maryland,
A Server of Distributed Disk Pages Using a Configurable Software Bus. Charles Falkenberg. Paul Hagger. Steve Kelley. July 1993.
University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, University of Maryland, As network latency drops below disk latency, access time to a remote disk will begin to approach local disk access time. The performance of I/O may then be improved by spreading disk pages across several remote disk servers and accessing disk pages in parallel. To research this we have prototyped a data page server called a Page File. This persistent data type provides a set of methods to access disk pages stored on a cluster of remote machines acting as disk servers. The goal is to improve the throughput of a database management system or other I/O intensive application by accessing pages from remote disks and incurring disk latency in parallel. This report describes the conceptual foundation and the methods of access for our prototype. (Also cross-referenced as UMIACS-TR-93-47)
Masakazu Osada. Holmes Liao. Ben Shneiderman. April 1993.
AlphaSlider: Searching Textual Lists with Sliders. AlphaSlider is a query interface that uses a direct manipulation slider to select words, phrases, or names from an existing list. This paper introduces a prototype of AlphaSlider, describes the design issues, reports on an experimental evaluation, and offers directions for further research. The experiment tested 24 subjects selecting items from lists of 40, 80, 160, and 320 entries. Mean selection times only doubled with the 8-fold increase in list length. Users quickly accommodated to this selection method. (Also cross-referenced as CAR-TR-637) (Also cross-referenced as ISR-TR-93-52) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Diane Lindwarm. Kent L. Norman. May 1993.
Student Evaluation of The Software in The AT&T Teaching Theater. The AT&T Teaching Theater is a highly interactive, multimedia electronic classroom at the University of Maryland offering instructors many new and creative teaching opportunities. Although this technology may hold many exciting possibilities, it is important to not lose sight of the main objective of any teaching facility - the students. Therefore, the important questions are: "How do students rate the AT&T Teaching Theater? What are their opinions of the various types of software programs currently offered? Do they facilitate or interfere with the learning process?" This paper discusses the results from a survey of students who attended classes in the AT&T Teaching Theater, Fall semester, 1992. A comparison among the different types of software used by the various instructors is the focus for this evaluation. In particular, HyperCourseware, a program providing an "electronic infrastructure" for computer based education will be at the center of this comparison. HyperCourseware is a "work in progress" and is one of the few software packages used in the electronic classroom designed with the Teaching Theater in mind. The findings from this paper will be used to determine where improvements need to be made in order to benefit the students and to make the most of the technology offered in the AT&T Teaching Theater in the future. (Also cross-referenced as CAR-TR-672) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
(Also cross-referenced as CAR-TR-668) May 1993.
Basic Visual Capabilities. Cornelia Fermuller. Computer Vision Laboratory, Center for Automation Research, Department of Computer Science, University of Maryland, A visual system in order to successfully navigate in its environment and understand the visible world must possess a set of basic capabilities. This thesis describes the design and the development of the processes responsible for the estimation of egomotion (the system's motion) and object motion which are a prerequisite for the accomplishment of any other navigational task. For a monocular observer capable of actively controlling the geometric parameters of its sensory apparatus, it is shown how different activities facilitate the interpretation of visual motion. The basic idea of the object motion estimation strategy lies in the employment of fixation and tracking. Fixation simplifies much of the computation by placing the object at the center of the visual field, and the main advantage of tracking is the accumulation of information over time. For the problem of egomotion estimation new constraints of a global nature relating 2-D image measurements to 3-D motion parameters are presented. Local image measurements form global patterns in the image plane and the position of these patterns determines the 3-D motion parameters. The algorithms developed are provably robust because the constraints employed are global and qualitative and neither correspondence nor optical flow is utilized as input, but only the spatio-temporal derivatives of the image intensity function. The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
David Doermann. April 1993.
Document Image Understanding: Integrating Recovery and Interpretation. Many document image understanding problems require a more comprehensive examination of document features than is typically deemed necessary for recognition tasks. We believe that these problems require a detailed analysis of stroke and sub-stroke features in the document image with the goal of obtaining information about the environment or process which created the document and establishing a context for understanding. We introduce the concept of {\em recovery} into the document domain. We provide a ``stroke platform'' representation which establishes a verifiable ``link to the pixels'' and demonstrate its usefulness for recovery tasks. This representation allows us to overcome many of the problems associated with the rapid, irreversible abstraction associated with traditional document processing methods and provides the basic framework for our analysis of handwritten documents. By obtaining a detailed description of the document and its properties, we are able to establish a context for analysis and validate assumptions about the domain. This dissertation presents our work on several document image understanding problems including: 1) demonstrating the successful use of the stroke platform for the problem of interpreting and reconstructing junctions and endpoints; 2) exploring the effects of the handwriting process on the document by the development of a model for instrument grasp and a study of its effects on pressure features, 3) posing and providing an approach to the problem of recovering temporal information from static images of handwriting, 4) addressing various sub-tasks of the problem of processing form documents, and 5) extending the detailed analysis philosophy to demonstrate its feasibility in related document domains. 
In this paper, we present a multi-level approach to logo recognition which uses text and contour features to prune the database and similarity invariants to obtain a more refined match. We outline our methods for page segmentation, feature extraction and indexing and demonstrate our approach on a database of approximately sixty logos. Dept. of Computer Science, Univ. of Maryland, The postscript version of this TR is available from the Center for Automation Research via anonymous ftp at ftp.cfar.umd.edu; or via the WWW at http://www.cfar.umd.edu/CfAR/TRs.
A. Udaya Shankar. David Lee. December 1994.
Minimum-Latency Transport Protocols with Modulo-N Incarnation Numbers. To provide reliable connection management, a transport protocol uses 3-way handshakes in which user incarnations are identified by bounded incarnation numbers from some modulo-$N$ space. Cacheing schemes have been proposed to reduce the 3-way handshake to a 2-way handshake, providing the minimum latency desired for transaction-oriented applications. In this paper, we define a class of cacheing protocols and determine the minimum $N$ and optimal cache residency time as a function of real-time constraints (e.g.\ message lifetime, incarnation creation rate, inactivity duration, etc.). The protocols use the client-server architecture and handle failures and recoveries. Both clients and servers generate incarnation numbers from a local counter (e.g.\ clock). These protocols assume a maximum duration for each incarnation; without this assumption, there is a very small probability ($\approx \frac{1}{N^2}$) of misinterpretation of incarnation numbers. This restriction can be overcome with some additional cacheing. (Also cross-referenced as UMIACS-TR-93-24.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Why Broyden's Nonsymmetric Method Terminates on linear equations. Dianne P. O'Leary. March 1993.
Abstract. The family of algorithms introduced by Broyden in 1965 for solving systems of nonlinear equations has been used quite effectively on a variety of problems. In 1979, Gay proved the then surprising result that the algorithms terminate in at most 2n steps on linear problems with n variables. His very clever proof gives no insight into properties of the intermediate iterates, however. In this work we show that Broyden's methods are projection methods, forcing the residuals to lie in a nested set of subspaces of decreasing dimension. (Also cross-referenced as UMIACS-TR-93-23) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Samir Khuller. Balaji Raghavachari. Neal Young. March 1993.
Designing Multi-Commodity Flow Trees. The traditional multi-commodity flow problem assumes a given flow network in which multiple commodities are to be maximally routed in response to given demands. This paper considers the multi-commodity flow network-design problem: given a set of multi-commodity flow demands, find a network subject to certain constraints such that the commodities can be maximally routed. This paper focuses on the case when the network is required to be a tree. The main result is an approximation algorithm for the case when the tree is required to be of constant degree. The algorithm reduces the problem to the minimum-weight balanced-separator problem; the performance guarantee of the algorithm is within a factor of 4 of the performance guarantee of the balanced-separator procedure. If Leighton and Rao's balanced-separator procedure is used, the performance guarantee is $O(\log n)$. This improves the $O(\log^2 n)$ approximation factor obtained by a direct application of the balanced-separator method. (Also cross-referenced as UMIACS-TR-93-20) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ibrahim Kamel. Christos Faloutsos. Hilbert R-tree: An improved R-tree using fractals. February 1993.
We propose a new R-tree structure that outperforms all the older ones. The heart of the idea is to pack rectangles into the R-tree nodes according to a linear ordering as opposed to the more complex heuristics used in the R*-tree. This ordering has to be 'good', in the sense that it should cluster 'similar' data rectangles together, to minimize the area and perimeter of the resulting minimum bounding rectangles (MBRs). Among the orderings we tried, the '2D-c' method, the one that uses the (2d) Hilbert value of the center of the rectangles, gives the best results. For a static database, the proposed ordering achieves superior packing, outperforming older packing methods [25], and the best dynamic method (R*-trees [3]). The savings are as high as 36% on real data. Moreover, we introduce a dynamic variation, the Hilbert R-tree: Given the ordering, every node has a well-defined set of sibling nodes; thus, we can use splitting algorithms that are similar to the deferred splitting of the B*-trees. By adjusting the split policy, the Hilbert R-tree can achieve as high utilization as desired. To the contrary, the R*-tree has no control over the utilization, typically achieving up to 70%. We designed the manipulation algorithms in detail, and we did a full implementation of the Hilbert R-tree. Our experiments show that a '2-to-3' split policy achieves good results, consistently outperforming the best competitor (R*-trees), with up to 28% savings on real data. (Also cross-referenced as UMIACS-TR-93-12) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
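The static packing idea in the abstract above (order rectangles by the Hilbert value of their centers, then fill nodes in that order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: rectangle coordinates are assumed to be integers on a 2**order x 2**order grid, the curve position is computed with the standard iterative coordinate-to-index conversion, and the names `hilbert_index`, `pack_by_hilbert`, `mbr`, and the `capacity` parameter are hypothetical. Only leaf-level packing is shown; the dynamic splitting policy is not modeled.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve covering a
    2**order x 2**order grid (standard iterative conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def pack_by_hilbert(rects, order, capacity):
    """Sort rectangles (xmin, ymin, xmax, ymax) by the Hilbert value of
    their centers and cut the sorted list into leaves of `capacity` entries."""
    def center_key(r):
        cx = (r[0] + r[2]) // 2
        cy = (r[1] + r[3]) // 2
        return hilbert_index(order, cx, cy)

    ordered = sorted(rects, key=center_key)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


def mbr(leaf):
    """Minimum bounding rectangle of the rectangles in one leaf."""
    return (min(r[0] for r in leaf), min(r[1] for r in leaf),
            max(r[2] for r in leaf), max(r[3] for r in leaf))


if __name__ == "__main__":
    rects = [(0, 0, 0, 0), (3, 0, 3, 0), (0, 1, 0, 1), (3, 1, 3, 1)]
    for leaf in pack_by_hilbert(rects, order=2, capacity=2):
        print(leaf, mbr(leaf))
```

Because consecutive positions on the Hilbert curve are spatially adjacent, rectangles that end up in the same leaf tend to be close together, which keeps the leaf MBRs small.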
G. W. Stewart. On the Convergence of Multipoint Iterations. February 1993.
This note gives a new convergence proof for iterations based on multipoint formulas. It rests on the very general assumption that if the desired fixed point appears as an argument in the formula then the formula returns the fixed point. (Also cross-referenced as UMIACS-TR-93-10) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Richard Potter. January 1993.
Triggers: Guiding automation with pixels to achieve data access. Triggers is a programming system that shows how simple pattern matching applied to the pixels on a computer screen can effectively access data that is otherwise hidden inside an application program and unavailable to other programming by demonstration systems. Triggers invokes operators in applications by simulating keyboard and mouse actions, and accesses data through the pixel representations on the computer screen. Triggers extends the record/playback style popularized by keyboard macros. Triggers shows that pixel-based device-level algorithms exist, are understandable, can be easily implemented, and can allow a programming system to process data in situations where it would otherwise be impossible. (Also cross-referenced as CAR-TR-658) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Ben Shneiderman. January 1994.
Dynamic Queries for Visual Information Seeking. The capacity to incrementally adjust a query (with sliders, buttons, selections from a set of discrete attribute values, etc.) coupled with a visual display of results that are rapidly updated, dramatically changes the information seeking process. Dynamic queries on the chemical table of elements, computer directories, and a real estate database were built and tested in three separate exploratory experiments. Preliminary results show highly significant performance improvements and user enthusiasm more commonly seen with video games. Widespread application seems possible but research issues abound in the areas of: (1) graphic visualization design, (2) database and display algorithms, and (3) user interface requirements. Challenges include methods for rapidly displaying and changing many points, colors, and areas; multi-dimensional pointing and exploring using 6 degree of freedom input/output devices; incorporation of sound and visual display techniques that increase user comprehension; and integration with existing database systems. (Also cross-referenced as CAR-TR-655) (Also cross-referenced as SRC-TR-93-3) Original paper (September 1993), revised (January 1994) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Sheng-Tzong Cheng. Ashok K. Agrawala. Optimal Replication of Series-Parallel Graphs for Computation-Intensive. October 1994.
We consider the replication problem of series-parallel (SP) task graphs where each task may run on more than one processor. The objective of the problem is to minimize the total cost of task execution and interprocessor communication. We call it the minimum cost replication problem for SP graphs (MCRP-SP). In this paper, we adopt a new communication model where the purpose of replication is to reduce the total cost. The class of applications we consider is computation-intensive applications in which the execution cost of a task is greater than its communication cost. The complexity of MCRP-SP for such applications is proved to be NP-complete. We present a branch-and-bound method to find an optimal solution as well as an approximation approach for a suboptimal solution. The numerical results show that such replication may lead to a lower cost than the optimal assignment problem (in which each task is assigned to only one processor) does. The proposed optimal solution has the complexity of O(n^2 2^n M), while the approximation solution has O(n^4 M^2), where n is the number of processors in the system and M is the number of tasks in the graph. (Also cross-referenced as UMIACS-TR-93-4.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Christos Faloutsos. Ibrahim Kamel. Packed R-trees Using Fractals. December 1992.
We propose a new packing technique for R-trees for static databases. Given a collection of rectangles, we sort them and we build the R-tree bottom-up. There are several ways to sort the rectangles; the innovation of this work is the use of fractals, and specifically the Hilbert curve, to achieve better ordering of the rectangles and eventually better packing. We proposed and implemented several variations and performed experiments on synthetic, as well as real data (a TIGER file from the U.S. Bureau of Census). The winning variation ('2D-c') was the one that sorts the rectangles according to the Hilbert value of the center. This variation consistently outperforms the R*-trees [3], Guttman's R-trees [13], as well as the packing method of Roussopoulos and Leifker [24]. The performance gain of our method seems to increase with the skewness of the data distribution; specifically, on the (highly skewed) TIGER dataset, it achieves up to 38% improvement in response time over the best known R-tree variant and up to 58% over the older packing algorithm. (Also cross-referenced as UMIACS-TR-92-133) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
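The sort-then-pack idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `hilbert_index` and `pack_leaves`, the grid resolution, and the unit-square data extent are hypothetical, and the Hilbert value is computed with the standard iterative bit-manipulation conversion on a power-of-two grid.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve filling an n-by-n grid.

    n must be a power of two; x and y must lie in [0, n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_leaves(rects, capacity, grid=1024, extent=1.0):
    """Sort rectangles by the Hilbert value of their centers and pack leaves.

    rects: list of (xmin, ymin, xmax, ymax) with coordinates in [0, extent)
           (the extent is an assumption of this sketch).
    Returns a list of (mbr, members) pairs, one per leaf, built bottom-up.
    """
    def center_key(r):
        cx = (r[0] + r[2]) / 2
        cy = (r[1] + r[3]) / 2
        gx = min(grid - 1, int(cx / extent * grid))
        gy = min(grid - 1, int(cy / extent * grid))
        return hilbert_index(grid, gx, gy)

    ordered = sorted(rects, key=center_key)
    leaves = []
    for i in range(0, len(ordered), capacity):
        group = ordered[i:i + capacity]
        mbr = (min(r[0] for r in group), min(r[1] for r in group),
               max(r[2] for r in group), max(r[3] for r in group))
        leaves.append((mbr, group))
    return leaves
```

Upper levels of the tree could be built by applying the same grouping step to the leaf MBRs; that step is omitted here to keep the sketch short.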
Boon-Teck Kuah. Ben Shneiderman. November 1992.
Providing Advisory Notices for UNIX Command Users: Design,. UNIX Notices (UN) was developed to study the problems in providing advice to users of complex systems. The issues studied were: what, when, and how to present the advice. The first experiment with 24 subjects examined how different presentation styles affect the effectiveness of UN's advice. The three presentation styles studied were: notice appears in separate window; notice appears only on request; notice appears in user's window immediately. The results showed that the third style was significantly more effective than the first style. Furthermore, the results indicated that the most effective presentation method is also the most disruptive. The second experiment with 29 subjects studied how delay in the advice feedback affects the performance of UN. The treatments were: immediate feedback, feedback at end of session, and no feedback. Over a period of 6 weeks, the commands entered by the subjects were logged and studied. The results showed that immediate feedback caused subjects to repeat significantly fewer inefficient command sequences. However, immediate feedback and feedback at end of session may have given subjects a negative feeling towards UNIX. (Also cross-referenced as CAR-TR-651) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Andrew Sears. Ben Shneiderman. June 1993.
Split menus: Effectively using selection frequency to organize menus. When some items in a menu are selected more frequently than others, as is often the case, designers or individual users may be able to speed performance and improve satisfaction by placing several high-frequency items at the top of the menu. Design guidelines for split menus were developed and applied. Split menus were implemented and tested in two field studies and a controlled experiment. In the field study conditions performance times were reduced by 17 to 58% depending on the site and menus. In the controlled experiment split menus were significantly faster than alphabetic menus and yielded significantly higher subjective preferences. A possible resolution to the continuing debate among cognitive theorists about predicting menu selection times is offered. We conjecture and offer evidence that the logarithmic model applies to familiar (high-frequency) items and the linear model applies to unfamiliar (low-frequency) items. (Also cross-referenced as CAR-TR-649) ACM Transactions on Computer-Human Interaction, vol. 1, #1 (March 1994) 27-51 Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
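As a rough illustration of the split-menu layout described above, the sketch below moves the most frequently selected items to the top section and leaves the rest in a lower section. The function name `split_menu`, the `top_n` parameter, and the choice to keep the lower section in alphabetical order are assumptions of this sketch, not details taken from the report.

```python
def split_menu(frequencies, top_n):
    """Split menu items into a high-frequency top section and a remainder.

    frequencies: dict mapping item name -> selection count.
    top_n: number of items to promote to the top section (hypothetical parameter).
    Returns (top_section, lower_section).
    """
    # Highest counts first; ties broken alphabetically for a stable layout.
    by_frequency = sorted(frequencies, key=lambda item: (-frequencies[item], item))
    top_section = by_frequency[:top_n]
    promoted = set(top_section)
    # Assumption: the remaining items stay in alphabetical order.
    lower_section = sorted(item for item in frequencies if item not in promoted)
    return top_section, lower_section


fonts = {"Arial": 5, "Courier": 40, "Helvetica": 30, "Times": 20, "Symbol": 5}
top, lower = split_menu(fonts, top_n=2)
```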
David Turo. Walter-Alexander Jungmeister. November 1992.
Adapting Treemaps to Stock Portfolio Visualization. Treemap visualization techniques are extended and applied to stock market portfolios via a prototype application. Designed to facilitate financial decision-making, the prototype provides an overview of large amounts of hierarchical financial data and allows users to alter aspects of the visual display dynamically. Treemap concepts are illustrated via examples which address common portfolio management needs. (Also cross-referenced as CAR-TR-648) (Also cross-referenced as SRC-TR-92-120) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
E. Schroeder. G. W. Stewart. On Infinitely Many Algorithms for Solving Equations. November 1992.
Translated by G. W. Stewart. This report contains a translation of ``Ueber unendlich viele Algorithmen zur Aufl\"osung der Gleichungen,'' a paper by E. Schr\"oder which appeared in {\it Mathematische Annalen\/} in 1870. (Also cross-referenced as UMIACS-TR-92-121) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Chen Chen. Selective Multicast Communication in Distributed Systems. December 1992.
Most current techniques for communications between the software components of a distributed application are limited to one-to-one communication; there is little support for one-to-many or many-to-many communications. We have developed a framework for selective multicast, a mechanism supporting one-to-many and many-to-many communications, where components of an application can communicate with each other. After discussing the overall requirements for a selective multicast environment, we describe our approach to selective multicast. An environment to support selective multicast in a distributed system is then described in detail. We demonstrate the selective multicast mechanism by providing an application that connects Unix tools using selective multicast. (Also cross-referenced as UMIACS-TR-92-116) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Chen Chen. Elizabeth L. White. James M. Purtilo. A Packager for Multicast Software in Distributed Systems. December 1992.
PTM is a packaging tool for preparing ordinary software to execute in multicast-based environments. Using PTM, both individual programs and systems of programs can be tailored to use multicast communication, without manual intervention from the programmer, who is in turn free to reason about the distributed system's initial configuration as if ordinary RPC or message passing semantics are to be used. But with PTM, programmers also retain the flexibility afforded at run time by the multicast paradigm, where the set of tools that consume a given type of event can transparently evolve. After describing Polycast, our implementation of a multicast execution environment in terms of software bus organization, we present the packaging technology that automates preparation of software for the environment. Software prototyping is one of the key beneficiaries of multicast communication, which led us to explore means for simplifying the programming tasks involved; and therefore we illustrate how Polycast is serving our prototyping research by presenting an example prototyping tool called PTM, in which our multicast enables users to dynamically explore network design and configuration alternatives. (Also cross-referenced as UMIACS-TR-92-114) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant. David Carr. Hiroaki Hasegawa. October 1992.
When an Intermediate View Matters. The browsing of two dimensional images can be found in a large number of applications. When the image to be viewed is much larger than the screen available, a two dimensional browser has to be provided to allow users to access all parts of the image. We show the diversity of tasks and systems available and the need for 2D browser design guidelines. In the context of a microscope image browser, we investigate one common technique consisting of a global view of the whole image, coupled to a detailed, magnified view of part of the image. In particular we look at the benefits of providing an intermediate view when the detail-to-overview ratio is high. An experiment showed that user performance significantly degrades when no intermediate view is provided for a detail-to-overview ratio over 20:1. Our experience is also a good example of a real world application for which added features and added hardware need to be justified. (Also cross-referenced as CAR-TR-645) (Also cross-referenced as ISR-TR-92-119) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
G. W. Stewart. On the Solution of Block Hessenberg Systems. October 1992.
This paper describes a divide-and-conquer strategy for solving block Hessenberg systems. For dense matrices the method is a little more efficient than Gaussian elimination; however, because it works almost entirely with the original blocks, it is much more efficient for sparse matrices or matrices whose blocks can be generated on the fly. For Toeplitz matrices, the algorithm can be combined with the fast Fourier transform to give a new superfast algorithm. (Also cross-referenced as UMIACS-TR-92-109) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. Determining Rank in the Presence of Error. October 1992.
The problem of determining rank in the presence of error occurs in a number of applications. The usual approach is to compute a rank-revealing decomposition and make a decision about the rank by examining the small elements of the decomposition. In this paper we look at three commonly used decompositions: the singular value decomposition, the pivoted QR decomposition, and the URV decomposition. (Also cross-referenced as UMIACS-TR-92-108) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Lewis R. Karl. Michael Pettey. Ben Shneiderman. July 1992.
Speech Versus Mouse Commands for Word Processing:. Despite advances in speech technology, human factors research since the late 1970s has provided only weak evidence that automatic speech recognition devices are superior to conventional input devices such as keyboards and mice. However, recent studies indicate that there may be advantages to providing an additional input channel based on speech input to supplement the more common input modes. Recently the authors conducted an experiment to demonstrate the advantages of using speech-activated commands over mouse-activated commands for word processing applications when, in both cases, the keyboard is used for text entry and the mouse for direct manipulation. Sixteen experimental subjects, all professionals and all but one novice users of speech input, performed four simple word processing tasks using both input groups in this counterbalanced experiment. Performance times for all tasks were significantly faster when using speech to activate commands as opposed to using the mouse. On average, the reduction in task time due to using speech was 18.67%. The error rates due to subject mistakes were roughly the same for both input groups, and recognition errors, averaged over all the tasks, occurred for 6.25% of the speech-activated commands. Subjects made significantly more memorization errors when using speech as compared with the mouse for command activation. Overall, the subjects reacted positively to using speech input and preferred it over the mouse for command activation; however, they also voiced concerns about recognition accuracy, the interference of background noise, inadequate feedback and slow response time. The authors believe that the results of the experiment provide guidance for implementors and evidence for the utility of speech input for command activation in application programs. 
(Also cross-referenced as CAR-TR-630) (Also cross-referenced as SRC-TR-92-86) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Z. Bai. G. W. Stewart. SRRIT--A FORTRAN Subroutine to Calculate the Dominant Invariant. May 1992.
{\sl SRRIT} is a FORTRAN program to calculate an approximate orthonormal basis for a dominant invariant subspace of a real matrix $A$ by the method of simultaneous iteration \cite{stewart76a}. Specifically, given an integer $m$, {\sl SRRIT} attempts to compute a matrix $Q$ with $m$ orthonormal columns and a real quasi-triangular matrix $T$ of order $m$ such that the equation \[ AQ = QT \] is satisfied up to a tolerance specified by the user. The eigenvalues of $T$ are approximations to the $m$ largest eigenvalues of $A$, and the columns of $Q$ span the invariant subspace corresponding to those eigenvalues. {\sl SRRIT} references $A$ only through a user-provided subroutine to form the product $AQ$; hence it is suitable for large sparse problems. (Also cross-referenced as UMIACS-TR-92-61) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. Note on a Generalized Sylvester Equation. April 1992.
In this note we show how to compute the minimum-norm, least squares solution of the generalized Sylvester equation \[ AX + YB = C, \] (Also cross-referenced as UMIACS-TR-92-59) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Degi Young. Ben Shneiderman. May 1992.
A Graphical Filter/Flow Representation of Boolean. One of the powerful applications of Boolean expression is to allow users to extract relevant information from a database. Unfortunately, previous research has shown that users have difficulty specifying Boolean queries. In an attempt to overcome this limitation, a graphical Filter/Flow representation of Boolean queries was designed to provide users with an interface that visually conveys the meaning of the Boolean operators (AND, OR and NOT). This was accomplished by implementing a graphical interface prototype that uses the metaphor of water flowing through filters. Twenty subjects with no experience with Boolean logic participated in an experiment comparing the Boolean operations represented in the Filter/Flow interface with a text-only SQL interface. The subjects independently performed five comprehension tasks and five composition tasks in each of the interfaces. A significant difference (p < 0.05) in the total number of correct queries in each of the comprehension and composition tasks was found favoring Filter/Flow. (Also cross-referenced as CAR-TR-627) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
David Turo. Brian Johnson. May 1992.
Improving the Visualization of Hierarchies with Treemaps:. Controlled experiments with novice treemap users and real data highlight the strengths of treemaps and provide direction for improvement. Issues discussed include experimental results, layout algorithms, nesting offsets, labeling, animation and small multiple displays. Treemaps prove to be a potent tool for hierarchy display. The principles discussed are applicable to many information visualization situations. (Also cross-referenced as CAR-TR-626) (Also cross-referenced as ISR-TR-92-62) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
G. W. Stewart. Updating URV Decompositions in Parallel. April 1992.
A URV decomposition of a matrix is a factorization of the matrix into the product of a unitary matrix (U), an upper triangular matrix (R), and another unitary matrix (V). In an earlier paper [UMIACS-TR-90-86] it was shown how to update a URV decomposition in such a way that it reveals the effective rank of the matrix. It was also argued that the updating procedure could be implemented in parallel on a linear array of processors; however, no specific algorithms were given. This paper gives a detailed implementation of the updating procedure. (Also cross-referenced as UMIACS-TR-92-44) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Alan Edelman. G. W. Stewart. Scaling for Orthogonality. April 1992.
In updating algorithms where orthogonal transformations are accumulated, it is important to preserve the orthogonality of the product in the presence of rounding error. Moonen, Van Dooren, and Vandewalle have pointed out that simply normalizing the columns of the product tends to preserve orthogonality\,---\,though not, as DeGroat points out, to working precision. In this note we give an analysis of the phenomenon. (Also cross-referenced as UMIACS-TR-92-43) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
C. G. Jacobi. G. W. Stewart. On a New Way of Solving the Linear Equations that Arise in the Method. May 1992.
Translated by G. W. Stewart. This report contains a translation of a paper of C. G. J. Jacobi, ``Ueber eine neue Aufl\"osungsart der bei der Methode der kleinsten Quadrate vorkommenden line\"aren Gleichungen,'' which appeared in {\it Astronomische Nachrichten\/} {\bf 22} (1845). In the paper Jacobi shows how to use rotations to increase the diagonal dominance of symmetric linear systems, which he then solves by what we today call the point Jacobi method. This preconditioner is none other than Jacobi's method for diagonalizing a symmetric matrix. Although Jacobi points out his method can be used to find eigenvalues, he reserves a fuller exposition for a later paper [Journal f\"ur die reine und angewandte Mathematik, {\bf 30} (1846), 51--94], which is now generally cited as the source of the method. A variant for unsymmetric equations is also considered. (Also cross-referenced as UMIACS-TR-92-42) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
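For readers unfamiliar with the point Jacobi method mentioned above, the following is a minimal sketch of the iteration on a small diagonally dominant system. It illustrates only the solve step, not the rotation-based preconditioning the translated paper describes; the function name `jacobi_solve`, the fixed iteration count, and the example system are assumptions of this sketch.

```python
def jacobi_solve(A, b, iterations=50):
    """Approximate the solution of A x = b by point Jacobi iteration.

    Each sweep updates every unknown from the previous iterate only:
        x_i <- (b_i - sum_{j != i} A[i][j] * x_j) / A[i][i]
    Convergence is guaranteed when A is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


# Hypothetical symmetric, diagonally dominant example; exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = jacobi_solve(A, b)
```

The stronger the diagonal dominance, the faster this iteration converges, which is why increasing dominance beforehand serves as a preconditioner.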
David Carr. Hiroaki Hasegawa. Doug Lemmon. Catherine Plaisant. March 1992.
The Effects of Time Delays on a Telepathology User Interface. Telepathology enables a pathologist to examine physically distant tissue samples by microscope operation over a communication link. Communication links can impose time delays which cause difficulties in controlling the remote device. Such difficulties were found in a microscope teleoperation system. Since the user interface is critical to pathologists' acceptance of telepathology, we redesigned the user interface for this system and built two different versions (a keypad whose movement commands operated by specifying a start command followed by a stop command and a trackball interface whose movement commands were incremental and directly proportional to the rotation of the trackball). We then conducted a pilot study to determine the effect of time delays on the new user interfaces. In our experiment, the keypad was the faster interface when the time delay was short. There was no evidence to favor either the keypad or trackball when the time delay was longer. Moving long distances over the microscope slide by dragging the field-of-view indicator on the touchscreen control panel improved inexperienced user performance. Also, the experiment suggests that changes could be made to improve the trackball interface. (Also cross-referenced as CAR-TR-616) (Also cross-referenced as SRC-TR-92-49) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Cengiz Alaettinoglu. A. Udaya Shankar. August 1993.
Stepwise Assertional Design of Distance-Vector Routing Algorithms. There are many kinds of distance-vector algorithms for adaptive routing in wide-area computer networks, ranging from the classical Distributed Bellman-Ford to several recent algorithms that have better performance. However, these algorithms have very complicated behaviors and their analyses in the literature have been incomplete (and operational). In this paper, we present a stepwise assertional design of a recently proposed distance-vector algorithm. Our design starts with the Distributed Bellman-Ford and goes through two intermediate algorithms. The properties established for each algorithm hold for the succeeding algorithms. The algorithms analyzed here are representative of various internetwork routing protocols. (Also cross-referenced as UMIACS-TR-92-39.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Ehud Rivlin. Rodrigo Botafogo. Ben Shneiderman. March 1992.
Navigating in hyperspace: designing a structure based toolbox. Analyzing the structure of a hypertext database can give useful information to the traveler in hyperspace. We present a preliminary collection of structural tools for users of hypertext systems. These tools can suggest answers to questions like: Where am I? How can I choose and get to my destination? What else is in my current neighborhood? Structure is imposed on the hypertext by using two processes: hierarchization and cluster identification. Several metrics are presented and used in the above processes for locating landmarks and getting global information on the hypertext structure. The structural analysis is integrated with previous attempts to reduce the users' disorientation while navigating the hyperspace. An integration with fisheye views and tree-maps is presented. (Also cross-referenced as CAR-TR-606) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. On the Early History of the Singular Value Decomposition. March 1992.
This paper surveys the contributions of five mathematicians\,---\,Eugenio Beltrami (1835--1899), Camille Jordan (1838--1921), James Joseph Sylvester (1814--1897), Erhard Schmidt (1876--1959), and Hermann Weyl (1885--1955)\,---\,who were responsible for establishing the existence of the singular value decomposition and developing its theory. (Also cross-referenced as UMIACS-TR-92-31) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. On the Perturbation of LU, Cholesky, and QR Factorizations. February 1992.
To appear in SIMAX. In this paper error bounds are derived for a first order expansion of the LU factorization of a perturbation of the identity. The results are applied to obtain perturbation expansions of the LU, Cholesky, and QR factorizations. (Also cross-referenced as UMIACS-TR-92-24) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. Gaussian Elimination, Perturbation Theory and Markov Chains. January 1992.
The purpose of this paper is to describe the special problems that emerge when Gaussian elimination is used to determine the steady-state vector of a Markov chain. (Also cross-referenced as UMIACS-TR-92-23) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Holmes Liao. Masakazu Osada. Ben Shneiderman. February 1992.
Browsing Unix Directories With Dynamic Queries:. We designed, implemented, and evaluated an innovative concept for dynamic queries which involves the direct manipulation of small databases. Our domain was directories in a Unix file system. Dynamic queries allow users to formulate queries and explore the databases with graphical widgets, such as sliders and buttons, without requiring them to have any knowledge about the underlying structure of the database, query languages, or command language syntax. Three interfaces for presenting directories were developed and tested with eighteen subjects in a within-subject design. The results of the formative evaluation yielded some useful guidelines for software designers. (Also cross-referenced as CAR-TR-605) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
G. W. Stewart. On the Perturbation of Markov Chains with Nearly Transient States. January 1992.
To appear in Numerische Mathematik. Let $A$ be an irreducible stochastic matrix of the form \[ A = \bmx{cc} A_{11} & E_{12} \\ A_{21} & A_{22} \emx. \] If $E_{12}$ were zero, the states corresponding to $A_{22}$ would be transient in the sense that if the steady state vector $y\trp$ is partitioned conformally in the form $(y_1\trp \; y_2\trp)$ then $y_2\trp = 0$. If $E_{12}$ is small, then $y_2\trp$ will be small, and the states are said to be nearly transient. In this paper it is shown that small relative perturbations in $A_{11}$, $A_{21}$, and $A_{22}$, though potentially larger than $y_2\trp$, induce only small relative perturbations in $y_2\trp$. (Also cross-referenced as UMIACS-TR-92-14) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Andrew Sears. December 1992.
Layout Appropriateness: A metric for evaluating user interface widget. Numerous methods to evaluate user interfaces have been investigated. These methods vary greatly in the attention paid to the users' tasks. Some methods require detailed task descriptions while others are task-independent. Unfortunately, collecting detailed task information can be difficult. On the other hand, task-independent methods cannot evaluate a design for the tasks users actually perform. The goal of this research is to develop a metric, which incorporates simple task descriptions, that can assist designers in organizing widgets in the user interface. Simple task descriptions provide some of the benefits, without the difficulties, of performing a detailed task analysis. The metric, Layout Appropriateness (LA), requires a description of the sequences of widget-level actions users perform and how frequently each sequence is used. This task description can either be from observations of an existing system or from a simplified task analysis. The appropriateness of a given layout is computed by weighting the cost of each sequence of actions by how frequently the sequence is performed. This emphasizes frequent methods of accomplishing tasks while incorporating less frequent methods in the design. Currently costs are based on the distance users must move the mouse. Other measures, such as the number of eye fixations necessary to extract the relevant information or the number of changes in direction, may also prove useful, but must be validated before they are made available for use. In addition to providing a comparison of proposed or existing layouts, an LA-optimal layout is presented to the designer. The designer can compare the LA-optimal and existing layouts or start with the LA-optimal layout and modify it to take additional factors into consideration. 
Software engineers who occasionally face interface design problems and user interface designers can benefit from the explicit focus on the users' tasks that LA incorporates into automated user interface evaluation. (Also cross-referenced as CAR-TR-603) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
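The frequency-weighted cost at the core of the metric described above can be sketched as follows. This is a hedged illustration only: it computes the weighted mouse-travel cost of one layout and does not reproduce the report's full LA definition or its search for an LA-optimal layout. The names `sequence_distance` and `layout_cost`, the use of straight-line distance between widget centers, and the example widgets are assumptions.

```python
import math


def sequence_distance(layout, sequence):
    """Total straight-line mouse travel for one sequence of widget actions.

    layout: dict mapping widget name -> (x, y) center position.
    sequence: list of widget names in the order they are used.
    """
    return sum(
        math.dist(layout[a], layout[b])
        for a, b in zip(sequence, sequence[1:])
    )


def layout_cost(layout, tasks):
    """Cost of a layout: each sequence's travel weighted by its frequency.

    tasks: list of (frequency, sequence) pairs.
    """
    return sum(freq * sequence_distance(layout, seq) for freq, seq in tasks)


# Hypothetical dialog with three widgets and two observed action sequences.
layout = {"open": (0, 0), "save": (3, 4), "close": (3, 0)}
tasks = [
    (0.75, ["open", "save"]),
    (0.25, ["open", "close", "save"]),
]
cost = layout_cost(layout, tasks)
```

Comparing `layout_cost` across candidate layouts for the same task description gives a simple way to rank them; lower cost means less mouse travel for the frequent sequences.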
Ibrahim Kamel. Christos Faloutsos. Parallel R-trees. January 6, 1992.
We consider the problem of exploiting parallelism to accelerate the performance of spatial access methods and specifically, R-trees [14]. Our goal is to design a server for spatial data so as to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) engaging as few disks as possible on point queries [26]. We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditional R-tree, with cross-disk pointers ('Multiplexed' R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide which disk a newly created R-tree node should be stored in. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed 'proximity index' or PI, estimates the similarity of the new node with the other R-tree nodes already on a disk, and chooses the disk with the lowest similarity. Experimental results show that our scheme consistently outperforms all the other heuristics for node-to-disk assignments, achieving up to 55% gains over the Round Robin one. Experiments also indicate that the multiplexed R-tree with PI heuristic gives better response time than the disk-striping (="Super-node") approach, and imposes lighter load on the I/O sub-system. The speed up of our method is close to linear speed up, increasing with the size of the queries. (Also cross-referenced as UMIACS-TR-92-1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
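The disk-selection rule (store a new node on the disk whose existing nodes are least similar to it) can be sketched as follows. The abstract does not define the proximity index itself, so this sketch substitutes a stand-in similarity, the overlap area of bounding rectangles, and scores a disk by its most similar node; both choices, and the names `overlap_area` and `choose_disk`, are assumptions for illustration only.

```python
def overlap_area(a, b):
    """Area of intersection of two rectangles (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def choose_disk(new_rect, disks, similarity=overlap_area):
    """Return the index of the disk least similar to new_rect.

    disks: list of lists of rectangles already stored on each disk.
    A disk's score is the similarity of its most similar node (assumed rule);
    an empty disk scores 0. Ties go to the lowest disk index.
    """
    def disk_score(rects):
        return max((similarity(new_rect, r) for r in rects), default=0)
    return min(range(len(disks)), key=lambda i: disk_score(disks[i]))

disks = [
    [(0, 0, 4, 4)],       # disk 0 holds a node near the origin
    [(10, 10, 14, 14)],   # disk 1 holds a node far from the origin
]

near_origin = choose_disk((1, 1, 3, 3), disks)
far_away = choose_disk((11, 11, 12, 12), disks)
```

Placing similar nodes on different disks means that a range query touching both can read them in parallel, which is the motivation the abstract gives for preferring the least similar disk.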
Christopher Williamson. Ben Shneiderman. January 1992.
The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate. We designed, implemented, and evaluated a new concept for visualizing and searching databases utilizing direct manipulation called dynamic queries. Dynamic queries allow users to formulate queries by adjusting graphical widgets, such as sliders, and see the results immediately. By providing a graphical visualization of the database and search results, users can find trends and exceptions easily. User testing was done with eighteen undergraduate students who performed significantly faster using a dynamic queries interface compared to both a natural language system and paper printouts. The interfaces were used to explore a real-estate database and find homes meeting specific search criteria. (Also cross-referenced as CAR-TR-602) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
E. C. Boman. M. F. Griffin. G. W. Stewart. Direction of Arrival and the Rank-Revealing. December 1991.
In many practical direction-of-arrival (DOA) problems the number of sources and their directions from an antenna array do not remain stationary. Hence a practical DOA algorithm must be able to track changes with a minimal number of snapshots. In this paper we describe DOA algorithms, based on a new decomposition, that are not expensive to compute or difficult to update. The algorithms are compared with algorithms based on the singular value decomposition (SVD). (Also cross-referenced as UMIACS-TR-91-166) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Complexity, Decidability and Undecidability Results. Kutluhan Erol. Dana S. Nau. V.S. Subrahmanian. December 1991.
In this paper, we examine how the complexity of domain-independent planning with STRIPS-like operators depends on the nature of the planning operators. We show conditions under which planning is decidable and undecidable. Our results on this topic solve an open problem posed by Chapman [4], and clear up some difficulties with his undecidability theorems. For those cases where planning is decidable, we show how the time complexity varies depending on a wide variety of conditions: whether or not function symbols are allowed; whether or not delete lists are allowed; whether or not negative preconditions are allowed; whether or not the predicates are restricted to be propositional (i.e., 0-ary); whether the planning operators are given as part of the input to the planning problem, or instead are fixed in advance. (Also cross-referenced as UMIACS-TR-91-154) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
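For readers unfamiliar with the operator features this abstract enumerates, the following minimal sketch shows a propositional STRIPS-like operator with positive preconditions, negative preconditions, an add list, and a delete list. The class name, field names, and the example operator are hypothetical and are not taken from the report.

```python
from typing import FrozenSet, NamedTuple, Optional

class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]      # atoms that must hold
    neg_pre: FrozenSet[str]  # atoms that must not hold (negative preconditions)
    add: FrozenSet[str]      # atoms made true
    delete: FrozenSet[str]   # atoms made false (the delete list)

def apply(state: FrozenSet[str], op: Operator) -> Optional[FrozenSet[str]]:
    """Return the successor state, or None if op is not applicable."""
    if not op.pre <= state or op.neg_pre & state:
        return None
    return (state - op.delete) | op.add

pickup = Operator(
    name="pickup",
    pre=frozenset({"hand_empty", "on_table"}),
    neg_pre=frozenset({"glued"}),
    add=frozenset({"holding"}),
    delete=frozenset({"hand_empty", "on_table"}),
)

start = frozenset({"hand_empty", "on_table"})
after = apply(start, pickup)          # preconditions hold
again = apply(after, pickup)          # preconditions were deleted
blocked = apply(start | {"glued"}, pickup)  # negative precondition violated
```

Each restriction listed in the abstract corresponds to removing one of these ingredients, for example forcing `delete` or `neg_pre` to be empty.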
Can Parallel Algorithms Enhance Serial Implementation?. Uzi Vishkin. October 1993.
Consider the serial emulation of a parallel algorithm. The thesis presented in this paper is rather broad. It suggests that such a serial emulation has the potential advantage of running on a serial machine faster than a standard serial algorithm for the same problem. The main concrete observation is very simple: just before the serial emulation of a round of the parallel algorithm begins, the whole list of memory addresses needed during this round is readily available; and, we can start fetching all these addresses from secondary memories at this time. This permits prefetching the data that will be needed in the next "time window", perhaps by means of pipelining; these data will then be ready at the fast memories when requested by the CPU. The possibility of distributing memory addresses (or memory fetch units) at random over memory modules, as has been proposed in the context of implementing the parallel-random-access machine (PRAM) design space, is discussed. This work also suggests that a multi-stage effort to build a parallel machine may start with "parallel memories" and serial processing, deferring parallel processing to a later stage. The general approach has the following advantage: a user-friendly parallel programming language can be used already in its first stage. This is in contrast to a practice of compromising user-friendliness of parallel computer interfaces (i.e., parallel programming languages), and may offer a way for alleviating a so-called "parallel software crisis". It is too early to reach conclusions regarding the significance of the thesis of this paper. Preliminary experimental results with respect to the fundamental and practical problem of constructing suffix trees indicate that drastic improvements in running time might be possible. Serious attempts to follow it up are needed to determine its usefulness. 
Parts of this paper are intentionally written in an informal way, suppressing issues that will have to be resolved in the context of a concrete implementation. The intention is to stimulate debate and provoke suggestions and other specific approaches. Validity of our thesis would imply that a standard computer science curriculum, which prepares young graduates for a professional career of over forty years, will have to include the topic of parallel algorithms irrespective of whether (or when) parallel processing will succeed serial processing in the general purpose computing market. (Also cross-referenced as UMIACS-TR-91-145.1) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Richard Chimera. October 1991.
Value Bars: an information visualization and navigation tool for. The "value bar" provides an overview of a large listing with multiple, quantifiable attributes. Value bars are a cross between scrollbars and space-efficient stacked bar charts. A space-filling algorithm assigns relatively sized regions in the value bar according to items' attribute values. At a glance users can discern the distribution of attribute values of the entire listing. Navigation features provide quick identification and in-context fisheye views of listing items. Many value bars can be created to compare distributions of the same items over different attributes. A usability study showed value bars are easy to use and understand. Value bars can be added to applications involving directory listings, databases and their search results, tables of contents, stock market tables, medical information, etc. (Also cross-referenced as CAR-TR-589) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Catherine Plaisant. Andrew Sears. September 1992.
Touchscreen Interfaces for flexible alphanumeric data entry. Touchscreens have been demonstrated as useful for many applications. Although a traditional mechanical keyboard is the device of choice when entering alphanumeric data, it may not be optimal when only limited data must be entered, or when the keyboard layout, character set, or size may be changed. A series of experiments has demonstrated the usability of touchscreen keyboards. The first study indicated that users who type 58 wpm on a traditional keyboard can type 25 wpm using a touchscreen and that the traditional monitor position is suboptimal for touchscreen use. A second study reported on typing rates for keyboards of various sizes (from 6.8 to 24.6 cm wide). Novices typed approximately 10 wpm on the smallest and 20 wpm on the largest of the keyboards. Users experienced with touchscreen keyboards typed 21 wpm on the smallest and 32 wpm on the largest. We then report on a recent study done with more representative users and more difficult tasks. Thirteen cashiers were recruited for this study and were required to complete ten trials in which they typed names and addresses with punctuation. Results indicate that the users improved rapidly from 9.5 wpm on the first trial to 13.8 wpm on the last trial, reaching their fastest performance after only 25 minutes. Although custom interfaces will be preferred for special types of data (e.g. telephone numbers, times, dates, colors) there will always be situations when limited quantities of text must be entered. In these situations a touchscreen keyboard can be used. (Also cross-referenced as CAR-TR-585) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Christopher Ahlberg. Christopher Williamson. Ben Shneiderman. September 1991.
Dynamic Queries for Information Exploration:. We designed, implemented and evaluated a new concept for direct manipulation of databases, called dynamic queries, that allows users to formulate queries with graphical widgets, such as sliders. By providing a graphical visualization of the database and search results, users can find trends and exceptions easily. Eighteen undergraduate chemistry students performed statistically significantly faster using a dynamic queries interface compared to two interfaces both providing form fill-in as input method, one with graphical visualization output and one with all-textual output. The interfaces were used to explore the periodic table of elements and search on their properties. (Also cross-referenced as CAR-TR-584) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Ben Shneiderman. July 1991.
Visual User Interfaces for Information Exploration. The next generation of database management, directory browsing, information retrieval, hypermedia, scientific data management, and library systems can enable convenient exploration of growing information spaces by a wider range of users. User interface designers can provide more powerful search techniques, more comprehensible query facilities, better presentation methods, and smoother integration of technology with task. This paper offers novel graphical and direct manipulation approaches to query formulation and information presentation/manipulation. These approaches include a graphical approach to restricted boolean query formulation based on generalization/aggregation hierarchies, a filter/flow metaphor for complete boolean expressions, dynamic query methods with continuous visual presentation of results as the query is changed (possibly employing parallel computation), and color-coded 2-dimensional space-filling tree-maps that present multiple-level hierarchies in a single display (hundreds of directories and more than a thousand files can be seen at once). (Also cross-referenced as CAR-TR-577) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
G. W. Stewart. Perturbation Theory for Rectangular Matrix Pencils. July 1991; Revised, March 1993.
The theory of eigenvalues and eigenvectors of rectangular matrix pencils is complicated by the fact that arbitrarily small perturbations of the pencil can cause them to disappear. However, there are applications in which the properties of the pencil ensure the existence of eigenvalues and eigenvectors. In this paper it is shown how to develop a perturbation theory for such pencils. (Also cross-referenced as UMIACS-TR-91-105) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
William J. Weiland. Ben Shneiderman. July 1991.
A Graphical Query Interface Based on Aggregation/Generalization Hierarchies. In order for automated information systems to be used effectively, they must be made easily accessible to a wide range of users and with short training periods. This work proposes a method of organizing documents based on the concepts of aggregation and generalization hierarchies. We propose a graphical user interface to provide a more intuitive form of Boolean query. This design is based on mapping the nodes of the aggregation hierarchy to Boolean intersection operations, mapping the nodes of the generalization hierarchy to Boolean union operations, and providing a concrete, graphical, manipulable representation of both of these node types. Finally, a working prototype interface was constructed and evaluated experimentally against a classical command-line Boolean query interface. In this formative evaluation with sixteen subjects, the graphical interface produced less than one-tenth the errors of the textual interface, on average. Significant differences in time spent specifying queries were not found. Observations and comments provide guidance for designers. (Also cross-referenced as CAR-TR-562) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
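The mapping described in this abstract (aggregation nodes to Boolean intersection, generalization nodes to Boolean union) can be sketched as a small recursive evaluator. The tuple-based tree encoding, the `postings` index, and the example keywords are assumptions made for illustration; the report's actual data structures are not given here.

```python
def evaluate(node, postings):
    """Evaluate a query tree to a set of document ids.

    node is ("term", keyword), ("agg", [children]) or ("gen", [children]).
    postings maps keyword -> set of document ids (hypothetical index).
    """
    kind, payload = node
    if kind == "term":
        return set(postings.get(payload, ()))
    child_sets = [evaluate(child, postings) for child in payload]
    if not child_sets:
        return set()
    if kind == "agg":   # aggregation node: documents matching every part
        return set.intersection(*child_sets)
    if kind == "gen":   # generalization node: documents matching any specialization
        return set.union(*child_sets)
    raise ValueError(f"unknown node kind: {kind}")

postings = {"engine": {1, 2, 3}, "wheel": {2, 3, 4}, "car": {3}, "truck": {2, 5}}

# Documents about an engine AND a wheel AND (a car OR a truck).
query = ("agg", [
    ("term", "engine"),
    ("term", "wheel"),
    ("gen", [("term", "car"), ("term", "truck")]),
])
result = evaluate(query, postings)
```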
G. W. Stewart. Error Analysis of QR Updating with Exponential Windowing. May 1991.
To appear in Mathematics of Computation. Exponential windowing is a widely used technique for suppressing the effects of old data as new data is added to a matrix. Specifically, given an $n\times p$ matrix $X_n$ and a ``forgetting factor'' $\beta\in(0,1)$, one works with the matrix $\dia(\beta^{n-1},\beta^{n-2},\ldots,1)X_n$. In this paper we examine an updating algorithm for computing the QR factorization of $\dia(\beta^{n-1},\beta^{n-2},\ldots,1)X_n$ and show that it is unconditionally stable in the presence of rounding errors. (Also cross-referenced as UMIACS-TR-91-79) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
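The reason the windowed matrix lends itself to updating can be made explicit with a short derivation. The notation $D_n$, $x_{n+1}$, $Q_n$, and $R_n$ is introduced here for illustration and is not taken from the report, and $\mathrm{diag}$ is written in place of the abstract's $\dia$ on the assumption that they mean the same thing.

```latex
% Let D_n = diag(beta^{n-1}, beta^{n-2}, ..., 1) and let x_{n+1}^T be the new row.
\[
  D_{n+1} X_{n+1}
  = \begin{pmatrix} \beta\, D_n X_n \\ x_{n+1}^{T} \end{pmatrix},
  \qquad
  D_n = \mathrm{diag}(\beta^{n-1}, \beta^{n-2}, \ldots, 1).
\]
% If D_n X_n = Q_n R_n with R_n upper triangular, then
\[
  (D_{n+1} X_{n+1})^{T} (D_{n+1} X_{n+1})
  = \beta^{2} R_n^{T} R_n + x_{n+1} x_{n+1}^{T},
\]
% so R_{n+1} is the triangular factor of the (p+1) x p matrix
\[
  \begin{pmatrix} \beta R_n \\ x_{n+1}^{T} \end{pmatrix}.
\]
```

Each step therefore scales the current triangular factor by $\beta$ and folds in one new row, so the work per update does not grow with $n$.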
Andrew Sears. Doreen Revis. Janet Swatski. Rob Crittenden. Ben Shneiderman. April 1991.
Investigating Touchscreen Typing: the Effect of Keyboard Size on. Two studies investigated the effect keyboard size has on typing speed and error rates for touchscreen keyboards using the lift-off strategy. A cursor appeared when users touched the screen and a key was selected when they lifted their finger from the screen. Four keyboard sizes were investigated ranging from 24.6 cm to 6.8 cm wide. Results indicate that novices can type approximately 10 words per minute (WPM) on the smallest keyboard and 20 WPM on the largest. Experienced users improved to 21 WPM on the smallest keyboard and 32 WPM on the largest. These results indicate that, although slower, small touchscreen keyboards can be used for limited data entry when the presence of a regular keyboard is not practical. Applications include portable pocket-sized or palmtop computers, messaging systems, and personal information resources. Results also suggest the increased importance of experience on these smaller keyboards. Research directions are suggested. (Also cross-referenced as CAR-TR-553) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Brian Johnson. Ben Shneiderman. April 1991.
Tree-Maps: A Space Filling Approach to the Visualization of. This paper describes a novel method for the visualization of hierarchically structured information. The Tree-Map visualization technique makes 100% use of the available display space, mapping the full hierarchy onto a rectangular region in a space-filling manner. This efficient use of space allows very large hierarchies to be displayed in their entirety and facilitates the presentation of semantic information. (Also cross-referenced as CAR-TR-93-72) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
R. Keil-Slawik. Catherine Plaisant. Ben Shneiderman. April 1991.
Remote direct manipulation: A case study of a. This paper describes our experience with the design of a remote pathologist's workstation. We illustrate how our effort to apply direct manipulation principles led us to explore remote direct manipulation designs. The use of computer and communication systems to operate devices remotely introduces new challenges for users and designers. In addition to the usual concerns, the activation delays, reduced feedback, and increased potential for breakdowns mean that designers must be especially careful and creative. The user interface design is closely linked to the total system design. (Also cross-referenced as CAR-TR-551) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
Rodrigo A. Botafogo. Ben Shneiderman. April 1991.
Identifying Aggregates in Hypertext Structures. Hypertext systems are being used in many applications because of their flexible structure and the great browsing freedom they give to diverse communities of users. However, this same freedom and flexibility is the cause of one of their main problems: the "lost in hyperspace" problem. One reason for the complexity of hypertext databases is the large number of nodes and links that compose them. To simplify this structure we propose that nodes and links be clustered, forming more abstract structures. An abstraction is the concealment of all but relevant properties from an object or concept. One type of abstraction is called an aggregate. An aggregate is a set of distinct concepts that taken together form a more abstract concept. For example, two legs, a trunk, two arms and a head can be aggregated together in a single higher level object called a "body." In this paper we will study the hypertext structure, i.e., the way nodes are linked to each other, in order to find aggregates in hypertext databases. Two graph theoretical algorithms will be used: biconnected components and strongly connected components. (Also cross-referenced as CAR-TR-550) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
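One of the two graph algorithms named in this abstract, strongly connected components, can be sketched with Tarjan's algorithm. This is a generic textbook version, not the report's implementation, and the small link graph is a made-up example; how the report turns components into aggregates is not stated here.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to a list of link targets.

    Returns a list of sets, one per strongly connected component.
    Recursive, so intended for small illustrative graphs only.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            result.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return result

# Hypothetical hypertext: "legs" and "arms" link to each other.
links = {
    "intro": ["legs", "arms"],
    "legs": ["arms"],
    "arms": ["legs"],
    "index": ["intro"],
}
components = strongly_connected_components(links)
```

In this example the mutually linked nodes `legs` and `arms` fall into one component, while `intro` and `index` each form a component of their own.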
Ben Shneiderman. March 1991.
Tree Visualization with Tree-maps: A 2-d space-filling approach. This paper presents a novel approach to representing trees that have weights or sizes on the leaf nodes. The 2-d visualization is space filling and the recursive algorithm for generation runs rapidly. It depends on color coding (or shading) of regions and easily provides users with a quick overview that clearly indicates relative sizes of the leaf nodes. Figures 3 & 4 show examples of tree-maps with size coding, as implemented by Brian Johnson on an Apple Macintosh II computer with a high resolution color display. Figure 3 shows fifteen files in four directories at three levels, with nested boxes to show the levels. Figure 4 represents actual disk directories encompassing 850 files at four levels with color coding by file type (text, graphics, applications, etc). We continue to explore refinements of tree-maps such as alternate layouts, better methods for coping with large ranges of file size, color coding schemes, and operations applied to files. (Also cross-referenced as CAR-TR-548) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
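A recursive, space-filling layout of the kind this abstract describes can be sketched as follows. It assumes the common "slice and dice" scheme, in which the split direction alternates at each level and each child receives a share of the rectangle proportional to its total size; the abstract does not spell out the exact layout rule, and nesting borders and color coding are omitted. The tree encoding and function names are hypothetical.

```python
def size(node):
    """Total weight: a leaf's own size, or the sum over a directory's children."""
    name, payload = node
    if isinstance(payload, list):
        return sum(size(child) for child in payload)
    return payload

def treemap(node, x, y, w, h, vertical=True, out=None):
    """Return (name, x, y, w, h) rectangles for every leaf under node.

    A node is (name, size) for a leaf or (name, [children]) for a directory.
    """
    if out is None:
        out = []
    name, payload = node
    if not isinstance(payload, list):
        out.append((name, x, y, w, h))
        return out
    total = size(node)
    offset = 0.0
    for child in payload:
        frac = size(child) / total if total else 0.0
        if vertical:   # divide the width among the children
            treemap(child, x + offset * w, y, frac * w, h, False, out)
        else:          # divide the height among the children
            treemap(child, x, y + offset * h, w, frac * h, True, out)
        offset += frac
    return out

tree = ("root", [("a.txt", 50), ("docs", [("b.txt", 25), ("c.txt", 25)])])
rects = treemap(tree, 0, 0, 100, 100)
total_area = sum(w * h for _, _, _, w, h in rects)
```

Because every level partitions its parent rectangle exactly, the leaf rectangles tile the whole display area, which is the space-filling property the abstract emphasizes.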
G. W. Stewart. Lanczos and Linear Systems. March 1991.
Lanczos's major contributions to the numerical solution of linear equations are contained in two papers: ``An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators'' and ``Solutions of Linear Equations by Minimized Iterations,'' the second of which contains the method of conjugate gradients. In this note we retrace Lanczos's journey from Krylov sequences to conjugate gradients. (Also cross-referenced as UMIACS-TR-91-47) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. Adams. M. F. Griffin. G. W. Stewart. Direction-of-Arrival Estimation Using the Rank-Revealing. March 1991.
Appeared in Proceedings of ICASSP-91. An algorithm for updating the null space of a matrix is described. The algorithm is based on a new decomposition, called the URV decomposition, which can be updated in $O(N^2)$ and serves as an intermediary between the QR decomposition and the singular value decomposition. The URV decomposition is applied to a high-resolution direction of arrival problem based on the MUSIC algorithm. A virtue of the updating algorithm is the running estimate of rank. Additional files are available via anonymous ftp at: thales.cs.umd.edu in the directory pub/reports (Also cross-referenced as UMIACS-TR-91-46) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Richard Chimera. Ben Shneiderman. September 1993.
An Exploratory Evaluation of Three Interfaces for. Three different interfaces were used to browse a large (1296 items) table of contents. A fully expanded stable interface, expand/contract interface, and multi-pane interface were studied in a between-groups experiment with 41 novice participants. Nine timed fact retrieval tasks were performed; each task is analyzed and discussed separately. We found that both the expand/contract and multi-pane interfaces produced significantly faster times than the stable interface for many tasks using this large hierarchy; other advantages of the expand/contract and multi-pane interfaces over the stable interface are discussed. The animation characteristics of the expand/contract interface appear to play a major role. Refinements to the multi-pane and expand/contract interfaces are suggested. A predictive model for measuring navigation effort of each interface is presented. (Also cross-referenced as CAR-TR-539) Human Computer Interaction Laboratory, Institute for Systems Research, Dept. of Computer Science, Univ. of Maryland,
Christine R. Hofmeister. Joanne Atlee. James M. Purtilo. Writing Distributed Programs in Polylith. December 1990.
Polylith is a software interconnection system that allows programmers to configure applications from mixed-language software components (modules), and then execute those applications in diverse environments. In general, communication between components can be implemented with TCP/IP or XNS protocols in a network; via shared memory between light-weight threads on a tightly coupled multiprocessor; using custom-hardware channels between processors; or using simply a 'branch' instruction within the same process space. Flexibility in how components are interconnected is made possible by a 'software bus' organization. This document serves as a manual for programmers who wish to use one particular software bus: the TCP/IP-based network bus. (Also cross-referenced as UMIACS-TR-90-149) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Rodrigo Botafogo. Ehud Rivlin. Ben Shneiderman. December 1990.
Structural analysis of hypertexts: identifying hierarchies and useful. In hypertext databases users often suffer from the well known problem of getting "lost in hyperspace." One approach to solving this problem consists of improving authoring. This paper proposes several authoring tools, based on database structure analysis. In many hypertext systems authors are encouraged to create hierarchical structures, but when writing, the hierarchy is lost because of the inclusion of cross-reference links. The first part of this paper will look at ways of recovering lost hierarchies and finding new ones, offering authors different views of the same database. The second part helps authors by identifying properties of the database. Multiple metrics are developed, among them the compactness and stratum. The compactness indicates the intrinsic complexity of the database and the stratum reveals to what degree the database is organized so that some nodes should be read before others. Several examples from existing databases are used to illustrate the benefits of each tool. The collection of these tools provides a multifaceted view of the database and should allow authors to identify weaknesses in their database's structure and create better documents which users will be able to traverse more easily. (Also cross-referenced as CAR-TR-526) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland,
Elizabeth L. White. John R. Callahan. James M. Purtilo. The NewYacc User's Manual. November 1990.
This manual introduces NewYacc, a parser generator system built upon the original yacc system within Unix. NewYacc's principal extension is to provide users with a way to associate rewrite rules with individual productions in the language grammar. These rules are used to describe how the parse tree (which is saved in NewYacc but not in original yacc) should be traversed, plus users can easily control what action is performed at each node in the tree during their traversals. This provides users with great leverage in the construction of a variety of source to source translation tools. This manual assumes a general familiarity with original yacc. (Also cross-referenced as UMIACS-TR-90-141) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
Catherine Plaisant. November 1990.
Guide to Opportunities in Volunteer Archaeology. This case study shows how a hypertext system was used in a traveling exhibit of the Smithsonian Institution. The database about archaeology was constructed by a professor and students of the history department of the University of Maryland. Regular updates of the database were made for each new venue of the exhibit. Finally the database was translated into French and automatically rebuilt to be used in Canada. Helpful features of the hypertext system as well as the difficulties encountered are described. System users were observed in the museum and collected usage data was analyzed. (Also cross-referenced as CAR-TR-523) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
John R. Callahan. James M. Purtilo. A Packaging System for Heterogeneous Execution Environments. October 1990.
In many execution environments software components cannot interoperate easily because of differences in their interfaces and implementations. Additional software is often required to integrate such components and implement the interfacing decisions between them. For example, a procedure call across architectures may require extensive software to relocate data and coerce parameters. Even when powerful integration facilities are available, application programs need some additional software, often called 'stubs', so they can access the available communication media. Interface software can be more expensive to program than other software, since its creation requires knowledge of the machine architectures and communication mechanisms. Moreover, it must be rewritten whenever components are reused in different configurations. This paper describes a way to automatically generate custom interface software for heterogeneous configurations. Whereas previous research focused on 'stub generation' alone, our approach generates stubs as well as the configuration methods needed to integrate an application. Using this approach, developers may build support tools that hide the details of how software configurations are 'packaged' into executables. This approach is implemented within the Unix environment in a system called Polygen, which we have used for evaluation and demonstration. (Also cross-referenced as UMIACS-TR-90-127) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland,
G. W. Stewart. Perturbation Theory for the Singular Value Decomposition. September 1990.
The singular value decomposition has a number of applications in digital signal processing. However, the decomposition must be computed from a matrix consisting of both signal and noise. It is therefore important to be able to assess the effects of the noise on the singular values and singular vectors\,---\,a problem in classical perturbation theory. In this paper we survey the perturbation theory of the singular value decomposition. (Also cross-referenced as UMIACS-TR-90-124) University of Maryland Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, Appeared in SVD and Signal Processing, II, R. J. Vaccaro ed., Elsevier, Amsterdam, 1991.
Andrew Sears. March 1991.
Improving Touchscreen Keyboards:. This study explored touchscreen keyboards using high precision touchscreen strategies. Phase one evaluated three possible monitor positions: 30, 45, and 75 degrees from horizontal. Results indicate that the 75 degree angle, approximately the standard monitor position, resulted in more fatigue and lower preference ratings. Phase two collected touch bias and key size data for the 30 degree angle. Subjects consistently touched below targets, and touched to the left of targets on either side of the screen. Using these data, a touchscreen keyboard was designed. Phase three compared this keyboard with a mouse activated keyboard, and the standard QWERTY keyboard for typing relatively short strings of 6, 19, and 44 characters. Results indicate that users can type approximately 25 words per minute with the touchscreen keyboard, compared to 17 WPM using the mouse, and 58 WPM when using the keyboard. Possible improvements to touchscreen keyboards are suggested. (Also cross-referenced as CAR-TR-515) Human Computer Interaction Laboratory, Center for Automation Research, Dept. of Computer Science, Univ. of Maryland, Institute for Systems Research,
G. W. Stewart. G. Zhang. On a Direct Method for the Solution of Nearly Uncoupled Markov Chains. July 1990.
This note is concerned with the accuracy of the solution of nearly uncoupled Markov chains by a direct method based on the LU decomposition. It is shown that plain Gaussian elimination may fail in the presence of rounding errors. A modification of Gaussian elimination with diagonal pivoting as well as corrections of small pivots by sums of off-diagonal elements in the pivoting columns is proposed and analyzed. It is shown that the accuracy of the solution is affected by two condition numbers associated with the aggregate and the coupling respectively. (Also cross-referenced as UMIACS-TR-90-95) University of Marylan