CMSC 818S

Tues/Thurs. 3:30 p.m. - 4:45 p.m.

Tools and Techniques for Very Large

Scale Data Intensive Applications


 

Dr. Joel Saltz

  • vmail: 301-405-2669

A.V. Williams Building

Room 4155

University of Maryland

Dr. Alan Sussman

  • vmail: 301-405-3360

A.V. Williams Building

Room 4145

University of Maryland

 

  Who to contact for a hard copy of papers:

We are located in A.V. Williams Building, room 4157  

Class Location

 

J.M. Patterson Building, 

Room 1109 

Bldg. Number: 083  

Location: Northeast Quad D-2 

 

Please Note:

There will be a room change as of February 3rd. We will meet in the UMIACS Conference Room,

located in the A.V. Williams Building, room 2120.

 

Tools and Techniques for Very Large Scale Data Intensive Applications

The course is a survey of database systems that target sensor, scientific and statistical applications as well as systems used for data cube and data mining analyses. We will evaluate database architectures for their ability to efficiently support data access and computational requirements posed by high end applications. The course will cover algorithmic, systems, API and user interface issues. We will also carry out a targeted survey of data intensive applications and their requirements for database support to provide the applications context for evaluating high end database technology.

 

Dates

Presentations

Wk 1:  

1/24  

1/26


Administrative, assign papers 

T2 slides, Titan slides, Virtual Microscope slides

 

Chialin Chang

January 26

Chang, C., Acharya, A., Sussman, A., and Saltz, J. T2: A Customizable Parallel Database for Multi-dimensional Data. Technical Report CS-TR-3867 and UMIACS-TR-98-04, University of Maryland, Department of Computer Science and UMIACS, January 1998. To appear in ACM SIGMOD Record, March 1998. ftp://hpsl.cs.umd.edu/pub/papers/ADR-tr.ps.Z.

Chialin Chang

January 26

Chang, C., Moon, B., Acharya, A., Carter S., Sussman, A., and Saltz, J. Titan: A High Performance Remote-Sensing Database. Proceedings of the 1997 International Conference on Data Engineering, 375--384, April 1997. ftp://hpsl.cs.umd.edu/pub/papers/icde97-final.ps.Z

Renato Ferreira

January 26

Ferreira, R., Moon, B., Humphries, J., Sussman, A., Saltz, J., Miller, R., and Demarzo, A. The Virtual Microscope. In Proceedings of the 1997 AMIA Annual Fall Symposium, 449-453. American Medical Informatics Association, October 1997. ftp://hpsl.cs.umd.edu/pub/papers/amia97.ps.Z.

   

Wk 2 & 3:  

2/3 

2/5 

2/10 

2/12

Earth Science and Medical Databases

Client--Server Paradise and Geo-Spatial DBMS slides

SEQUOIA slides and Sequoia Benchmark

BigSur slides and RasDaMan slides

Postgres slides

 

Mike Beynon

February 3

  1. DeWitt, D., Kabra, N., Luo, J., Patel, J., and Yu, J. Client--Server Paradise. In Proceedings of the 20th VLDB Conference, 558--569. Morgan Kaufmann Publishers, Inc., 1994. http://www.cs.wisc.edu/paradise/paradise.papers.html
  2. Patel, J., Yu, J., Kabra, N., Tufte, K., Nag, B., Burger, J., Hall, N., Ramasamy, K., Lueder, R., Ellmann, C., Kupsch, J., Johan Larson, S. G., DeWitt, D., and Naughton, J. Building a Scalable Geo-Spatial DBMS: Technology, Implementation, and Evaluation. In Proceedings of SIGMOD'97, 336-347. ACM Press, 1997.

http://www.cs.wisc.edu/paradise/paradise.papers.html

Hubert Tsang

February 5

  1. Stonebraker, M. The SEQUOIA 2000 Project. IEEE Data Engineering Bulletin 16(1), 24-28. March 1993.

ftp://ftp.research.microsoft.com/research/pub/debull/mar93-letfinal.ps.

  2. Stonebraker, M. Sequoia 2000: A Reflection on the First Three Years. EECS Dept., University of California, Berkeley Technical report: S2K-94-58, 1994.

ftp://s2k-ftp.CS.Berkeley.EDU/pub/postgres/papers/S2K-94-58.pdf 

ftp://s2k-ftp.CS.Berkeley.EDU/pub/postgres/papers/S2K-94-58.ps.Z

  3. Stonebraker, M., Frew, J., Gardels, K., and Meredith, J. The Sequoia 2000 Storage Benchmark. Proceedings of the ACM SIGMOD International Conference on Management of Data. May 1993.

http://s2k-ftp.CS.Berkeley.EDU:8000/sequoia/tech-reports/s2k-92-12/s2k-92-12.ps.Z

Norina Dixon

February 10

  1. Brown, P. and Stonebraker, M. BigSur: A System for the Management of Earth Science Data. In Proceedings of the 21st VLDB Conference, 720-728. Morgan Kaufmann Publishers, Inc., 1995.  

ftp://s2k-ftp.CS.Berkeley.EDU/pub/postgres/papers/S2K-95-65.pdf 

ftp://s2k-ftp.CS.Berkeley.EDU/pub/postgres/papers/S2K-95-65.ps.Z

  2. Baumann, P., Furtado, P., and Ritsch, R. Geo/Environmental and Medical Data Management in the RasDaMan System. In Proceedings of the 23rd VLDB Conference, 548-552. Morgan Kaufmann Publishers, Inc., 1997.

http://SunSite.Informatik.RWTH-Aachen.DE/dblp/db/conf/vldb/vldb97.html

John Davis

February 12

  1. Stonebraker, M., Rowe, L. A., and Hirohama, M. The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering, pages 125-142. Volume 2, Number 1, March 1990.

http://s2k-ftp.cs.berkeley.edu:8000/postgres/papers/ERL-M90-34.pdf

http://s2k-ftp.cs.berkeley.edu:8000/postgres/papers/ERL-M90-34.ps.Z

  2. Stonebraker, M. and Kemnitz, G. The POSTGRES Next Generation DBMS. Communications of the ACM, pages 78-92, Volume 34, Number 10, October 1991.

    http://s2k-ftp.CS.Berkeley.EDU:8000/postgres/papers/ERL-M91-62.pdf

    http://s2k-ftp.CS.Berkeley.EDU:8000/postgres/papers/ERL-M91-62.ps.Z

 

Wk 4: 

2/17 

2/19

Parallel Database Systems and Query Optimization

Parallel Database Systems slides

Query Evaluation slides 1

Query Evaluation slides 2

 

Jerome Brown

February 17

DeWitt, D. and Gray, J. Parallel Database Systems: The Future of High Performance Database Systems. Communications of the ACM, 35(6), 85--98. June 1992. 

Nonetta Pierre

February 19

Graefe, G. Query Evaluation Techniques for Large Databases. ACM Computing Surveys 25(2), 73-170. June 1993.

file://ftp.cs.pdx.edu/pub/faculty/graefe/papers/qeval.survey.ps

 

Wk 5: 

2/24

  

2/26

Tertiary Storage

Tertiary Storage Slides

Tertiary Memory Slides

ADSM slides

Query slides

 

Yuan-Shin Hwang

February 24

Prabhakar, S., Agrawal, D., Abbadi, A.E., and Singh, A. Tertiary Storage: Current Status and Future Trends. Computer Science Department, University of California, Santa Barbara TRCS96-21, August 1996. http://www.cs.ucsb.edu/TRs/techreports/TRCS96-21.ps

Renato Ferreira

February 24

Sarawagi, S. and Stonebraker, M. Reordering Query Execution in Tertiary Memory Databases. In Proceedings of the 22nd VLDB Conference, 156--167, Morgan Kaufmann Publishers, Inc. 1996. http://SunSite.Informatik.RWTH-Aachen.DE/dblp/db/conf/vldb/SarawagiS96.html

Renato Ferreira

February 26

Cabrera, L.-F., Rees, R., and Hineman, W. Applying Database Technology in the ADSM Mass Storage System. In Proceedings of the 21st VLDB Conference, 597-605. Morgan Kaufmann Publishers, Inc., 1995. 

Renato Ferreira

February 26

Yu, J. and DeWitt, D. J. Query Pre-Execution and Batching in Paradise: A Two-Pronged Approach to the Efficient Processing of Queries on Tape-Resident Data Sets. In 9th International Conference on Scientific and Statistical Database Management (SSDBM '97). IEEE Computer Society Press, 1997. http://www.cs.wisc.edu/paradise/paradise.papers.html.

 

Wk 6:  

3/3  

3/5

Client-Server

Client-Server Query slides

Semantic Data Caching slides

 

Rob Bennett

March 3

March 5



  1. Franklin, M., Jónsson, B. T., and Kossmann, D. Performance Tradeoffs for Client-Server Query Processing. In Proceedings of SIGMOD'96, 149--160. ACM Press, 1996. http://www.cs.umd.edu/projects/dimsum/papers/sigmod96.ps.gz
  2. Dar, S., Franklin, M. J., Jonsson, B. T., Srivastava, D., and Tan, M. Semantic Data Caching and Replacement. In Proceedings of the 22nd VLDB Conference, 330-341. Morgan Kaufmann Publishers, Inc., 1996. http://SunSite.Informatik.RWTH-Aachen.DE/dblp/db/conf/vldb/DarFJST96.html

 

Wk 7:  

3/10

  

3/12



Object-Relational Database Systems

Of Objects and Database slides and Enhanced Abstract Data Types slides

On-line Analytical Processing (OLAP)

Group-By, Cross-Tab, and Sub-Totals slides

 

Asmara Afework

March 10

  1. Carey, M. and DeWitt, D. Of Objects and Databases: A Decade of Turmoil. In Proceedings of the 22nd VLDB Conference, 3-14, Morgan Kaufmann Publishers, Inc., 1996. http://www.informatik.uni-trier.de/~ley/db/conf/vldb/CareyD96.html
  2. Seshadri, P., Livny, M., and Ramakrishnan, R. The Case for Enhanced Abstract Data Types. In Proceedings of the 23rd VLDB Conference, 66--75. Morgan Kaufmann Publishers, Inc., 1997. http://SunSite.Informatik.RWTH-Aachen.DE/dblp/db/conf/vldb/vldb97.html
Charlie Chang

March 12

  1. Gray, J., Bosworth, A., Layman, A., and Pirahesh, H. Generalizing Group-By, Cross-Tab, and Sub-Totals. In Proceedings of the 1996 International Conference on Data Engineering, 152--159. IEEE Computer Society Press, 1996. 
  2. Colliat, G. OLAP, Relational, and Multidimensional Database Systems. SIGMOD Record 25(3), 64-69. September 1996.

 

Wk 8:  

3/17  

3/19

On-line Analytical Processing (OLAP)

OLAP Data and Multidimensional Aggregates slides

Faculty Candidate Talk

 

Henrique Andrade

March 17

  1. Sarawagi, S. Indexing OLAP Data. IEEE Data Engineering Bulletin 20(1), 36--43. March 1997. ftp://ftp.research.microsoft.com/pub/debull/mar97-letfinal.ps
  2. Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J., Ramakrishnan, R., and Sarawagi, S. On the Computation of Multidimensional Aggregates. In Proceedings of the 22nd VLDB Conference, 506--521. Morgan Kaufmann Publishers, Inc., 1996. http://SunSite.Informatik.RWTH-Aachen.DE/dblp/db/conf/vldb/AgarwalADGNRS96.html

Faculty Candidate Talk

March 19

Department Lecture Series

SPRING 1998

Speaker: Amin Vahdat

Affiliation: UC - Berkeley

Location: AVW 3258

Time: 4:00 p.m. Thursday, Mar 19 (Refreshments at 3:30 in AVW 1152)

Title: Operating System Services For Wide-Area Applications

Abstract:

This talk examines system support issues for wide-area applications given the opportunity posed by remotely programmable resources. The development of a number of compelling wide-area applications such as Internet commerce, remote agents, online gaming, and news transmission has helped us identify a common set of application requirements, including: (i) naming of remote, potentially migrating objects, (ii) coherent access to global data, (iii) safe execution of remote programs, and (iv) secure, authenticated access to global resources. Unfortunately, today such system support is implemented in an ad-hoc and application-specific manner.

This talk describes some of the difficulties of developing wide-area applications and describes the design and implementation of WebOS, a unified set of system services designed to simplify application development and to more efficiently utilize wide-area resources. One demonstration of WebOS functionality is Rent-A-Server, a system that allows any Web server to dynamically replicate itself across the wide area in response to client access patterns.

 

Wk 9:  

3/24

3/26

spring break!

 

Wk 10:  

3/31

4/2





Parallel Mining slides
Shamik Sharma

March 31

  1. Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases, In Proceedings of the 20th VLDB Conference, 487--499, Morgan Kaufmann Publishers, Inc., September 1994. http://www.almaden.ibm.com/cs/quest/publications.html
  2. Brin, S., Motwani, R., Ullman, J. D., and Tsur, S. Dynamic Itemset Counting and Implication Rules for Market Basket Data, In Proceedings of SIGMOD'97, 255-264, ACM Press, May 1997. http://www-db.stanford.edu/midas/publication.html
Mustafa Uysal

April 2

  1. Agrawal, R. and Shafer, J.C. Parallel Mining of Association Rules: Design, Implementation and Experience, IEEE Transactions on Knowledge and Data Engineering, 962-969, Volume 8, Number 6, 1996. http://www.almaden.ibm.com/cs/quest/publications.html
  2. Zaki, M. J., Parthasarathy, S., and Li, W. A Localized Algorithm for Parallel Association Mining, In Proceedings of SPAA'97, 321-330, ACM Press, June 1997.

 

Wk 11: 

4/7  

4/9



OLAP

Queries slides

DB2 slides

 

Asmara Afework

April 7

  1. Han, J., Stefanovic, N., and Koperski, K. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. In Proceedings of the 1998 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'98), Melbourne, Australia, April 1998. http://db.cs.sfu.ca/sections/publication/smmdb/smmdb.html
  2. Deshpande, P.M., Naughton, J.F., Ramasamy, K., Shukla, A., Tufte, K., and Zhao, Y. Cubing Algorithms, Storage Estimation, and Storage and Processing Alternatives for OLAP. IEEE Data Engineering Bulletin, 20(1), 3-11, March 1997. http://www.research.microsoft.com/research/db/debull/
Norina Dixon

April 9

  1. Hellerstein, J.M. Optimization Techniques for Queries with Expensive Methods. ACM Transactions on Database Systems, 23(1), March 1998. http://epoch.cs.berkeley.edu:8000/personal/jmh/
  2. Hellerstein, J.M. and Stonebraker, M. Predicate Migration: Optimizing Queries with Expensive Predicates. In Proceedings of SIGMOD'93, 267--276, ACM Press, May 1993. (Superseded by HELLERSTEIN98.)
  3. Jhingran, A., Malkemus, T., and Padmanabhan, S. Query Optimization in DB2 Parallel Edition. IEEE Data Engineering Bulletin, 20(2), 27-34, June 1997. http://www.research.microsoft.com/research/db/debull/

 

Wk 12:

4/14  

4/16  

Indexing

B-tree and R-tree slides

R-tree Access slides

R-tree Processing slides

 

Henrique Andrade

April 14

  1. Comer, D. The Ubiquitous B--Tree. ACM Computing Surveys, 11(2), 121-137, June 1979.
  2. Guttman, A. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of SIGMOD'84, ACM Press, 47-57, May 1984.
Jerome Brown

April 16

  1. Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. In Proceedings of SIGMOD'90, ACM Press, 322--331, May 1990.
  2. Brinkhoff, T., Kriegel, H.-P., and Seeger, B. Efficient Processing of Spatial Joins Using R-trees. In Proceedings of SIGMOD'93, ACM Press, 237--246, May 1993.

 

Wk 13:  

4/21  

4/23



Faculty Candidate Speaker (room 2460)

Systems

Persistent Applications slides

Prototyping Bubba slides

 

Anthony Tomasic

April 21

Department of Computer Science Colloquium

Speaker: Anthony Tomasic (INRIA Rocquencourt & Dyade)

[Anthony Tomasic is a faculty candidate in the CS department]

Date: Tuesday, April 21

Time: 3:30pm, reception following at 4:30pm

Room: AV Williams 2460

Title: Parachute Queries

Abstract:

Mediator systems (aka heterogeneous databases) are used today in a wide variety of unreliable environments. When processing a query, a mediator may try to access a data source which is unavailable. In this situation, existing systems suffer from an Achilles' heel -- they typically either silently ignore unavailable data sources or generate an error. In either case, to obtain the complete answer, the query is reprocessed from scratch. This behavior is inefficient in environments with a non-negligible probability that a data source is unavailable (e.g., the Internet). In the case that some data sources are unavailable, the complete answer to a query cannot be obtained; however useful work can be done with the available data sources. In this talk, after some suitable marketing, we describe a novel approach to mediator query processing where, in the presence of unavailable data sources, the answer to a query is a `partial answer.' The partial answer represents the state of the mediator at the end of query processing, i.e., materialized data. This state is used to construct an `incremental query.' The answer to the incremental query is the same as the complete answer, but it is more efficient to evaluate than the original query. In addition, information can be extracted from the mediator state through the use of secondary queries, called `parachute queries.' We define two new architectures for partial answers, incremental and parachute queries and analyze several properties of these architectures. Our analysis shows that parachute queries can be viably added to existing mediator systems.

Joint work with Philippe Bonnet (Bull Inc. & Dyade)

Nonetta Pierre

April 23

  1. Carey, M.J., DeWitt, D.J., Franklin, M.J., Hall, N.E., McAuliffe, M.L., Naughton, J.F., Schuh, D.T., Solomon, M.H., Tan, C.K., Tsatalos, O.G., White, S.J., and Zwilling, M.J. Shoring Up Persistent Applications. In Proceedings of SIGMOD'94, ACM Press, 383--394, May 1994. http://www.cs.wisc.edu/Dienst/Repository/2.0/Body/ncstrl.uwmadison/CS-TR-94-1222/postcript
  2. Boral, H., Alexander, W., Clay, L., Copeland, G., Danforth, S., Franklin, M., Hart, B., Smith, M., and Valduriez, P. Prototyping Bubba: A Highly Parallel Database System. IEEE Transactions on Knowledge and Data Engineering, 2(1), 4-24, March 1990.

 

Wk 14:  

4/28  

4/30



Multimedia Systems

 

Leana Golubchik

April 28

Gemmell, D. J., Vin, H. M., Kandlur, D. D., Rangan, P. V., and Rowe, L. A. Multimedia Storage Servers: A Tutorial. IEEE Computer, 28(5), 40--49, May 1995. http://www.research.microsoft.com/research/BARC/JGemmell/computer95.ps

Mike Franklin

April 30

  1. Franklin, M. and Zdonik, S. Data in Your Face: Push Technology In Perspective (Invited Paper). In Proceedings of SIGMOD'98, ACM Press, June 1998. http://www.cs.umd.edu/projects/bdisk/inyourface.ps
  2. Amsaleg, L., Bonnet, P., Franklin, M., Tomasic, A., and Urhan, T. Improving Responsiveness for Wide-Area Data Access. IEEE Data Engineering Bulletin, 20(3), September 1997. http://www.cs.umd.edu/users/franklin/debull/amsaleg.ps

 

Wk 15:  

5/5  

5/7



Multimedia Systems

Video Databases and Query by Image slides

Systems

Log slides

 

John Davis

May 5

  1. Wolf, W. and Yu, H. Subject-based Retrieval for Film-and TV Program-oriented Image and Video Databases
  2. Kelly, P. M., Cannon, T. M., and Hush, D. R. Query by Image Example: the CANDID Approach. Proceedings of the SPIE, Storage and Retrieval for Image and Video Databases III, Vol. 2420, 238--248, 1995.

 

Mustafa Uysal

May 7

  1. Rosenblum, M. and Ousterhout, J.K. The Design and Implementation of a Log-Structured File System. ACM Transactions on Computer Systems, 10(1), 27--52, February 1992.

  2. Seltzer, M., Smith, K., Balakrishnan, H., Chang, J., McMains, S., and Padmanabhan, V. File System Logging versus Clustering: A Performance Comparison. In Proceedings of the 1995 Usenix Technical Conference, 1995. http://www.eecs.harvard.edu/~margo/papers

 

Wk 16: 

5/12  

Last Day