CMSC 828T Fall 2008 Hanan Samet Tu 3:30-6PM (and also possibly on some Thursdays at the 3:30-6PM time slot in order to accommodate classes that are missed by the Professor at the agreement of the class) Title: Applications of GPU and Cloud Computing to Nontraditional Databases, Computer Graphics, Data Mining, and GIS Abstract: The computing landscape is changing rapidly. The cost of special purpose processors such as GPU and Cell processors that are capable of performing mathematical operations at a higher throughput than general purpose processors has dropped dramatically due to the dynamics of the scale. This is sparking a new revival of parallel and distributed computational techniques which is evident in the rapid increase in the number of ``cores'' in computers, the number of Synergistic Processing Elements (SPEs) in cell processors, and the number of parallel pipelines in GPUs (some of which have up to 512 processors). There is a push towards ``cloud computing'' which enables massive distributed systems to be built by stringing together several cheap commodity machines as exemplified by Google's Map-Reduce and Microsoft's Dryad cloud computing frameworks. This push has implications for researchers who work at the crossroads of theory and practice. In this class, we will investigate the development of algorithms that can take advantage of this changing landscape where computation is cheap and plentiful, and storage is practically limitless. The focus of the class is threefold: a) To understand, analyze, and dissect modern computing paradigms. This involves studying emerging computing trends, programming paradigms, and gaining exposure to some new emerging technologies. b) To understand how to retool existing data structures and algorithms to take advantage of these new computing paradigms. c) To examine a number of applications in the area of spatial and text databases, computer graphics, data mining, and GIS and see how they can be addressed in the context of these new computing paradigms. Students will also undertake a project on a topic of their choice that involves one of the above application areas. Students will have an opportunity to use the NVIDIA S870 Deskside supercomputer containing 512 parallel GPU pipelines that Prof. Samet's group has recently acquired and will make available to the class. The class will consist of lectures to provide a background on the data structure and algorithm issues involved in the various applications as well as student presentations on some of the topics and on their projects. Some of the issues that we will address in the class are as follows. Note that this tabulation is preliminary and an attempt will be made to tailor the topics and emphasis to match the interests and background of the students enrolled in the class. A. New Paradigm in Computing a. Emerging trend in data and data processing b. GPU and the CUDA programming paradigm c. Map-Reduce d. Dryad e. Cell Processors (Sony PS3) B. Basic algorithms in parallel and distributed programming C. Parallel and Distributed Data Structures D. New Approaches to Databases a. Massive Object Sorting b. Simple database operations c. Similarity Search d. Data Mining and Clustering e. Join Operations f. Nearest Neighbor Search g. and more ... E. New Approaches to Computer Graphics a. Geometric operations on GPU b. Distributed Graphics c. Operations on massive point-clouds d. and more.. F. New approaches to GIS a. Spatio-textual Query Processing b. Query Processing on Spatial Networks G. New approaches to Data Mining H. Enhancements to existing programming frameworks Workload: A class presentation about a selection of papers about one of the above topics chosen in consultation with the instructor as well as a written description of a course project coupled with a possible short oral presentation describing the work. The course does not satisfy the 7 Ph.D. or MS comps qualifying course requirements but does satisfy the additional "two 3-credit 600-800 level courses" requirements for a Ph.D. Prerequisites: CMSC 420 or equivalent. Recommended Text: H. Samet, Foundations of Multidimensional and Metric Data Structures, Morgan-Kaufmann, an imprint of Elsevier, San Francisco, 2006.