SQL Assignment, CMSC424, Fall 2009 ================================== Instructor: Amol Deshpande *** The assignment is to be done by yourself. *** *** To start with, log in to a grace machine, run "tap oraclient". The SID is "dbclass2". This will initialize the environmental variables. Then run "sqlplus", and log in using the user/password provided in the class. You can now start using it. *** Run ``@ populate-sn.sql'' to create and populate the tables. *** Schema: The dataset contains a synthetic social network dataset (inspired by Facebook etc.). Specifically it contains the following tables: -- Users: userid, name, birthdate, joined -- Groups: groupid, name -- Friends: userid1, userid2 This is a symmetric table. For every entry $(X, Y, d)$, there is a symmetric entry $(Y, X, d)$. -- Status: userid1, status_time, text -- Members: userid, groupid *** The dataset was generated synthetically: the user names, birthdates etc., were randomly generated, and the group names were chosen from a list of universities around the world. The status text is always concatenation of the userid and the status update time, and should not be required in any queries. Only status updates from a period of about 10 days are included in the dataset, with many users having no status updates during that period. *** In many cases (especially for complex queries or queries involving ``max'' or ``min''), you will find it easier to create temporary tables using the ``with'' construct. This also allows you to break down the full query and makes it easier to debug. *** You don't have to use the "hints" if you don't want to; there might be simpler ways to solve the questions. *** You should submit your answers in a single .txt file, called "_sql_assignment.txt". This file should contain all the answers, with each answer enclosed within "=== NUM" and "-----------------" lines. The simplest option would be to copy the provided sql_assignment.txt file, and add the SQL queries/statements in that file. An example is shown below (Question 0). *** You are provided a perl file "eval.pl" for making it easier to test your answers, and also for making sure that your submission conforms to the requirements. 1. Download the perl file and save it in your working directory. 2. Edit the perl file to add your ORACLE username and password. 3. Directly enter your answer in the sql_assignment.txt file as described above. 4. You can run your SQL solution for problem number 4 by executing: perl eval.pl sql_assignment.txt 4 The file already contains the answer for problem 0. Test it by: perl eval.pl sql_assignment.txt 0 NOTE: There are some SQL statements that cannot be executed in this fashion. Answers to questions 1, 2, 15, 21, 22 fall in this category. NOTE 2: Don't forget to do "tap oraclient" before running the perl script, otherwise the environment variables won't be set. 5. To run all your solutions one after the other, execute: perl eval.pl sql_assignment.txt If the query result contains > 5 rows, only first 5 rows are listed. 6. You can use this eval.pl file remotely if you'd like, but you may have to install DBD::Oracle module to your perl installation. *** SUBMISSION INSTRUCTION: You are required to submit your answers in a single file called "_sql_assignment.txt", and another file that contains the output of executing: perl eval.pl _sql_assignment.txt The submission is to be done using the submit server. https://submit.cs.umd.edu In case of problems with the submission site, email the submission to me (amol@cs.umd.edu) and the TA (wa.moustafa@gmail.com). ============================================================================================== Part 1 - Due Sept 29, 11:59 PM (Total points 60: 4 each) ============================================================================================== 0. List all user names and their birthdates; order by the user name. === 0 select name, birthdate from users order by name; ------------------------------------------------------------------------ 1. Write a query to change your password. Ofcourse when submitting the answer, don't write the new password (write "****" instead). === 1 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 2. Figure out how to find out all the attributes and their types for a given table in SQL*Plus, and write down the statement and its result on the Olympics table. This kind of a statement is usually unique to the client, and is not part of standard SQL. === 2 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 3. Report the user information (userid, name, birthdate, joined) for all users with first name 'Karen'. (Hint: use ``like''). === 3 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 4. Write a single query to report all status updates for the user 'Paul Hall'. === 4 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 5. Cound the number of the friends of the user 'Paul Hall'. === 5 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 6. Create a new table called "numfriends(name, num)" with the list of user names and for each user name, the number of his or her friends. Hint: Use "group by" to count the number of friends. === 6 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 7. Write a query to find the name(s) of the user(s) with the largest number of friends by first creating a temporary table using the "WITH" construct. === 7 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 8. Count the total number of users whose names start with a vowel ('A', 'E', 'I', 'O', 'U'). (Hint: Use "in" and "substr"). === 8 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 9. List the names of the users who posted no status updates. Hint: Use "not in". === 9 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 10. Create a list of users and their friends, such that the friend has an upcoming birthday within next 15 days. Assume today is Sept 15, 2009 (so look for birthdays between Sept 16 and Sept 30). You can hardcode that if you'd like. Hint: Use "extract" function that operates on the dates. === 10 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 11. For each user who has posted at least two status updates, count the average amount of time between his or her status updates in seconds. Order the results in the increasing order by the userid. Hint: Date substraction returns the amount in (fractional) number of days. Hint 2: The number of seconds in a day is 86400. === 11 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 12. Create a view (call it Seventies) that contains just the names of the users who were born in the 70's (1970 to 1979). === 12 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 13. Generate a list - (birthyear, num-users) - containing the years in which the users in the database were born, and the number of users born in each year. === 13 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 14. Find the name of the group with the maximum number of members. === 14 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 15. Write a query to add a new column called ``age'' to the Users table (it should be defined integer). Initially the ``age'' column would be listed as empty. Write a query to ``update'' the table to set it appropriately. Assume each year has exactly 365.24 days. This will make it easier to answer the question, but may count the age for some user incorrectly. Hint: The current date can be obtained by using the keyword current_date. Hint 2: When you subtract two dates, the result is the number of days (possibly fractional) between the two dates. === 15 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- ============================================================================================================ Part 2 - Due Oct 6, 11:59 PM (Total points 60: 8 each for problems 16 to 20; 10 each for problems 21 and 22) ============================================================================================================ 16. Count the size of two-hop neighbrhood for the user 'Paul Hall'. This includes all of his friends, and the friends of his friends. Be careful and make sure that your query does not count the same user twice. === 16 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 17. For each user, report the time in days since there last status update. If the user has no recorded status updates, then the corresponding answer should be -1. Note that, such users do not appear in the status table. Order the result by the userid. === 17 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 18. Find the names of the 5 users with the largest number of friends. === 18 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 19. Group memberships and friendships tend to be correlated. For a specific group "g", let "n" denote the number of its members. There are a possible "n(n-1)/2" friendship links between these "n" members (counting each friendship link only once). Let "m" denote the actual number of links that exist among the members. We will call m/(n(n-1)/2) the "cohesiveness ratio". Write a query (or a set of queries) to find the cohesiveness ratio for all the groups. Order the result by groupid. === 19 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 20. Find the two users that were born closest to each other (there are no two users with the same birthdate). === 20 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 21. [Trigger] Create a new table: NumberOfStatusUpdates(userid, num_updates) using the Status table. Write a "trigger" to keep this new table updated when a new entry is inserted into or a row is deleted from the Status table. Remember the userid corresponding to the new status update may not exist in the NumberOfStatusUpdates. Similarly, if deletion of a status update from the Status table results in a userid not having any correspondings status update, then the tuple in NumberOfStatusUpdates should be deleted. === 21 [[ Your SQL Solution Here ]] -------------------------------------------------------------------------------------- 22. [User Defined Functions] In Question 11, you had to write a query to find the average time between status updates of a user (more generally a similar query should work to find the average time between status updates across all users). That can be done using a fairly simple 3-4 line query. However, computing "variance" is very tricky. To define more formally: let there be four status updates: (-, t1), (-, t2), (-, t3), (-, t4). Assuming t1 < t2 < t3 < t4, the times between consecutive updates are: x1 = t2 - t1, x2 = t3 - t2, x3 = t4 - t3. The average time between updates is: mu = (x1 + x2 + x3)/3 The variance is: sqrt( ((x1 - mu)^2 + (x2 - mu)^2 + (x3 - mu)^2) / n ) Write a query to find the variance using PL/SQL. More specifically, see: http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_packages.htm#ADFNS009 Write an anonymous block that can be directly executed in SQL*PLUS. Don't forget to use "SET SERVEROUTPUT ON;" when you attempt to execute this using SQL*PLUS. === 22 [[ Your SQL Solution Here ]] --------------------------------------------------------------------------------------