============================================================================================================ Part 2 - Due Oct 6, 11:59 PM (Total points 60: 8 each for problems 16 to 20; 10 each for problems 21 and 22) ============================================================================================================ 16. Count the size of two-hop neighbrhood for the user 'Paul Hall'. This includes all of his friends, and the friends of his friends. Be careful and make sure that your query does not count the same user twice. === 16 # This would be simplified if the friends table contained entries # like ('user0', 'user0'). with temp as ( ( select users.userid, f2.userid2 from users, friends f1, friends f2 where users.name like 'Paul Hall%' and users.userid = f1.userid1 and f1.userid2 = f2.userid1 ) union ( select users.userid, f1.userid2 from users, friends f1 where users.name like 'Paul Hall%' and users.userid = f1.userid1 ) ) select count(*) - 1 from temp; # The '-1' is required because otherwise we would count Paul Hall as # his own connection within two hops. -------------------------------------------------------------------------------------- 17. For each user, report the time in days since there last status update. If the user has no recorded status updates, then the corresponding answer should be -1. Note that, such users do not appear in the status table. Order the result by the userid. === 17 ( select userid, trunc(current_date - max(status_time)) from status group by userid ) union ( select userid, -1 from users where userid not in (select userid from status) ) order by userid; -------------------------------------------------------------------------------------- 18. Find the names of the 5 users with the largest number of friends. === 18 # A user is in the answer if there are less than 5 users with fewer friends than him. with temp as ( select u.userid, name, count(userid2) as num_friends from users u, friends f where u.userid = f.userid1 group by userid, name ) select t1.name, t1.num_friends from temp t1 where 5 > (select count(*) from temp t2 where t2.num_friends > t1.num_friends ); -------------------------------------------------------------------------------------- 19. Group memberships and friendships tend to be correlated. For a specific group "g", let "n" denote the number of its members. There are a possible "n(n-1)/2" friendship links between these "n" members (counting each friendship link only once). Let "m" denote the actual number of links that exist among the members. We will call m/(n(n-1)/2) the "cohesiveness ratio". Write a query (or a set of queries) to find the cohesiveness ratio for all the groups. Order the result by groupid. === 19 # m1.userid < m2.usreid makes sure we don't double count the friendship links with temp1 as ( select m1.groupid, count(*) as m from members m1, members m2, friends f where m1.groupid = m2.groupid and f.userid1 = m1.userid and f.userid2 = m2.userid and m1.userid < m2.userid group by m1.groupid ), temp2 as ( select m.groupid, count(*) as n from members m group by m.groupid ) select temp1.groupid, 2*m/(n*(n-1)) from temp1, temp2 where temp1.groupid = temp2.groupid order by temp1.groupid; -------------------------------------------------------------------------------------- 20. Find the two users that were born closest to each other (there are no two users with the same birthdate). === 20 # This is not a scalable query if the number of users is large since the # temp table may become very large. Using PL/SQL (see Question 22) is # probably a better idea. with temp as ( select u1.name name1, u2.name name2, u1.birthdate - u2.birthdate as diff from users u1, users u2 where u1.userid > u2.userid ) select name1, name2 from temp where abs(diff) = (select min(abs(diff)) from temp); -------------------------------------------------------------------------------------- 21. [Trigger] Create a new table: NumberOfStatusUpdates(userid, num_updates) using the Status table. Write a "trigger" to keep this new table updated when a new entry is inserted into or a row is deleted from the Status table. Remember the userid corresponding to the new status update may not exist in the NumberOfStatusUpdates. Similarly, if deletion of a status update from the Status table results in a userid not having any correspondings status update, then the tuple in NumberOfStatusUpdates should be deleted. === 21 drop table NumberOfStatusUpdates; create table NumberOfStatusUpdates as select userid, count(*) as num_updates from status group by userid; create or replace trigger UpdateNumberUpdates after insert or delete on Status referencing new as newRow old as oldRow for each row declare current_count number; begin select max(num_updates) into current_count from NumberOfStatusUpdates where userid = :newRow.userid; if INSERTING then if current_count = 0 then insert into NumberOfStatusUpdates values(:newRow.userid, 1); else update NumberOfStatusUpdates set num_updates = num_updates + 1 where userid = :newRow.userid; end if; elsif DELETING then if current_count = 1 then delete from NumberOfStatusUpdates where userid = :oldRow.userid; else update NumberOfStatusUpdates set num_updates = num_updates - 1 where userid = :oldRow.userid; end if; end if; end UpdateNumberUpdates; / ### Check if any errors. show errors trigger UpdateNumberUpdates; ### Try out some posibilities to make sure the trigger worked. select * from NumberOfStatusUpdates where userid = 'user37'; insert into status values('user37', current_date, 'dummy status update'); select * from NumberOfStatusUpdates where userid = 'user37'; select * from NumberOfStatusUpdates where userid = 'user33'; delete from status where userid = 'user33'; select * from NumberOfStatusUpdates where userid = 'user33'; -------------------------------------------------------------------------------------- 22. [User Defined Functions] In Question 11, you had to write a query to find the average time between status updates of a user (more generally a similar query should work to find the average time between status updates across all users). That can be done using a fairly simple 3-4 line query. However, computing "variance" is very tricky. To define more formally: let there be four status updates: (-, t1), (-, t2), (-, t3), (-, t4). Assuming t1 < t2 < t3 < t4, the times between consecutive updates are: x1 = t2 - t1, x2 = t3 - t2, x3 = t4 - t3. The average time between updates is: mu = (x1 + x2 + x3)/3 The variance is: sqrt( ((x1 - mu)^2 + (x2 - mu)^2 + (x3 - mu)^2) / n ) Write a query to find the variance using PL/SQL. More specifically, see: http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_packages.htm#ADFNS009 Write an anonymous block that can be directly executed in SQL*PLUS. Don't forget to use "SET SERVEROUTPUT ON;" when you attempt to execute this using SQL*PLUS. === 22 declare ts date; prev date; n number := 0; xsquared float := 0; x float := 0; numseconds float; cursor c1 is select status_time from status order by status_time; begin open c1; fetch c1 into ts; prev := ts; loop fetch c1 into ts; exit when c1%notfound; numseconds := (ts - prev) * 86400; prev := ts; n := n + 1; x := x + numseconds; xsquared := xsquared + numseconds * numseconds; end loop; dbms_output.put_line('Average time between status updates is: '); dbms_output.put_line(x/n); dbms_output.put_line('Variance in the average time between status updates is: '); dbms_output.put_line(sqrt(xsquared/n - (x*x)/(n*n))); end; / --------------------------------------------------------------------------------------