In this project, you are required to implement a simple system to manage information about events and when they occurred. Any event e has a NAME (alphanumeric string with blanks and having at least 3 non-blank characters), a START time (of type int) and an END time (of type int). Both START and END lie between 0 and 255 (inclusive). For example, we may have an event named “G7 meeting” with START = 25 and END = 27, indicating that a G7 meeting starts at time 25 and ends at time 27 (i.e. it lasts for 3 units of time – 25, 26, 27). Though events usually have many attributes associated with them, you are only required to deal with the above three fields in this project.
Part 1 (Due: Feb. 14, 2002; 20 points): Write a program called hashrecords that takes as input a file and creates a hash table called HT based on the records in the file. hashrecords is an executable that can be invoked from the command line. The input file contains one event per line. Thus, such a file might look like
Note that strings are enclosed in quotes and blanks are used to separate the fields. New records start on a new line. Hashrecords must read this file as input and must create as output, a simple hash table HT based solely on the NAME field of each record. HT has 50 buckets numbered 0 through 49, each having capacity 3. Each bucket entry contains the NAME and the line number in this file. For example, the entry about the “G7 meeting” in the above example includes not only the string “G7 Meeting” but also the line number 1 indicating that this references the event in Line 1 of the file. It is possible that the same event name occurs multiple times (for example, there could be several G7 meetings held at different points in time). When your program is done, it should print out the contents of all buckets in the following (example) format:
Empty buckets need not be printed out. The hashing function you use must compute h(S) as follows – let the weight of a number be the number itself, and the weight of a letter be its position in the alphabet (i.e. weight of A=1, weight of B=2,etc.). Look at the first two non-blank symbols in the string S and take the sum of their squares and compute the mod of that w.r.t. 50. For example, suppose we wish to find h(“G7 Meeting”). G is the 7th letter of the alphabet, and so has weight 7. The number 7 has weight 7. The sum of their squares is 98. 98 mod 50 is 48, so “G7 Meeting” would hash to bucket 48. When attempting to insert into a full bucket, execute a program called overflow. For now, all this program needs to do is to print an error message (Part II will involve fleshing out this routine). The error message should specify the Bucket Number and the event name that caused an overflow.
Part 2 (Due: Feb. 21, 2002, 15 points): Your second assignment is to replace the overflow program in Part 1 by a program that does double hashing. This program works as follows. Whenever you are inserting S into a bucket that is full, the program must rehash S using a secondary hash function g. g(S) is defined to be the modulo w.r.t 50 of the square of the weight of the third alphanumeric element of string S (blanks occurring in S are ignored). The double hashing program looks at the following buckets:
P0 = (h(S) + 0*g(S)) mod 50;
P1 = (h(S) + 1*g(S)) mod 50;
Pj = (h(S) + j*g(S)) mod 50
It inserts S into the first bucket Pj with available space.
As in Part I, when your program is done, it should print out the contents of all non-empty buckets. In the event an overflow still occurs after executing the above double hashing program, print an error message as before. Though your overflow program is intended to be invoked from within the hashrecords program, you should ensure that it can also be directly invoked from the command line.
Part 3 (Due: March 5, 2002, 10 points): Your third task is to write a program called findevent (command line executable) that takes as input, an input file, an existing hash table HT for that file, and an event string. It returns as output, all events (together with their START and END times) that exactly match the input event string. Your program must use the hash table and hash functions in finding the event – any other search method is unacceptable.
Part 4 (Due: March 19, 2002, 25 points): Part 4 of the project involves creating a segment tree index for the START and END times. You are required to write a program called mktimeindex (command line executable) which takes as input, a file of the same kind as in Part 1 but this time it only looks at the START and END time fields. It creates (from scratch) a segment tree based on the time intervals in the input file. Each node in the segment tree must contain a LOW and HIGH field (of type int in both cases) as well as a list of line numbers that are applicable. Information about segment trees will be provided in class. You may assume that all START and END times are less than 256. When your program finished, it should print out a preorder traversal of all nodes in the tree. For each node N, print the parent of node N (if one exists, else print “None”), its LOW and HIGH values, and the line numbers associated with node N. For example, a sample print out would be:
Part 5 (Due: April 4, 2002, 20 points): Write an algorithm called intervalquery (command line executable) that takes as input two integers A,B and a file name. It uses the segment tree index tree associated with the specified file to find all records r in the file such that [A,B] is a subinterval (or equal) to the interval [r.START,r.END]. Using the example file of Part I, when A=33 and B=35, the record involving the “Maryland Dog Show” should be returned. (In the event the segment tree associated with the file does not exist, you should use mktimeindex to create it).
Part 6 (Due: April 11, 2002, 10 points): Write an algorithm called intersectquery (executable from the command line) that takes as input two integers A,B and a file. It uses the segment tree index tree associated with the specified file to find all records r in the file such that [A,B] intersects the interval [r.START,r.END]. Using the example file of Part I, when A=30 and B=33, the record involving the “Maryland Dog Show” should be returned. (In the event the segment tree associated with the file does not exist, you should use mktimeindex to create it).
Part 7 EXTRA CREDIT (Due: April 25, 2002, 20 points): Build a graphical user interface using which you may execute all the above operations, i.e. creating a hash index for a file, creating a segment tree index for the file, evaluating findevent, intervalquery and intersectquery.