In this project, you are required to implement a simple
system to manage information about events and when they occurred. Any event e has a NAME (alphanumeric string
with blanks and having at least 3 non-blank characters), a START time (of type int)
and an END time (of type int).
Both START and END lie between 0 and 255 (inclusive). For example, we
may have an event named “G7 meeting” with START = 25 and END = 27, indicating
that a G7 meeting starts at time 25 and ends at time 27 (i.e. it lasts for 3
units of time – 25, 26, 27). Though
events usually have many attributes associated with them, you are only required
to deal with the above three fields in this project.
Part 1 (Due: Feb.
14, 2002; 20 points): Write a program called hashrecords
that takes as input a file and creates a hash table called HT based on the
records in the file. hashrecords is
an executable that can be invoked from the command line. The input file
contains one event per line. Thus, such
a file might look like
“G7 meeting” 25 27
“Washington Dog Show” 31
35
“Maryland Elephant
Appreciation Week” 20-26
Note that strings are enclosed in quotes and blanks are
used to separate the fields. New records start on a new line. Hashrecords must read this file as
input and must create as output, a simple hash table HT based solely on
the NAME field of each record. HT has
50 buckets numbered 0 through 49, each having capacity 3. Each bucket entry contains the NAME and the
line number in this file. For example,
the entry about the “G7 meeting” in the above example includes not only the
string “G7 Meeting” but also the line number 1 indicating that this references
the event in Line 1 of the file. It is
possible that the same event name occurs multiple times (for example, there
could be several G7 meetings held at different points in time). When your program is done, it should print
out the contents of all buckets in the following (example) format:
Bucket 0:
2 “Washington Dog Show”
3 “Maryland Elephant Appreciation Week”
Bucket 1:
1 “G7 meeting”
Empty buckets need not be printed out. The hashing function you use must compute h(S)
as follows – let the weight of a number be the number itself, and the weight of
a letter be its position in the alphabet (i.e. weight of A=1, weight of B=2,etc.). Look at the first two non-blank symbols in
the string S and take the sum of their squares and compute the mod
of that w.r.t. 50. For example, suppose
we wish to find h(“G7 Meeting”).
G is the 7th letter of the alphabet, and so has weight
7. The number 7 has weight 7. The sum of their squares is 98. 98 mod 50 is
48, so “G7 Meeting” would hash to bucket 48. When attempting to insert into a full bucket, execute a program
called overflow. For now, all
this program needs to do is to print an error message (Part II will involve
fleshing out this routine). The error
message should specify the Bucket Number and the event name that caused an
overflow.
Part 2 (Due: Feb.
21, 2002, 15 points): Your second
assignment is to replace the overflow program in Part 1 by a program
that does double hashing. This program
works as follows. Whenever you are inserting S into a bucket that is full, the program must
rehash S using a secondary hash function g. g(S) is defined to be the modulo
w.r.t 50 of the square of the weight of
the third alphanumeric element of string S (blanks occurring in S
are ignored). The double hashing program looks at the following buckets:
P0 = (h(S) + 0*g(S)) mod 50;
P1 = (h(S) + 1*g(S)) mod 50;
…
Pj = (h(S) + j*g(S)) mod 50
It inserts S into the first bucket Pj with available
space.
As in Part I, when your program is done, it should print
out the contents of all non-empty buckets.
In the event an overflow still occurs after executing the above double
hashing program, print an error message as before. Though your overflow program is intended to be invoked
from within the hashrecords program, you should ensure that it can also
be directly invoked from the command line.
Part 3 (Due: March 5, 2002, 10 points): Your third task is to write a program called
findevent (command line executable) that takes as input, an input file,
an existing hash table HT for that file, and an event string. It returns as output, all events (together
with their START and END times) that exactly match the input event string. Your program must use the hash table
and hash functions in finding the event – any other search method is unacceptable.
Part 4 (Due: March 19, 2002, 25 points): Part 4 of the project involves creating a
segment tree index for the START and END times. You are required to write a program called mktimeindex (command
line executable) which takes as input, a file of the same kind as in Part 1 but
this time it only looks at the START and END time fields. It creates (from scratch) a segment tree
based on the time intervals in the input file.
Each node in the segment tree must contain a LOW and HIGH field (of type
int in both cases) as well as a list of line numbers that are
applicable. Information about segment
trees will be provided in class. You
may assume that all START and END times are less than 256. When your program finished, it should print
out a preorder traversal of all nodes in the tree. For each node N, print the parent of node N (if one exists, else
print “None”), its LOW and HIGH values, and the line numbers associated with
node N. For example, a sample print out
would be:
Node 1
Parent: None
Low: 0
High: 256
*******
Node 2
Parent: Node 1
Low: 0
High: 128
*******
Part 5 (Due: April 4, 2002, 20 points): Write an algorithm called intervalquery (command
line executable) that takes as input two integers A,B and a file name. It uses
the segment tree index tree associated with the specified file to find all
records r in the file such that
[A,B] is a subinterval (or equal) to the interval [r.START,r.END]. Using the
example file of Part I, when A=33 and B=35, the record involving the “Maryland
Dog Show” should be returned. (In the event the segment tree associated with
the file does not exist, you should use mktimeindex to create it).
Part 6 (Due: April 11, 2002, 10 points): Write an algorithm called intersectquery
(executable from the command line) that takes as input two integers A,B and a
file. It uses the segment tree index tree associated with the specified file to
find all records r in the file
such that [A,B] intersects the interval [r.START,r.END]. Using the example file of Part I, when A=30
and B=33, the record involving the “Maryland Dog Show” should be returned. (In the event the segment tree associated
with the file does not exist, you should use mktimeindex to create it).
Part 7 EXTRA CREDIT (Due: April 25, 2002, 20 points): Build a graphical user interface using which
you may execute all the above operations, i.e. creating a hash index for a
file, creating a segment tree index for the file, evaluating findevent,
intervalquery and intersectquery.