Overview
For this homework you will implement a utility that allows us to find
images present in a collection of web sites.
Objectives
This project is designed to help you practice:
- Threads and synchronization
- Networking
- Text parsing
Grading
- (55%) Tests (all public)
- (25%) Correct use of threads to process urls
- (20%) Using a shared collection to store results
Clarifications
Any clarifications or corrections associated with this project will be
available at: Clarifications
Code Distribution
The project's code distribution is available by checking out the project
named ImageFinder. The code distribution provides you with
the following:
- imageFinder package → Where you will provide your
implementation
- tests package
Specifications
For this homework you will implement a system that allows us to find urls
of images present in a collection of web sites. In order to find these images
we will use the static method Utilities.findImages (see code distribution),
which takes a set of web sites and returns a set of urls that correspond to
the images found (if any) in the specified sites.
To recognize images, your system will search through the html code of the
specified web page, looking for entries starting with "<img src=" where
any number of spaces and options (e.g., border) may exist in between img and
src. An image is represented by the string following "src=". The following is
a representative example of one possible entry you will be searching for:
<img
src="http://www.cs.umd.edu/class/fall2009/cmsc132/homeworks/ImageFinder/documents/Set1/Set1a.jpg" />
The findImages method will return complete urls of images found. A
complete url is defined as one that starts with "http://" and which provides
the exact location of the image in such a way that we can cut and paste the
url in a browser and actually see the image. For this project you don't have
to worry about sites that may use uppercase letters for img or src in the
html code.
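The search described above can be illustrated with plain String methods
instead of regular expressions. The following is only a minimal sketch, not
the reference implementation: the class name ImgSrcScanner and the method
extractImageSources are hypothetical, and the sketch assumes lowercase tags,
a quoted src value, and no ">" character inside the tag's attribute values.
It also does not check that the value found is a complete url.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the code distribution.
class ImgSrcScanner {

    /** Returns the quoted src values of the lowercase <img ...> tags in html. */
    static List<String> extractImageSources(String html) {
        List<String> sources = new ArrayList<>();
        int pos = 0;
        // Jump from one "<img" occurrence to the next.
        while ((pos = html.indexOf("<img", pos)) != -1) {
            int tagEnd = html.indexOf('>', pos);
            if (tagEnd == -1) {
                break; // unterminated tag: nothing more to scan
            }
            // Options such as border may sit between img and src, so we
            // only require that "src=" appears before the tag closes.
            int srcPos = html.indexOf("src=", pos);
            if (srcPos != -1 && srcPos < tagEnd) {
                int valueStart = srcPos + "src=".length();
                char quote = html.charAt(valueStart);
                if (quote == '"' || quote == '\'') {
                    int valueEnd = html.indexOf(quote, valueStart + 1);
                    if (valueEnd != -1 && valueEnd < tagEnd) {
                        sources.add(html.substring(valueStart + 1, valueEnd));
                    }
                }
            }
            pos = tagEnd + 1; // continue after this tag
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<img\nsrc=\"http://www.cs.umd.edu/class/fall2009/cmsc132/"
                + "homeworks/ImageFinder/documents/Set1/Set1a.jpg\" />";
        System.out.println(extractImageSources(html));
    }
}
```

Scanning with indexOf keeps the logic easy to step through in a debugger,
at the cost of ignoring unusual markup that a full html parser would handle.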
Requirements
- You may not use regular expressions to identify images in a web
page.
- Your system must create a thread for each web site in such a way that
the process of searching for images in all the web sites takes place
concurrently. This is an important part of your implementation. You will
lose credit if you limit the concurrency of your system or just have a
sequential execution of threads.
- Threads in charge of finding images must place the found images in a
common collection (e.g., set). This is a shared resource and there can
only be one collection that is used by all threads. Make sure you provide
the appropriate synchronization.
- Feel free to add any classes, methods, or interfaces you think are
needed. Just make sure you place your implementation in the imageFinder
package.
- You do not need to provide any documentation for the classes, methods,
or interfaces you provide.
- You must attempt to submit your project immediately after checking out
the project (even if you have not implemented any methods). This will
allow you to verify that the submission process is working as expected.
- You should submit your project often. This will keep versions of your
project in the submit server that are easy to retrieve (you can also get
previous versions from your CVS repository). If your computer crashes or
you experience any other problem you will have a permanent backup in the
submit server.
- No student tests are required for this project but you are encouraged to
develop them as you implement this programming assignment.
- Style
- Good variable names.
- You must avoid code duplication by calling appropriate methods
(rather than cutting and pasting code). You may define your own
private utility methods to perform often repeated tasks.
- Style as defined by the Eclipse Format Element option (Source
→ Format → Format Element) or as specified in Code
Conventions for the Java™ Programming Language (focus on the
following sections: Indentation, Declarations, Statements, White
Space, and Naming Conventions).
- Although you should avoid source lines exceeding 80 characters, you
will not be penalized if they are present in your code.
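
The two threading requirements above (one thread per web site, and a single
shared collection with appropriate synchronization) can be sketched as
follows. This is one possible shape under stated assumptions, not the
expected solution: the class ConcurrentFinderSketch, the method findAll, and
the scanSite parameter are hypothetical names, and scanSite stands in for
whatever code downloads and parses a single site. The sketch uses
Collections.synchronizedSet for synchronization; explicit synchronized
blocks around a plain set would be another option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not part of the code distribution.
class ConcurrentFinderSketch {

    /** Scans every site in its own thread and gathers results in one set. */
    static Set<String> findAll(Set<String> sites,
                               Function<String, Set<String>> scanSite) {
        // The single collection shared by all threads; the wrapper makes
        // each individual call (here addAll) thread-safe.
        Set<String> shared = Collections.synchronizedSet(new HashSet<>());

        List<Thread> workers = new ArrayList<>();
        for (String site : sites) {
            Thread worker = new Thread(() -> shared.addAll(scanSite.apply(site)));
            workers.add(worker);
            worker.start(); // start every thread before joining any of them
        }

        // Joining in a separate loop keeps the searches concurrent; calling
        // join() right after start() would run the threads one at a time.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Set<String> sites = new HashSet<>();
        sites.add("http://example.com/one");
        sites.add("http://example.com/two");
        // Stand-in scanner: pretends each site contains exactly one image.
        Set<String> images =
                findAll(sites, site -> Collections.singleton(site + "/image.jpg"));
        System.out.println(images.size() + " images found");
    }
}
```

Passing the per-site scanner in as a parameter keeps the sketch runnable
without network access; in the project itself that role would be played by
your own code in the imageFinder package.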