This short guide provides information on how to configure and use Exposé, the SHOE web-crawler. Exposé searches for web pages with SHOE mark-up, reads the knowledge from them, and loads it into a knowledge base. This knowledge can then be queried using any interfaces provided by the knowledge base.
This version of Exposé uses Parka as its knowledge base and will not work without it. It is possible to design variants of Exposé that work with other types of knowledge bases by creating different implementations of the abstract class ShoeKb.KBInterface.
Exposé is provided under the terms of the GNU General Public License. See the file License.html for details.
Before you can start Exposé, you must create a configuration file that specifies a number of parameters. This file should be named init.dat and should have the following format:
FROM_USER=your_email_address
KB_HOST=parka_host_name
KB_PORT=parka_port_number
KBNAME=parka_kb_name
START_URL=url
ALLOW_URL=url
PROHIB_URL=url
MAX_PAGES=integer
MAX_COST=integer
REQUEST_INTERVAL=integer
IMPLICIT_CLAIMS=TRUE_or_FALSE
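As an illustration of the format above, the following is a minimal sketch of how such a file could be read. It is not Exposé's own parser. It assumes one KEY=value pair per line and that a key such as ALLOW_URL may appear more than once; neither point is confirmed here. The class name InitDatSketch and all sample values (host, port, URLs) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Exposé: reads init.dat-style text.
public class InitDatSketch {

    /** Parses KEY=value lines; values of a repeated key accumulate in file order. */
    public static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            // Split at the FIRST '=' so URLs containing '=' stay intact.
            int eq = line.indexOf('=');
            if (line.isEmpty() || eq <= 0) {
                continue; // skip blank lines and lines without a key
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        String sample = String.join("\n",
            "FROM_USER=someone@example.org",
            "KB_HOST=localhost",
            "KB_PORT=1234",
            "KBNAME=demo",
            "START_URL=http://www.example.org/",
            "ALLOW_URL=http://www.example.org/",
            "PROHIB_URL=http://www.example.org/private/",
            "MAX_PAGES=100",
            "IMPLICIT_CLAIMS=TRUE");
        Map<String, List<String>> fields = parse(sample);
        int maxPages = Integer.parseInt(fields.get("MAX_PAGES").get(0));
        System.out.println("KB name: " + fields.get("KBNAME").get(0));
        System.out.println("Page limit: " + maxPages);
    }
}
```

The sketch parses by hand rather than with java.util.Properties because Properties treats backslashes as escape characters, which could alter some values.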
The definitions of these fields are as follows:
To run Exposé, switch to the directory that contains your init.dat and (assuming the Expose directory is accessible via your CLASSPATH) type:
java Expose.ExposeApp
The Exposé window should appear. This interface lets you edit, at run time, some of the parameters that were specified in the init.dat file, in particular START_URL (the starting URL), KBNAME (KB Name), ALLOW_URL (Visit URL Prefixes), and PROHIB_URL (Avoid URL Prefixes).
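The labels "Visit URL Prefixes" and "Avoid URL Prefixes" suggest that the crawler filters URLs by prefix. The sketch below shows one way such a filter could work. It is an assumption, not Exposé's actual logic: the class UrlPrefixFilter is hypothetical, and so are the rules that an avoid prefix overrides a visit prefix and that a URL matching no visit prefix is skipped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical prefix filter; Exposé's real matching rules may differ.
public class UrlPrefixFilter {
    private final List<String> visitPrefixes;
    private final List<String> avoidPrefixes;

    public UrlPrefixFilter(List<String> visitPrefixes, List<String> avoidPrefixes) {
        this.visitPrefixes = visitPrefixes;
        this.avoidPrefixes = avoidPrefixes;
    }

    /** Returns true if the URL matches a visit prefix and no avoid prefix. */
    public boolean shouldVisit(String url) {
        // Assumption: an avoid prefix always wins over a visit prefix.
        for (String prefix : avoidPrefixes) {
            if (url.startsWith(prefix)) {
                return false;
            }
        }
        for (String prefix : visitPrefixes) {
            if (url.startsWith(prefix)) {
                return true;
            }
        }
        return false; // Assumption: unlisted URLs are not crawled.
    }

    public static void main(String[] args) {
        // Example URLs are made up for illustration.
        UrlPrefixFilter filter = new UrlPrefixFilter(
            Arrays.asList("http://www.example.org/"),
            Arrays.asList("http://www.example.org/private/"));
        System.out.println(filter.shouldVisit("http://www.example.org/index.html"));
        System.out.println(filter.shouldVisit("http://www.example.org/private/notes.html"));
    }
}
```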
To create a new KB, verify the parameters described above and press New KB. Exposé will begin to crawl the web pages and will log the progress of the crawl in the window just above the button bar. When the crawl is done, files named kbname.kb and kbname.pred will be created. These are your Parka assertion and predicate files, and can be used with Parka's ncreateKB command to create a new KB.
The Update KB button goes through the list of sites visited during the previous run of the web-crawler and checks whether any have changed since the last visit. If so, it dynamically updates the KB with the new information.
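This guide does not say how Exposé detects that a page has changed. One common technique, shown here purely as an assumption, is a conditional HTTP request: the client sends an If-Modified-Since header with the time of the last visit, and the server answers 304 (Not Modified) if the page is unchanged. The class ChangeCheckSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical change check; not taken from Exposé's source.
public class ChangeCheckSketch {

    /** Interprets the status of a conditional request: only 304 means "unchanged". */
    public static boolean changedSince(int statusCode) {
        // Assumption: any other status (including errors) is treated as a
        // change so that the caller takes a closer look at the page.
        return statusCode != HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    /** Sends a HEAD request carrying If-Modified-Since for the given last-visit time. */
    public static boolean hasChanged(String url, long lastVisitMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD");            // headers only, no page body
        conn.setIfModifiedSince(lastVisitMillis); // sets the If-Modified-Since header
        try {
            return changedSince(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

Not every server honors If-Modified-Since, so a crawler relying on this approach may still need to re-read pages that report a change and compare their content.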
The Stop button will pause a crawl. It can then be continued using the Resume button.
The Exit button closes Exposé.