SHOE: Simple HTML Ontology Extensions

Note: This is not the latest SHOE specification.

SHOE 0.98

Proposed Specification
Sean Luke
SHOE Project
January 10, 1997


Latest version of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec.html
Version 0.97 of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec0.97.html
Version 0.96 of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec0.96.html
Version 0.95 of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec0.95.html
Version 0.90 of this document: http://www.cs.umd.edu/projects/plus/SHOE/spec0.9.html

1 Introduction

This specification describes SHOE, an extension to HTML which provides a way to incorporate machine-readable semantic knowledge in HTML or other World-Wide Web documents. This specification describes:

  • A hierarchical classification mechanism for HTML documents (and optionally non-HTML documents) or subsections of HTML documents.
  • A mechanism for specifying relationships between classified elements and other classified elements or specific kinds of data (numbers, dates, etc.).
  • A simple way to specify ontologies containing rules that define valid classifications and relationships.

Further, a proposed addition to the specification provides a mechanism for adding simple semantics in the form of simplified Horn clauses.

The intent of this specification is to make it possible for user-agents, robots, etc., to gather truly meaningful information about web pages and documents, enabling significantly better search mechanisms and knowledge-gathering.

The general way one goes about this is as follows:

  • First, define an ontology describing valid classifications of web objects, and valid relationships between web objects and other web objects or data. This ontology may borrow from other ontologies.
  • Annotate HTML pages to describe themselves, other pages, or subsections of themselves, as having attributes as described in one or more ontologies.

We're playing a bit fast-and-loose with the term ontology here. In this specification, ``ontology'' simply means an ISA hierarchy of classes/categories, plus a set of atomic relations between these categories. Categories inherit relations defined for parent categories. The proposed addition to the specification defines more sophisticated inferential relationships.

User agents following this specification should be aware that assertions made by HTML pages are not facts, but claims. I.e., if element x claims that element y is related with relation r to element z, then the user-agent should not be entering r(y,z) into its database (i.e., "Now I know that y is related to z with the relationship r!!"). Instead, it should be entering something along the lines of r(x,y,z) into its database (i.e., "x is claiming that y is related to z with relationship r."). This is an important distinction: it's perfectly fine for HTML pages out there to be making completely false claims; one shouldn't simply accept them as truth. For similar reasons, HTML pages can only make assertions, not retractions.

1.1 Extensions to HTML

SHOE adds the following tags to the HTML standard:

ONTOLOGY, /ONTOLOGY, ONTOLOGY-EXTENDS, ONTDEF, INSTANCE, /INSTANCE, USE-ONTOLOGY, CATEGORY, RELATION, ATTRIBUTE and /ATTRIBUTE. Additionally, SHOE declares the META HTTP-EQUIV tag "Instance-Key".

The proposed addition to the specification adds the tags /ONTDEF, ONTIF, and ONTTHEN.

2 Terms

Terms not described here may be found in the HTML 2.0 specification.

Category
An element under which HTML page instances or subinstances can be classified. Category names are element names, and may be prefixed. Categories may have parent categories. Categories define inheritance: if an instance is classified under a category, keys classified with this category may fill argument positions in relations defined for that category or any of its parent (or ancestor) categories. Multiple inheritance is valid.
Data
A value which can be placed in an argument of a relationship but is not an instance. Data must be of one of the following types:

Strings (STRING)
HTML String Literals, as defined in the HTML 2.0 specification.
Numbers (NUMBER)
Floating-point numerical constants. Knowledge-agents should be able to read common floating-point numbers like 2, 2.0, -1.432e+4, etc. Numbers may be of the form,
0|
([+|-|]
['.'digit*|0'.'digit*|non-zero-digit digit*['.'digit*|]]
[([e|E][+|-]non-zero-digit digit*)])

Dates (DATE)
Date/Timestamps following RFC 1123, as shown in section 3.3.1 of the HTTP/1.0 specification.
Booleans (TRUTH)
HTML String Literals of the form YES or NO, case-insensitive.
Categories (CATEGORY)
Category names.
Relationships (RELATION)
Relation names.

Element
A category or relationship name, or one of the following reserved keywords (all caps): STRING, NUMBER, DATE, TRUTH, CATEGORY, or RELATION. Element names are case-sensitive, and may contain only letters, digits, or hyphens.
Instance
An element which may be classified under zero or more categories, and included as an argument to relationships (along with other forms of data). Some instances, page instances, are associated with World-Wide Web documents. All page instances are automatically of the category Page. Other instances, subinstances, are associated with subsections of HTML page instance documents. Subinstances are automatically of the category PageSubinstance, and have a parentPage relationship linking them to their parent page instance. Instances form the most common data entities in databases built up from this specification.
Key
A string which uniquely defines a page instance or a subinstance. It is up to you to decide on the keys for your documents. For page instances of SHOE-conformant documents, the proper method is to use a single absolute URL for the document. For example, http://www.cs.umd.edu is a valid key for the document located at that URL.

To create keys for subinstances, add to the page instance's unique key a pound-suffix such as #MyDog. For example, http://www.cs.umd.edu#MyDog is a valid key for a subinstance located at http://www.cs.umd.edu. It's good style for this unique key to correspond with an actual anchor in the document.

The unique key of a non-SHOE-conformant document is defined to be one particular absolute URL of the document, chosen for the document by a SHOE-conformant document which references it.

Ontology
As defined in this specification, a description of valid classifications for HTML page instances and subinstances, and valid relationships between instances and elements.
Prefix
A small string attached with a period at the beginning of an instance, category, or relation name. For example, cs is a prefix in cs.junk. Prefixes may also be attached to already-prefixed elements, forming a prefix chain. For example, foo.bar.cs is a prefix chain for foo.bar.cs.junk. A prefix indicates the ontology in which the element (or prefixed element) following it is defined.
Relation (Relationship)
An element which defines a relationship between elements. Relation names are element names, and may be prefixed. Elements fill one or more arguments to a relation. Arguments are explicitly ordered, so each has a numbered position (the first is argument 1). Many relations are binary (have exactly two arguments). A binary relation's domain is argument 1 of the relation. A relation's range (the element the relation is ``to'') is argument 2 of the relation.
Rule
A formal rule in an ontology defining valid classifications (categories) or valid relationships that can be asserted.
Unique Name
A string which uniquely defines an ontology. Unique names are different from keys in that they do not uniquely define instances but rather the ontologies which the instances may use. Different versions of an ontology may have the same unique name so long as they have different version numbers.
Version (Version Number)
A string which describes the version of an ontology. Versions are case-sensitive, and may contain only letters, digits, or hyphens.

3 Declaring Ontologies

Except as specified, all declarations must be made in the BODY section of an HTML document.

3.1 Declaring An Ontology Definition

An HTML document may contain any number of ontology definitions. Each ontology definition should use a unique name. Every ontology definition carries a version number. If an ontology completely subsumes previous versions of the same ontology (it contains all the rules defined in those versions), it may declare itself to be backward-compatible with those versions. To begin an ontology definition, use:

<ONTOLOGY "ontology-unique-name"
	VERSION="version"
	[BACKWARD-COMPATIBLE-WITH="version list"]
	[DESCRIPTION="text"]>

``ontology-unique-name'' (mandatory)
The ontology's unique name.
VERSION (mandatory)
The ontology's version.
BACKWARD-COMPATIBLE-WITH
A whitespace-delimited list of previous versions which this ontology subsumes.
DESCRIPTION
A short, human-readable description of the purpose of the ontology.

To end an ontology definition, use:

</ONTOLOGY>


All rules and extensions in an ontology must appear between the beginning and ending declarations. Ontologies may not be nested or overlap.
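
As an illustration only, a minimal ontology definition might look like the sketch below. The unique name university-ontology, the version strings, and the description are hypothetical and are not defined by this specification.

```html
<!-- Hypothetical ontology: the name, versions, and description are
     illustrative assumptions, not part of the SHOE specification. -->
<ONTOLOGY "university-ontology"
    VERSION="1-2"
    BACKWARD-COMPATIBLE-WITH="1-0 1-1"
    DESCRIPTION="People and departments at a university">

<!-- All rules and extensions (ONTOLOGY-EXTENDS, ONTDEF) go here,
     between the beginning and ending declarations. -->

</ONTOLOGY>
```

The versions are written with hyphens (1-2 rather than 1.2) because versions may contain only letters, digits, or hyphens.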

3.2 Extending An Existing Ontology

An ontology may be declared to extend one or more existing ontologies. This means that it will use elements in those ontologies in its own rules. To distinguish between those elements and its own elements, an ontology must provide a unique prefix for each ontology it extends. This will be prefixed to elements borrowed from each particular ontology whenever they are referred to. To declare that an ontology is extending another ontology, use:

<ONTOLOGY-EXTENDS "ontology-unique-name" 
 	VERSION="version"
	PREFIX="prefix"
 	[URL="URL"]>

``ontology-unique-name'' (mandatory)
The extended ontology's unique name.
VERSION (mandatory)
The extended ontology's version.
PREFIX (mandatory)
The prefix you are assigning the extended ontology. All categories and relations from the extended ontology which are used in your ontology must be prefixed with this prefix. Within an HTML document, a prefix must be different from all other prefixes declared with either <USE-ONTOLOGY ...> or <ONTOLOGY-EXTENDS ...> tags.
URL
A URL that points to a document which contains the extended ontology.
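
The following sketch shows one way an extension might be declared. Both ontology names, the versions, the prefix org, and the example.org URL are hypothetical placeholders chosen for illustration.

```html
<!-- Hypothetical names, versions, prefix, and URL throughout. -->
<ONTOLOGY "university-ontology"
    VERSION="1-0">

<!-- Borrow elements from another ontology and assign it the prefix "org". -->
<ONTOLOGY-EXTENDS "organization-ontology"
    VERSION="2-0"
    PREFIX="org"
    URL="http://www.example.org/ontologies/organization.html">

<!-- From here on, a category named Organization in the extended ontology
     would be referred to as org.Organization. -->

</ONTOLOGY>
```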

3.3 Declaring Classification Rules

Inside an ontology definition, an ontology may declare various new categories which instances can belong to. Categories should descend from one or more parent categories. To declare a new category, or to add new parent categories for a category, use:

<ONTDEF CATEGORY="category-name"
	[ISA="parent-category-list"]
	[DESCRIPTION="text"]>

CATEGORY (mandatory)
The newly declared category, or the one being given more parent categories. Newly declared categories should be distinct from all other categories and relationships declared in the ontology.
ISA
A whitespace-delimited list of categories to define as parent categories of this category.
DESCRIPTION
A short, human-readable description of the category's semantics.

A particular category should not be defined more than once within an ontology's declaration.
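
A short sketch of category declarations follows. The ontology names, the prefix org, and every category name are hypothetical. The example also assumes that the extended ontology defines categories named Organization and Employee.

```html
<!-- Hypothetical ontology and category names, for illustration only. -->
<ONTOLOGY "university-ontology"
    VERSION="1-0">

<ONTOLOGY-EXTENDS "organization-ontology"
    VERSION="2-0"
    PREFIX="org">

<!-- A category declared without ISA (the attribute is optional). -->
<ONTDEF CATEGORY="Person"
    DESCRIPTION="A human being">

<!-- A category whose parent comes from the extended ontology. -->
<ONTDEF CATEGORY="University"
    ISA="org.Organization"
    DESCRIPTION="A degree-granting institution">

<!-- A category with a single local parent. -->
<ONTDEF CATEGORY="Student"
    ISA="Person"
    DESCRIPTION="A person enrolled at a university">

<!-- Multiple inheritance: a whitespace-delimited list of parents. -->
<ONTDEF CATEGORY="TeachingAssistant"
    ISA="Student org.Employee"
    DESCRIPTION="A student who is also employed to teach">

</ONTOLOGY>
```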

3.4 Declaring Relationship Rules

Inside an ontology definition, an ontology may declare various new valid relationships between category instances or between category instances and data. To declare a relationship, use:

<ONTDEF RELATION="relation-name" 
	ARGS="element-list"
	[DESCRIPTION="text"]>

RELATION (mandatory)
The newly declared relationship name. This should be distinct from all other categories and relationships declared in the ontology.
ARGS (mandatory)
The arguments of the relation. This should be a whitespace-delimited list of (commonly two) elements, representing the arguments to the relation. Elements can be either declared categories, or the following keywords (all caps): STRING, NUMBER, DATE, TRUTH, CATEGORY, RELATION.

CATEGORY establishes a relationship not with category instances but with categories themselves. RELATION establishes a relationship not with instances but with other relationships. These last two elements are rare and should only be used in special circumstances.

DESCRIPTION
A short, human-readable description of the relationship's semantics.

A particular named relationship should not be defined more than once within an ontology's declaration.
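
The sketch below shows how relationship rules of different shapes might be declared. The ontology name, the categories Person and Department, and the relation names memberOf, name, and enrolledSince are all hypothetical.

```html
<!-- Hypothetical ontology, category, and relation names. -->
<ONTOLOGY "university-ontology"
    VERSION="1-0">

<ONTDEF CATEGORY="Person">
<ONTDEF CATEGORY="Department">

<!-- Binary relation between instances:
     argument 1 is a Person, argument 2 is a Department. -->
<ONTDEF RELATION="memberOf"
    ARGS="Person Department"
    DESCRIPTION="The person belongs to the department">

<!-- Binary relation between an instance and data. -->
<ONTDEF RELATION="name"
    ARGS="Person STRING"
    DESCRIPTION="The person's full name">

<!-- Relation with three arguments, mixing instances and data. -->
<ONTDEF RELATION="enrolledSince"
    ARGS="Person Department DATE"
    DESCRIPTION="When the person enrolled in the department">

</ONTOLOGY>
```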

3.5 Renaming Rules

To reduce the number of prefixes, an ontology may rename a category or relation (plus its prefix chain) to a simpler name, so long as this name is not used in any other category or relation in the ontology. For example, an ontology could rename the category cs.junk.foo.person to simply person, so long as person is not defined elsewhere in the ontology.

Ontologies are not permitted to rename (or rename elements to) the following keywords: STRING, NUMBER, DATE, TRUTH, CATEGORY, or RELATION. To rename a category or relation, use:

<ONTDEF RENAME="element-name" 
	TO="new-element-name">

RENAME (mandatory)
The element's old name.
TO (mandatory)
The element's new name.
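
A possible use of renaming is sketched below. The ontology names and prefixes are hypothetical, and the example assumes that the extended ontology itself reaches a Person category through its own prefix base, so that the category is visible here as org.base.Person.

```html
<!-- Hypothetical ontologies, prefixes, and categories. -->
<ONTOLOGY "university-ontology"
    VERSION="1-0">

<ONTOLOGY-EXTENDS "organization-ontology"
    VERSION="2-0"
    PREFIX="org">

<!-- Rename the category together with its prefix chain.
     Valid only because no other Person is defined in this ontology. -->
<ONTDEF RENAME="org.base.Person"
    TO="Person">

<!-- The short name can now be used in later rules. -->
<ONTDEF CATEGORY="Student"
    ISA="Person">

</ONTOLOGY>
```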

4 Marking Up HTML Documents Using Ontologies

Except as specified, all declarations must be made in the BODY section of an HTML document.

4.1 Declaring a Page Instance

SHOE-conformant HTML documents must declare themselves page instances and provide a unique key for themselves. To declare an HTML document to be a page instance, add the following text to the HEAD section of the document:

<META HTTP-EQUIV="Instance-Key"
	CONTENT="key">

``key''
The page instance's unique key.
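
For illustration, a document located at http://www.cs.umd.edu (the key used as an example in section 2) might declare itself a page instance roughly as follows. The title and the body content are placeholders.

```html
<HTML>
<HEAD>
<!-- The title is a placeholder for illustration. -->
<TITLE>Example Page</TITLE>
<!-- The key is a single absolute URL for this document. -->
<META HTTP-EQUIV="Instance-Key"
    CONTENT="http://www.cs.umd.edu">
</HEAD>
<BODY>
<!-- All other SHOE declarations go in the BODY section. -->
</BODY>
</HTML>
```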

4.2 Declaring a Subinstance

A document may declare zero or more subinstances. Subinstances may not overlap or be nested in each other. To declare the start of a subinstance, use:

<INSTANCE "key">

``key'' (mandatory)
The unique key for the instance.

To mark the end of the section of a subinstance, use:

</INSTANCE>


All relationship and category declarations made within a subinstance belong to that subinstance. All relationship and category declarations made outside a subinstance belong to the page instance.
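
The sketch below illustrates how declarations are attributed. It reuses the subinstance key http://www.cs.umd.edu#MyDog from section 2. The ontology name, the prefix pets, and the categories HomePage and Dog are hypothetical.

```html
<BODY>
<!-- Hypothetical ontology, prefix, and category names. -->
<USE-ONTOLOGY "pet-ontology"
    VERSION="1-0"
    PREFIX="pets">

<!-- Outside any subinstance: belongs to the page instance. -->
<CATEGORY "pets.HomePage">

<!-- An actual anchor matching the key's pound-suffix (good style). -->
<A NAME="MyDog"></A>
<INSTANCE "http://www.cs.umd.edu#MyDog">

<!-- Inside the subinstance: belongs to http://www.cs.umd.edu#MyDog. -->
<CATEGORY "pets.Dog">

</INSTANCE>
</BODY>
```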

4.3 Declaring Ontology Usage

Before you can classify documents or establish relationships between them, you'll need to define exactly which ontologies these classifications and relations are derived from, and associate with each of these ontologies some prefix unique to that ontology. An HTML document may declare that it is using as many ontologies as it likes, as long as each ontology has a unique prefix in the document. To declare that a page instance and all its subinstances use a particular ontology, use:

<USE-ONTOLOGY "ontology-unique-name" 
	VERSION="version"
	PREFIX="prefix" 
	[URL="URL"]>

``ontology-unique-name'' (mandatory)
The ontology's unique name.
VERSION (mandatory)
The ontology's version.
PREFIX (mandatory)
The prefix you are assigning the ontology. All categories and relations from this ontology which are used in this document must be prefixed with this prefix. Within this document, the prefix must be different from all other prefixes declared with either <USE-ONTOLOGY ...> or <ONTOLOGY-EXTENDS ...> tags.
URL
A URL that points to a document which contains the used ontology.
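
A document using two ontologies might declare them along the following lines. The ontology names, versions, prefixes, and the example.org URL are hypothetical.

```html
<BODY>
<!-- Hypothetical ontologies; each gets its own prefix in this document. -->
<USE-ONTOLOGY "university-ontology"
    VERSION="1-0"
    PREFIX="u"
    URL="http://www.example.org/ontologies/university.html">

<!-- URL is optional and is omitted here. -->
<USE-ONTOLOGY "organization-ontology"
    VERSION="2-0"
    PREFIX="org">

<!-- Categories and relations are then written as u.Something or org.Something. -->
</BODY>
```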

4.4 Declaring Categories

Instances may be classified, that is, they may be declared to belong to one or more categories in an ontology, using the CATEGORY tag:

<CATEGORY "prefixed.category"
	[FOR="key"]>

``prefixed.category'' (mandatory)
A category with full prefix chains showing a path through used and extended ontologies back to the ontology in which it was defined.
FOR
Contains the key of the instance which is being declared to belong to the category. If FOR is not declared, then the key is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance. If FOR is declared, then it provides the key.
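
The three ways a CATEGORY tag can pick its instance are sketched below. The ontology name, prefixes, category names, and example.org keys are hypothetical. The prefix chain u.org assumes that the used ontology extends another ontology under the prefix org.

```html
<!-- Hypothetical ontology, prefixes, categories, and keys. -->
<USE-ONTOLOGY "university-ontology"
    VERSION="1-0"
    PREFIX="u">

<!-- No FOR, no enclosing subinstance: classifies the page instance. -->
<CATEGORY "u.Department">

<!-- FOR given: classifies the instance with that key.
     u.org.Organization follows a prefix chain through an extended ontology. -->
<CATEGORY "u.org.Organization"
    FOR="http://www.example.org/">

<INSTANCE "http://www.cs.umd.edu#MyDog">
<!-- No FOR, inside a subinstance: classifies the subinstance. -->
<CATEGORY "u.Mascot">
</INSTANCE>
```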

4.5 Explicitly Declaring Relationships

Instances may declare relationships with elements. To explicitly declare relationships between elements, use:

<RELATION "prefixed.relationship" 
    [[FROM="key"] [TO="key"]] |
    [[1="key"] [2="key"]
     [3="key"] [4="key"]
     [5="key"] etc... ] >

``prefixed.relationship'' (mandatory)
A relationship with full prefix chains showing a path through used and extended ontologies back to the ontology in which it was defined.
1, 2, 3, 4, 5, 6, 7, ...
Declares the element in the argument position indicated by the tag. For example, 7="George" declares that "George" is argument 7 in the relation. This element must be of the type declared for that particular argument position.
FROM
Synonymous with 1. A relation declared with both FROM and any 1, 2, 3, 4 ... tags is invalid.
TO
Synonymous with 2. A relation declared with both TO and any 1, 2, 3, 4 ... tags is invalid.

Explicit declarations take two forms, the "old" form and the "new" form.

The old form is only valid when the relationship being declared is binary, that is, when it has exactly two arguments. In this form, FROM and TO may be used. If a tag (FROM or TO) is not declared, the element type for that tag must be INSTANCE, and the key for the instance is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance. If the tag is declared, and the type of the tag's argument is an instance, then it provides the key.

The new form may be used for relationships with any number of arguments (including binary relationships). In this form, the 1, 2, 3, 4 ... tags may be used. If a numbered tag is not declared, and an argument of that number position exists in the relationship, then the information should be considered "unknown". Note that this is different from the assumptions in the old form.
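
The difference between the two forms can be sketched as follows. The ontology name, the prefix u, the relations memberOf (Person, Department) and enrolledSince (Person, Department, DATE), and the example.org key are hypothetical. The page instance is assumed to be a person's home page.

```html
<!-- Hypothetical ontology, relations, and keys. -->
<USE-ONTOLOGY "university-ontology"
    VERSION="1-0"
    PREFIX="u">

<!-- Old form, FROM omitted: argument 1 defaults to the enclosing
     instance (here, the page instance). -->
<RELATION "u.memberOf"
    TO="http://www.cs.umd.edu">

<!-- Old form, both arguments given explicitly. -->
<RELATION "u.memberOf"
    FROM="http://www.example.org/people/alice.html"
    TO="http://www.cs.umd.edu">

<!-- New form, same claim as the previous tag. -->
<RELATION "u.memberOf"
    1="http://www.example.org/people/alice.html"
    2="http://www.cs.umd.edu">

<!-- New form, three arguments. Argument 3 is a DATE in RFC 1123 format. -->
<RELATION "u.enrolledSince"
    1="http://www.example.org/people/alice.html"
    2="http://www.cs.umd.edu"
    3="Sun, 06 Nov 1994 08:49:37 GMT">

<!-- New form, argument 1 omitted: it is "unknown", not the page instance. -->
<RELATION "u.memberOf"
    2="http://www.cs.umd.edu">
```

Mixing the two forms in one tag (for example FROM together with 2) is invalid.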

4.6 Marking Up Text Relationships

It's possible to wrap existing HTML body text and declare it to be a relationship. This is done by:

<ATTRIBUTE "prefixed.relationship">
	attribute text
</ATTRIBUTE>

This is functionally the same as declaring:

<RELATION "prefixed.relationship" 
 	TO="attribute text">
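
As a sketch, wrapping a name that already appears in the page text might look like this. The ontology name, the prefix u, the relation name (Person, STRING), and the wrapped text are hypothetical, and the example assumes the opening ATTRIBUTE tag is closed with > like every other tag in this specification.

```html
<!-- Hypothetical ontology, prefix, relation, and text. -->
<USE-ONTOLOGY "university-ontology"
    VERSION="1-0"
    PREFIX="u">

<!-- The wrapped text stays visible to human readers and also fills
     argument 2 of u.name. -->
<P>This is the home page of
<ATTRIBUTE "u.name">Alice Example</ATTRIBUTE>.

<!-- Functionally the same claim, written as an explicit relationship. -->
<RELATION "u.name"
    TO="Alice Example">
```

Because this corresponds to the old form with FROM omitted, argument 1 is taken to be the enclosing subinstance or, if there is none, the page instance.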