SHOE: Simple HTML Ontology Extensions

Adding Semantic Knowledge to an HTML Page Using SHOE

Sean Luke
PLUS Group, U Maryland at College Park

Once we have some ontologies to work with (see the ontology tutorial), we can embed some semantic knowledge into our web page using SHOE. Let's take the following sample HTML page as an example.



<HTML>
<HEAD>
<TITLE> My Page </TITLE>
</HEAD>
<BODY>
<P> Hi, this is my web page.
    I am a graduate student and a research assistant.
<P> Also, I'm 52 years old.
<P> My name is George Stephanopolous.
<P> Here is a pointer to my <A 
  HREF="http://www.cs.umd.edu/smith"> graduate advisor.</A>
<P> And <A HREF="http://www.cs.umd.edu/papers/paper.ps">here</A>
  is a paper I recently wrote.
<h3> Brun Hilda </h3>
Brun Hilda is a visiting lecturer here from Germany who doesn't have her 
own web page.  However, because I am such a nice person, I have agreed 
to let part of my web page space belong to her.  She is 23.  
</BODY>
</HTML>

This tells us:

The web page is about someone who is a
- graduate student
- research assistant
The person's name is George Stephanopolous.
The person is 52 years old.
The person's graduate advisor is http://www.cs.umd.edu/smith
The person is the author of the paper http://www.cs.umd.edu/papers/paper.ps

Further, we've learned some interesting facts about Brun Hilda:

She's a lecturer.
She is 23 years old.
Her name is Brun Hilda.

It so happens that we want to tell these exact things to intelligent agents and other knowledge-gatherers. To do this, we first need to tell the robot that we're using SHOE, and then uniquely define our document as an instance. An instance is similar to an "entity" as defined in the database world; we don't use the term "entity" because it is already in common use in another way in HTML and SGML. We begin by declaring that our page uses SHOE 1.0-compliant tags. To do this, we add the following to the HEAD section of our document:



<META HTTP-EQUIV="SHOE" CONTENT="VERSION=1.0">

Instances and Keys

Before we can add semantic information to our web page, we need to define one or more instances, which are data objects that we will classify or relate to one another. It's paramount that instances be distinct from one another; we wouldn't want two people writing instances with the same name. SHOE handles this by associating a unique key with each instance. SHOE has a standard protocol for coming up with a key for an instance: base it on one (and only one) URL for the web page it's found on. For example, an instance about my dog Fido, found on some web page http://www.example.com/example.html, might have the key "http://www.example.com/example.html#fido". Or Richard Nixon's home page might contain a single instance with just his URL as key: "http://www.whitehouse.gov/trickydick.html". A web page might have many URLs that lead to it, so you'll have to pick which one you'll use as its official key and stick with it. This effectively guarantees that instances on other documents can't have the same keys as the ones on your document, since no two documents can share the same URL (unless one went away and the other replaced it).
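To make the key protocol concrete, here is a sketch of how the two example instances above might be declared, each on its own page, using the INSTANCE tag introduced below. The comments name the page each fragment would live on; the instance bodies are left empty as placeholders, since the ontology used in this tutorial says nothing about dogs or presidents.



<!-- On http://www.example.com/example.html: keyed by the page URL plus a #fido suffix -->
<INSTANCE KEY="http://www.example.com/example.html#fido">
</INSTANCE>

<!-- On http://www.whitehouse.gov/trickydick.html: the page's only instance, keyed by the bare URL -->
<INSTANCE KEY="http://www.whitehouse.gov/trickydick.html">
</INSTANCE>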

Using a single unique URL has an additional benefit: no one can pretend to be your document. If an intelligent agent comes across two documents that both claim to have the same key, the agent simply uses the key (which is a URL) to look up which document's the real McCoy.

Let's assume that our document is located at http://www.cs.umd.edu/users/george/, which is the official URL of the document. On George's HTML page we'd like to add a SHOE instance talking about George (you can have as many instances on a page as you like). So for his key we'll use just his URL: "http://www.cs.umd.edu/users/george/". To declare the instance, we add the following to the page:



<INSTANCE KEY="http://www.cs.umd.edu/users/george/">

Using an Ontology

Next, before we can declare facts about our document, we need to tell the agent which ontology we're using to structure those facts. Without an ontology, agents would have no idea what, say, GraduateStudent means when we claim that that's what we (and our document) are. We'll use the CS Department ontology we partially described previously. Let's imagine that our ontology is stored at http://www.cs.umd.edu/projects/plus/SHOE/onts/cs.html and is called cs-dept-ontology, version 1.0. To indicate that we're using this particular ontology, we declare:



<USE-ONTOLOGY
          ID="cs-dept-ontology"
          URL="http://www.cs.umd.edu/projects/plus/SHOE/onts/cs.html"
          VERSION="1.0"
          PREFIX="cs">

The PREFIX indicates that every reference we make to an element declared in the cs-dept-ontology ontology will carry the "cs." prefix. We can use as many ontologies as we like, as long as each has a unique prefix.
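For instance, a page drawing on a second ontology might declare both side by side. The second ontology's ID, URL, and "ex" prefix, along with the category ex.SomeCategory, are hypothetical placeholders rather than a real SHOE ontology; the point is only that each ontology gets its own prefix, so every reference (here using the CATEGORY tag described next) says which ontology it means.



<USE-ONTOLOGY
          ID="cs-dept-ontology"
          URL="http://www.cs.umd.edu/projects/plus/SHOE/onts/cs.html"
          VERSION="1.0"
          PREFIX="cs">
<!-- Hypothetical second ontology, bound to its own prefix "ex" -->
<USE-ONTOLOGY
          ID="example-ontology"
          URL="http://www.example.com/onts/example.html"
          VERSION="1.0"
          PREFIX="ex">

<CATEGORY NAME="cs.GraduateStudent">
<!-- "ex.SomeCategory" is a made-up category from the hypothetical ontology -->
<CATEGORY NAME="ex.SomeCategory">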

Categorization

Next, we'll classify or categorize the instance we're declaring on this web page--that is, we'll declare what the instance concerns. In SHOE, categorization is done using the CATEGORY tag in conjunction with one or more categories we've picked from the ontology we're using. In the body of the document, we'll add:



<CATEGORY NAME="cs.GraduateStudent">
<CATEGORY NAME="cs.ResearchAssistant">

This says that this instance belongs to the classes or categories "GraduateStudent" and "ResearchAssistant" as defined in the ontology we've bound to the "cs." prefix (i.e., cs-dept-ontology).

Declaring Relationships

Next we'd like to tell web robots about relationships to other instances and data. We'll start with the relationships between the instance we're creating and some ordinary data, like our name and age.



<RELATION NAME="cs.name">
        <ARG POS=1 VALUE="http://www.cs.umd.edu/users/george/">
        <ARG POS=2 VALUE="George Stephanopolous">
</RELATION>
<RELATION NAME="cs.age">
        <ARG POS=1 VALUE="http://www.cs.umd.edu/users/george/">
        <ARG POS=2 VALUE="52">
</RELATION>

This tells the robot that our instance has the relationship "name", as defined in the "cs" ontology, with "George Stephanopolous", which is data of the type "STRING" (as defined in the ontology). It also says that our instance has the relationship "age", as defined in the "cs" ontology, with "52", which is data of the type "NUMBER". Since an instance is likely to refer to itself in a lot of its relationship claims, SHOE provides the handy shortcut "me" to refer to the instance making the claim:



<RELATION NAME="cs.name">
        <ARG POS=1 VALUE="me">
        <ARG POS=2 VALUE="George Stephanopolous">
</RELATION>
<RELATION NAME="cs.age">
        <ARG POS=1 VALUE="me">
        <ARG POS=2 VALUE="52">
</RELATION>

Further, if the relation is binary (that is, it has only two argument positions), you can use "FROM" and "TO" instead of "1" and "2". The advantage of using "FROM" and "TO" is that SHOE then permits you to omit either the "FROM" or the "TO" argument if its value is the instance making the claim, which makes simple attribute claims even shorter. This is not permitted when you're using "1", "2", "3", etc. For example:



<RELATION NAME="cs.name">
        <ARG POS=TO VALUE="George Stephanopolous">
</RELATION>
<RELATION NAME="cs.age">
        <ARG POS=TO VALUE="52">
</RELATION>

You can't mix and match these two ways of writing arguments (for example, POS=1 for one argument and POS=TO for the other).
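To illustrate the rule, here is a sketch contrasting a declaration that mixes the two styles, which SHOE does not allow, with a consistent one that spells out both FROM and TO explicitly rather than omitting FROM:



<!-- Not allowed: POS=1 mixed with POS=TO -->
<RELATION NAME="cs.name">
        <ARG POS=1 VALUE="me">
        <ARG POS=TO VALUE="George Stephanopolous">
</RELATION>

<!-- Fine: both arguments use FROM/TO -->
<RELATION NAME="cs.name">
        <ARG POS=FROM VALUE="me">
        <ARG POS=TO VALUE="George Stephanopolous">
</RELATION>

The second form says the same thing as the shorter version above that omits the FROM argument.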

Relationships don't have to be just between your instance and simple data; it's common to declare relationships with other instances, often on other web pages. For example, if we wanted to say that our graduate advisor was "John Smith", and John Smith declared an instance on his home page with a key of "http://www.cs.umd.edu/users/smith", we might say something like:



<RELATION NAME="cs.advisor">
        <ARG POS=TO VALUE="http://www.cs.umd.edu/users/smith">
</RELATION>

Now, a little hypothetical situation: if there were no such thing as "cs.advisor" but there were a relationship called, say, "cs.advisorOf" that pointed in the other direction (that is, from professors to students), then we could use it instead, as in:



<RELATION NAME="cs.advisorOf">
        <ARG POS=FROM VALUE="http://www.cs.umd.edu/users/smith">
</RELATION>

This works because the relation "advisorOf" is declared FROM professors TO students, so omitting the TO argument makes our own instance the student.

Some Relationship and Categorization Issues

The cs.advisor example above assumes that John Smith's home page uses HTML (instead of, say, PostScript), that it uses SHOE, that it declares http://www.cs.umd.edu/users/smith to be the key of the instance that represents him on his web page, and that John Smith has declared himself to be a Professor (in our ontology, the advisor relationship is between Student and Professor). That's a lot of assumptions.

What if John Smith for some reason never actually declared an instance for himself on his home page? This could happen if his web page doesn't use SHOE, or doesn't use HTML, etc. In this case, the best we can do is point to his web page and let robots assume that the key is the URL of his web page, as we've described. This isn't optimal because other people might do the same thing but use different URLs, and an agent might have a difficult time realizing they're all pointing to the same thing. But it's the best we can do.

The other problem that crops up is: What if John Smith never declared himself to be a Professor? Well, we can make that claim for him. Agents would take our claim with a grain of salt (after all, he's not saying it), but at least it would help them understand in what context we're claiming that he's our advisor. So, we could say:



<CATEGORY NAME="cs.Professor"
        FOR="http://www.cs.umd.edu/users/smith">

We don't have to categorize everyone we describe relationships with; this just shows that it's possible to describe relationships and categorizations that have nothing to do with our own instance.

Category declarations are similar to relationships in that the FOR is optional, just as a FROM or TO argument can be omitted. If the FOR is missing, it's assumed to be the instance making the claim. So our original categorization of ourselves could also have been written with either of these forms:



<CATEGORY NAME="cs.GraduateStudent"
          FOR="http://www.cs.umd.edu/users/george/">
<CATEGORY NAME="cs.ResearchAssistant"
          FOR="me">

Make sense?

Relationships with non-SHOE Documents

Now let's describe that paper we authored. The paper is a PostScript file, which, as mentioned above, can't use SHOE and therefore has never declared a unique key for itself. The best we can do is use its URL as its key. The publicationAuthor relationship runs FROM publications TO people, so we supply the paper as the FROM argument and omit the TO argument, which then defaults to our own instance:



<RELATION NAME="cs.publicationAuthor">
        <ARG POS=FROM VALUE="http://www.cs.umd.edu/papers/paper.ps">
</RELATION>
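Since an omitted TO argument defaults to the instance making the claim, the declaration above should be equivalent to this fully spelled-out form, which makes the direction of the relationship explicit: the paper is the FROM argument, and George, via "me", is the TO argument.



<RELATION NAME="cs.publicationAuthor">
        <ARG POS=FROM VALUE="http://www.cs.umd.edu/papers/paper.ps">
        <ARG POS=TO VALUE="me">
</RELATION>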

A Nested Instance

Finally, poor Brun Hilda, who exists World-Wide-Web-wise only as a mention on our web page, should get an instance all her own so we can declare facts about her. Brun Hilda will be sharing space on George's web page, so her instance needs a key that differs from his but is still based on his URL. In SHOE, the accepted protocol for declaring keys for such "subordinate" instances is to use the basic URL, plus a hash mark (#), and then some small suffix that distinguishes her from other instances in our document. Let's pick http://www.cs.umd.edu/users/george/#BRUNHILDA as her unique key.

Instances begin with <INSTANCE ...> and end with </INSTANCE>. Notice that we haven't closed George's instance with its </INSTANCE> tag yet, but we're ready to declare Brunhilda. If we're done with George, we can close his instance with </INSTANCE> and then start up the new Brunhilda instance. Or we can nest Brunhilda inside George, that is, declare Brunhilda inside George's instance tags. It doesn't much matter which way you do it, though nesting suggests that George's instance is in some sense "the parent of" or "in charge of" Brunhilda's instance. For the heck of it, let's nest. Now we can declare her instance and use the techniques we discussed above to make claims about her inside this declaration (that her name is Brun Hilda, that she's a lecturer, and that she's 23), with something like:



<INSTANCE KEY="http://www.cs.umd.edu/users/george/#BRUNHILDA">

<CATEGORY NAME="cs.Lecturer">

<RELATION NAME="cs.name">
        <ARG POS=TO VALUE="Brun Hilda">
</RELATION>
<RELATION NAME="cs.age">
        <ARG POS=TO VALUE="23">
</RELATION>

</INSTANCE>

Robots interpret any claims within an INSTANCE declaration as pertaining to that instance and not to an outer, enclosing instance. So the CATEGORY tag doesn't need a FOR attribute indicating Brun Hilda, and the RELATION tags can omit their FROM arguments.
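For comparison, here is a sketch of the non-nested alternative, in which George's instance is closed before Brun Hilda's begins. The comments stand in for the declarations already shown. One assumption to check against the SHOE specification: in the finished page the USE-ONTOLOGY declaration sits inside George's instance, so a separate sibling instance for Brun Hilda may need its own copy of that declaration.



<INSTANCE KEY="http://www.cs.umd.edu/users/george/">
        <!-- George's USE-ONTOLOGY, CATEGORY, and RELATION declarations go here -->
</INSTANCE>

<INSTANCE KEY="http://www.cs.umd.edu/users/george/#BRUNHILDA">
        <!-- Brun Hilda's category and relations go here, as in the nested version -->
</INSTANCE>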

Finishing Up

To indicate that we're done with George, we finish with



</INSTANCE>

Claims versus Facts

Of course, if Brun Hilda got her own web page and declared an instance elsewhere, then our instance isn't of much use any more (it's her instance on her web page that's really the "Brun Hilda" instance). Nonetheless, we can still make claims about her (even false claims), like claims that she's a research group, or that she's 100 years old! It's important to realize that people can make whatever claims they want: that they're married to Madonna, or are the King of Spain, etc. Hence agents don't interpret claims as facts, but as claims being made by a particular instance about itself or about other instances or data. Recording who made each claim lets agents weigh the "likely true" claims against the "probably false" ones. In a distributed knowledge mechanism like the World Wide Web, there's little getting around this; agents have no control over who makes what claims out there.
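As a sketch of what such a claim looks like, George's instance could assert that Brun Hilda is 100 years old by naming her key in the FROM argument. An agent would record this not as a fact about her age but as a claim made by George's instance about hers, to be weighed against other claims such as the age of 23 claimed by her own instance.



<INSTANCE KEY="http://www.cs.umd.edu/users/george/">
        <!-- A (false) claim that George's instance makes about another instance -->
        <RELATION NAME="cs.age">
                <ARG POS=FROM VALUE="http://www.cs.umd.edu/users/george/#BRUNHILDA">
                <ARG POS=TO VALUE="100">
        </RELATION>
</INSTANCE>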

The Finished Product

So we're done marking up our web page. Let's see what it might look like:




<HTML>
<HEAD>
<META HTTP-EQUIV="SHOE" CONTENT="VERSION=1.0">
<TITLE> My Page </TITLE>
</HEAD>
<BODY>
<P> Hi, this is my web page.
    I am a graduate student and a research assistant.
<P> Also, I'm 52 years old.
<P> My name is George Stephanopolous.
<P> Here is a pointer to my <A 
  HREF="http://www.cs.umd.edu/smith"> graduate advisor.</A>
<P> And <A HREF="http://www.cs.umd.edu/papers/paper.ps">here</A>
  is a paper I recently wrote.
<h3> Brun Hilda </h3>
Brun Hilda is a visiting lecturer here from Germany who doesn't have her 
own web page.  However, because I am such a nice person, I have agreed 
to let part of my web page space belong to her.  She is 23.  

<INSTANCE KEY="http://www.cs.umd.edu/users/george/">

        <USE-ONTOLOGY
                  ID="cs-dept-ontology"
                  URL="http://www.cs.umd.edu/projects/plus/SHOE/onts/cs.html"
                  VERSION="1.0"
                  PREFIX="cs">

        <CATEGORY NAME="cs.GraduateStudent">
        <CATEGORY NAME="cs.ResearchAssistant">

        <RELATION NAME="cs.name">
                <ARG POS=TO VALUE="George Stephanopolous">
        </RELATION>
        <RELATION NAME="cs.age">
                <ARG POS=TO VALUE="52">
        </RELATION>
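        <RELATION NAME="cs.advisor">
                <ARG POS=TO VALUE="http://www.cs.umd.edu/users/smith">
        </RELATION>
        <RELATION NAME="cs.publicationAuthor">
                <ARG POS=FROM VALUE="http://www.cs.umd.edu/papers/paper.ps">
        </RELATION>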

        <INSTANCE KEY="http://www.cs.umd.edu/users/george/#BRUNHILDA">

                <CATEGORY NAME="cs.Lecturer">

                <RELATION NAME="cs.name">
                        <ARG POS=TO VALUE="Brun Hilda">
                </RELATION>
                <RELATION NAME="cs.age">
                        <ARG POS=TO VALUE="23">
                </RELATION>

        </INSTANCE>
</INSTANCE>

</BODY>
</HTML>
