I’ve lost a lot of arguments when challenged with the “It’s just semantics” line. But what is semantics supposed to mean? Oh, now I get it – semantics is meaning!! For us CMS groupies, semantics really means that when we name a unit of content within our model, we give it a name that has meaning. This makes it easy for authors to identify exactly what content they should include. One of our teammates, Steven C, has done the digging on semantics…
“Semantic Information” is not a techno-ubiquitous term. Searches for “semantics” and “semantic information” on three tech sites (techdictionary.com, Cnet.com, and techtutorials.com) all returned no results.
So, I turned to several print publications for clear definitions. “Semantic” refers to “meaning.” Likewise, in technological jargon, “semantic information” or “semantics” always relates somehow to the meaning of a word, phrase, symbol, or instruction. I am in no way a technophile, but there seem to me to be at least three uses of the term. The first has to do with programming instructions; the second relates to search engine methodology; the third reflects the way in which we’ll use the term in this course: semantics refers to the strategy used in naming content units in a unified content strategy.
First, let’s see a basic definition and how the term applies to programming instructions. The 1996 Dictionary of Computing generically defines semantics as “that part of the definition of a language concerned with specifying the meaning or effect of a text that is constructed according to the syntax rules of the language.” The 1996 Random House Personal Computer Dictionary takes the definition a bit further, injecting the element of programming instruction: “In linguistics, [semantics is] the study of meanings. In computer science, the term is frequently used to differentiate the meaning of an instruction from its format. The format, which covers the spelling of language components and the rules controlling how components are combined, is called the language’s syntax.”
The connection of semantics to programming languages carried into the 21st century (although this use may be archaic in 2004). The 2000 Encyclopedia of Computer Science devotes several pages to explaining the detailed formal notation it refers to as “Programming Language Semantics.” This is, to me, an incomprehensible algebraic-looking code of Greek letters, subscripts, and arrows, all apparently designed to tell the computer what to do when it encounters various words and/or phrases. The 2001 Dictionary of Computer Science, Engineering, and Technology continues the line of thought, defining semantics as “the meaning of a string or sequence of token symbols in some language, as opposed to syntax which describes how symbols may be combined independent of their meaning. The semantics of a programming language are a transformation from programs to answers.”
Let’s turn to the use of semantics in search engine methodology. Azeem Azhar, a U.K. writer, analyst and consultant on technology and society, published an enlightening article (4/25/03) discussing a Google acquisition.
Forgive the lengthy quote, but Azhar explains well the importance of word meaning (semantics) in search methodology:
Google has purchased privately-held information retrieval company, Applied Semantics, a firm dotcom-formerly known as Oingo. It is a big deal because it brings together two competing schools, of which more below.
Information retrieval is the core of all search businesses. It is about creating software that solves a hard question: getting computers to understand human language with all its vagaries. These vagaries include:
· polysemy (words with multiple meanings like DRIVE or SET)
· synonymy (different words with similar meanings like AIRPLANE and AIRCRAFT)
· multi-word expressions which need to be treated as such (BILL CLINTON)
· errors, typos and poor grammar
For example, a key word search engine would find it hard to distinguish between A RED FISH and A FISH IN THE RED SEA.
Broadly speaking there have been two major schools of thought. The first is one I call the statistical school and the second is the semantic.
The statistical school held that context could be determined by looking at statistical patterns within documents and across documents in a collection. Essentially, they use a variety of techniques to recognise word co-occurrence. So when words like DRIVE, CAR and HIGHWAY are used together frequently, we can make assumptions about the context of those words.
The other approach is the semantic approach. Here knowledge engineers build up a complex network of relationships, an ontology, that relates words together. So a CAR is defined as a type of VEHICLE and identical to the word AUTOMOBILE. A search on the word CAR will also turn up documents with the word AUTOMOBILE in it, even if they don’t mention it. Such semantic networks require a good deal of work and a lot of maintenance to keep them up to date.
So why is this relevant to Google and Applied Semantics? Well, Google comes from the statistical school of information retrieval, albeit in a very light way. Currently, Google queries are not parsed very much at all. No stemming is applied, although some word proximity algorithms are used. Google does take advantage of one unique aspect of the Web: the interconnections between documents which provide a context-weighting to documents based on their link popularity.
Applied Semantics will add a layer of semantic understanding that Google needs. Their AdSense technology is already being used by a raft of web sites to improve targeting based on user-behaviour.
Finally, we turn to how we’ll use the term “semantic information” in designing a unified content strategy. In “Managing Enterprise Content” (2003), Rockley defines semantic information as “a component of an information model; uniquely identifies the content of that element, making it easy for authors to identify exactly what content they should include. Semantic information also enables the identification and reuse of specific content.” Used this way (and compared to other meanings), “semantic information” is actually a straightforward idea.
It simply means that as we identify tags that link to content units, we don’t use generic names. Instead, we use names that have meaning. According to Rockley (and logically so), these semantic tags are of great benefit to authors engaged in opportunistic reuse of content units. Rockley differentiates semantic tags from metadata; we’ll explore that in Chapter 12.
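To make the contrast between generic and meaningful tag names concrete, here is a minimal sketch in Python. The tag names (para, safety_warning, and so on), the sample product text, and the find_units helper are all invented for illustration; they are assumptions, not taken from Rockley's book or from any particular content model.

```python
import xml.etree.ElementTree as ET

# Hypothetical content unit tagged with generic names:
# every piece of content is just a "para".
GENERIC = """
<section>
  <para>Acme Router X1</para>
  <para>Connects up to eight devices.</para>
  <para>Do not expose to water.</para>
</section>
"""

# The same content tagged with (hypothetical) semantic names:
# each name says what kind of content belongs in the element.
SEMANTIC = """
<product_description>
  <product_name>Acme Router X1</product_name>
  <feature_summary>Connects up to eight devices.</feature_summary>
  <safety_warning>Do not expose to water.</safety_warning>
</product_description>
"""


def find_units(xml_text, tag):
    """Return the text of every element named `tag` in the document."""
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.iter(tag)]


# A generic name matches everything, so it cannot single out one unit.
generic_hits = find_units(GENERIC, "para")

# A semantic name identifies exactly the unit an author wants to reuse.
semantic_hits = find_units(SEMANTIC, "safety_warning")

print(generic_hits)
print(semantic_hits)
```

In this sketch, looking up "para" returns all three pieces of content with nothing to tell them apart, while looking up "safety_warning" returns only the warning text. That is the sense in which a meaningful name supports both authoring (the author knows what goes in the element) and reuse (the unit can be found by what it is).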
In summary, “semantic information” always refers to a word, phrase, symbol, or instruction that is recognized for its meaning, not its form. In the realm of a unified content strategy, “semantic information” means that we tag content units with meaningful names.
Sources:
Dictionary of Computing (4th Ed.). Oxford: Oxford University Press, 1996.
Margolis, Philip E. Random House Personal Computer Dictionary. New York: Random House, Inc., 1996.
Ralston, Anthony, Edwin D. Reilly, and David Hemmendinger, eds. Encyclopedia of Computer Science (4th Ed.). New York: Grove’s Dictionaries, Inc., 2000.
Laplante, Phillip, ed. Dictionary of Computer Science, Engineering, and Technology. Boca Raton, FL: CRC Press, 2001.
Azhar, Azeem. technology, economics, stuff. Website archive 25 Apr. 03, accessed 21 Jun. 04.
Rockley, Ann. Managing Enterprise Content: A Unified Content Strategy. Indianapolis, IN: New Riders, 2003.
