|| Home | Resume | Career Review | Articles | Books | Videos | Lectures |
Preliminary Inventory | Photos | In the Press | Reflections | Site Map | About ||


Microsoft Word 98 version of the document is also available for download. paper.doc (89k)


Information and Design Issues for Web-based Archives

 

INTRODUCTION

The Internet and the World Wide Web are commonly discussed as a new medium of communication for marketing, advertising and sharing information. "The abundance of information on the World Wide Web has thrilled some, but frightened others" (Shneiderman 1997). The function of a web-based archive is to arrange, describe, and preserve documentation on the Internet. Its purpose is to provide reference and assistance to future users. The issues to consider when designing and developing an archival web site begin with the same set that applies to any World Wide Web site, which is largely a matter of balancing the structure and relationship of menus and contents. The goal is to build a hierarchy of menus and pages that feels natural to the user and is not misleading (Lynch, 1995). In addition, archives need to be usable over time. Therefore an archival web site must be persistent in both its content and its technology.

In this paper, we will analyze some of the general issues in organizing and presenting information-abundant web sites. In the process we will examine organization methods, user domains and the Objects/Actions Interface model (Shneiderman 1997). We will also look at different approaches to preserving electronic records (Bearman 1999). In addition, we will examine existing archival web sites in light of the design principles. The rest of the paper will focus on the steps in the development of the "Papers of Dr. Ben Shneiderman" web site.

GENERAL DESIGN PRINCIPLES

An important concept in designing web sites is to use consistent layouts, graphics and organization. These should give users a sense of where they are within the site. Menus, icons, and overviews can give users confidence that they can find what they are looking for rapidly. They should be presented consistently on every page within the domain. "The goal is to be consistent and predictable." (Yale 1998)

A frequently debated issue in web structure organization is depth versus breadth. Student researchers at the University of Maryland conducted an experiment in a computer science class. The researchers hypothesized that speed and accuracy are directly proportional to the depth of the tree structure of the links of the web page. They also believed that speed and accuracy would decrease as the breadth of the tree structure increases and the depth decreases. In addition, the maximum breadth is reached once the breadth is so large that the page looks unorganized, or too much scrolling is needed to read the entire page (SHORE 1997). After the experiment, the authors concluded that it supported their hypothesis: mean task time in accessing specific information is proportional to the depth of the tree structure. It also showed that users prefer a layout with more breadth and less depth. But it failed to show that more errors occur with deeper structures (SHORE 1997). Indeed, a very long page with no links is appealing only if users are expected to read the entire text sequentially. This is rarely the case, so some form of index page to outline the fragments is necessary. The goal is to provide a meaningful structure to guide users to the fragments, but excessive fragmentation defeats the purpose (Shneiderman 1997).
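The trade-off between depth and breadth can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes an idealized, perfectly balanced menu tree in which every page offers the same number of links, which is a simplification and not the design used in the SHORE experiment. The function name `levels_needed` is hypothetical.

```python
def levels_needed(total_pages: int, breadth: int) -> int:
    """Return how many menu levels (clicks) are needed to reach any one of
    `total_pages` content pages when every menu offers `breadth` links.

    Assumption: an idealized, perfectly balanced tree.
    """
    if total_pages < 1 or breadth < 2:
        raise ValueError("need at least one page and a breadth of 2 or more")
    levels = 0
    reachable = 1  # with zero clicks, only the starting page is reachable
    while reachable < total_pages:
        reachable *= breadth  # each extra level multiplies the reachable pages
        levels += 1
    return levels
```

Under these assumptions, a collection of 512 pages needs nine levels when each menu offers only two links, but three levels when each menu offers eight. Broad menus shorten the path at the cost of longer pages.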

Contents within a page are often vertically arranged. The contents should be compactly arranged to minimize scrolling. One common mistake is to use excessive horizontal rules or blank lines to separate items. Designers need to carefully consider the sequencing, clustering and emphasis of objects (Shneiderman 1997). Important items are expected to be at the beginning of the page. Such items can also be emphasized with color, fonts or graphics.

A well-designed system should be able to accommodate a range of user skills and interests. Diverse users should always be considered in the web design process. It is important to identify the audiences and their purposes. Motives may range from one-minute browsing to fact-finding, professional to casual, or serious to playful (Shneiderman 1997). Users’ tasks can be organized into four categories: specific fact-finding (known item search), extended fact-finding, open-ended browsing and exploration of availability (Shneiderman 1997). For open-ended browsing users, designers should provide an overview of the site so that they understand the range of services and know what is and is not available. These users benefit from hierarchical maps and from graphics and icons that help them recall where information is stored. Frequent and fact-finding users need familiar landmarks, reversibility and safety during exploration, as well as shortcuts (Kellogg & Richards, 1995).

Objects/Actions Interface Model

Designing a web page has similarities with designing any other user interface. Therefore the Objects/Actions Interface model (Shneiderman 1998) can be applied as a basis for design. The task of information seeking can be described by hierarchies of task objects and actions. "Then the designer can represent the task objects and actions with hierarchies of interface objects and actions" (Shneiderman 1998). The goal is for designers to choose the most effective metaphors and create visual representations that allow users to decompose their action plan into a series of detailed clicks or keystrokes (Shneiderman 1997). The Objects/Actions Interface model is further discussed in connection with the Library of Congress web site in the next section.

Other Issues

Other common issues include the use of graphics, screen display, and bandwidth. With the numerous tools for creating complex graphics nowadays, web sites have the potential of being excessively elaborate. However, it is essential for a web site to be graphically pleasing, yet simple. Elaborate graphics are not what make a web page marketable (Han 1998).

Designers must also accommodate small and large displays, monochrome and color. For example, if a web site is developed on a high-resolution display, the designer should also test it on a low-resolution screen to see how much content falls off screen. On a low-resolution display, multiple frames may cause screen clutter and become inefficient. If necessary, the designer needs to "take the lowest common denominator" and adjust the page to the low-resolution display. Both image use and screen display need to be considered along with the speed of users’ connections. For instance, users with small screens and low-bandwidth connections will usually work faster with, and prefer, a largely text-based web site.

Document persistency

In addition to the general organizational and design principles of web pages, web developers for archival sites also have to consider the future of their web sites. The purpose of an archive is to gather and organize large amounts of information for future retrieval. Therefore it is extremely important to ask, "Will my documents be readable in the next 20 years?" Though it is interesting to try novel technologies such as Java and CGI, designers need to keep in mind how long such technologies will persist.

David Bearman (1999) of Archives & Museum Informatics published an article titled "Reality and Chimeras in the Preservation of Electronic Records", in which he discussed several approaches to preserving electronic records.

Search

When the contents of a web site are enormous, as in an archive, including some kind of searching mechanism is crucial. Searching can range from simple structured pages that allow the use of the browser's "find" command to complicated database-driven back ends. Regardless of the structure of the search engine, the search process should be "…visible, comprehensible and controllable by users" (Shneiderman 1997). A proposed four-phase framework could inspire designers to create more usable search interfaces. The phases are formulation, initiating action, review of results and refinement (Shneiderman 1997).

Comparison of existing web sites

Several pioneers in archival web page design are worth examining. It is important to study their strategies and guidelines. Adopting their strengths and recognizing their flaws are the keys to developing a successful web site.

"Guide to the Papers of Katherine Anne Porter" (http://www.lib.umd.edu/UMCP/ARCV/kap/kaptofc.html) is a web site currently under construction by the University of Maryland Libraries. It contains materials dated from 1842 to 1980 that Miss Porter created, received, or collected during her lifetime (1890-1980). The materials include correspondence; manuscripts and drafts of both published and unpublished literary works, notes, and research materials; items relating to Miss Porter's awards and her organizational interests; and other documents stored on different media (UML 1999).

The web site does not have a finding aid for any of the sections, but it provides a table of contents labeled "Scope and Content". The "Scope and Content" section simply lists what is available in the collection. It is extremely useful for the "one minute surfers", people who just want to browse quickly to see if their target material is available. Each series is then described thoroughly in the "Series Descriptions" section. For fact-finding researchers, the "Reel/Box Inventory" section provides every article title in the inventory, organized by series and box number. It allows serious users to hunt down the exact article they are looking for and decide whether it is necessary to get the actual text from the library.

The Jackson Davis collection of African American educational photographs at the University of Virginia Library (http://www.lib.virginia.edu/speccol/jdavis/) contains approximately 4,000 photographs of African-American educational scenes (schools, colleges, training institutes, fairs and the like) from the southern United States, taken by Jackson Davis during the period 1915-1930. The web site includes a biography of Davis, a collection of family photos and a link for the elementary school he attended. The main photo archive is organized around a search engine. Users can search by keywords, month, year, negative number, city/state or country of the photos. The query previews generated by the search include thumbnails of the photographs and brief descriptions. The lack of browsing capability is inconvenient for some users. For example, users who are interested in browsing what is in the collection often do not know what keywords or film negative number to search for.

The Jackson Davis collection took a step forward by providing not only titles or guides but actual contents on an archival web site. However, it contains only a limited number of photos (approx. 5038 images). The HELIOS project at Carnegie Mellon (http://heinz1.library.cmu.edu/HELIOS/), by contrast, introduced a new standard of archival web site design by providing full document contents online.

In 1994, Carnegie Mellon University Libraries embarked on an ambitious project to convert Congressional papers of U.S. Senator John Heinz into digital format. Named in the senator's memory, the Heinz Electronic Library Interactive Online System (HELIOS) includes over 618,000 images of scanned documents from the Senator Heinz collection. The main goal for HELIOS is to provide access for researchers. Since HELIOS is available on the Internet, the collection is accessible to many people simultaneously from locations across the world at all hours (HELIOS 1999). The HELIOS designers consider its primary audience to be university students and faculty members, scholars and public-policy professionals, and extend the reach of primary source materials to new users such as legislative assistants, campaign workers, interest group members, and high school students (HELIOS 1999).

In addition to a powerful search tool, HELIOS provides a browse capability, which allows users to browse through the collection based on a hierarchical series/subseries structure.

The Library of Congress page (http://www.loc.gov) contains hundreds of collections whose items may include searchable documents, scanned page images and digitized photographs, videos, sound or other media. The OAI model can be applied as the guideline for its design. The task objects are the set of catalog items that contain fields about each item. The task actions are to search the catalog, browse the result list and view detailed items. The interface objects are a search form, result lists, brief catalog items and detailed catalog items (Shneiderman 1997). The Library of Congress page allows users to search and browse the collections with control. For example, the George Washington Papers collection (http://memory.loc.gov/ammem/gwhtml/gwhome.html), which contains 65,000 documents and 147,000 images, lets users browse according to timeline, category, subjects and series. The search feature allows users to specify search options and the number of items to return.

There are several other archives hosted by various libraries and museums. For example, Harry S. Truman Library and Museum (http://www.trumanlibrary.org/) maintains an electronic archive of President Truman’s documents. The site contains a biography, Presidential documents, photos and more. Searches on the site can be done based on keywords, catalog or timeline. Browsing features are organized by document name, then divided into folders.

Some other existing archival web pages, ranging from private documents to public collections, are compared in the Appendix. These sites include: Julia Morgan Collection (http://www.lib.calpoly.edu/spec_coll/morgan/index.html), Seattle Municipal Archive (http://www.ci.seattle.wa.us/seattle/leg/clerk/archhome.htm), Manuscripts and Archives at Yale University (http://www.library.yale.edu/mssa/home1.htm), Oberlin College Archives (http://www.oberlin.edu/~archive/OCA_holdings.html), and Albert A. Schaal archive at New Hampshire Division of Records Management and Archives (http://www.state.nh.us/state/schaalin.htm).

CURRENT PROJECT

Many web designers are aware of the principles of creating an effective, appealing and accessible site, but often disregard them in practice. Sometimes it is because the purpose of the web site does not require such effort. Sometimes it is because the timeframe does not allow effective application of the principles. And at other times designers merely neglect the importance of the principles. During spring of 1999, the web site "Papers of Dr. Ben Shneiderman" was created under the guidance of Dr. Ben Shneiderman. The web site was developed to serve as a guide to the Ben Shneiderman papers on the discipline of human-computer interaction housed at the UML and to present Dr. Shneiderman’s professional career. The papers include final versions and drafts of articles, conference materials, consulting and grant records, personal correspondence, course materials, and clippings from newspapers and magazines showing how they address user interface issues. During the web site creation process, major organizational principles were examined and alternative designs were analyzed. Due to the time constraint of this semester-long project, many reasonable features were not developed. The remainder of this document focuses on the development and elements of the web site as well as suggestions for future development.

Initial Development

During pre-production research, we gathered information from different sources including the University of Maryland Libraries archives, publications on HCIL, and past papers and lecture notes stored on disk. The files were reformatted, which included converting file formats and editing the files for consistency with each other, and were finally saved as HTML files. The next step was to create a "rough draft" of the site. The basic idea was to put all the HTML files on the web without worrying about navigation, content layout, accessibility or major organizational issues. Then the files were grouped according to their content and the web site was initially divided into 6 sections. The sections were later used as navigation menu items. In parallel with the organizing process, graphics and page layout issues were addressed to improve appearance and accessibility.

Once the major sections were formed and a structure was apparent, a template was developed. The template included a header at the top of each page, indicating the title of the web site and the current section. Navigation menus were created based on the sections and placed at the top and the bottom of the pages. A footer area carrying the University of Maryland logo and department information was added as well. As development carried on, we added several other menu items, including videos, photos, and books. For long documents such as the lecture notes and the UML collection, we created a set of secondary indexes anchored to the corresponding sections in the content.
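A secondary index of this kind pairs each link in the index with a named anchor in the long document. The following is a minimal sketch of how such an index could be generated rather than typed by hand; it is not the procedure used for this site, and the function names, the `<h3>` heading level, and the anchor names are all hypothetical.

```python
from html import escape


def build_index(sections):
    """Build an HTML list of links, one per (anchor_name, title) pair."""
    items = [
        f'<li><a href="#{escape(name, quote=True)}">{escape(title)}</a></li>'
        for name, title in sections
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def anchor_heading(name, title):
    """Return a heading carrying the named anchor that the index entry
    for `name` jumps to."""
    return f'<h3><a name="{escape(name, quote=True)}">{escape(title)}</a></h3>'
```

For example, `build_index([("y1998", "1998 Lectures")])` yields a one-item list whose link targets `#y1998`, and `anchor_heading("y1998", "1998 Lectures")` yields the matching heading to place in the content.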

When we finished the initial stages of design, we chose the organization method using the available tools. Several tools are useful in creating a web page containing archives.

Available Tools

EAD

The Library of Congress is one of the most information-abundant web sites, containing enormous numbers of archives and manuscripts. It has also been active in developing archival finding aids using EAD (Encoded Archival Description). EAD is an SGML document type definition (DTD). As a potential international standard, EAD and SGML accommodate more complex formatting and navigation than HTML. Through the use of document tags, EAD describes, controls, and provides access to other information. The downsides of EAD include the time and effort needed to learn the new markup language definitions, and the fact that finding aids need to be viewed with an SGML viewer or plug-in installed. The Library of Congress currently converts SGML results to HTML for people not using SGML-aware browsers. Because of the time frame and the difficulty of converting all documents to the EAD standard, we did not choose it at this stage of development for the "Papers of Dr. Ben Shneiderman" web site.

 

Java, JavaScript and Life Lines

Java is a promising technology, but it is still fairly new and is not fully supported by many browsers. In addition, many users turn Java off in their Java-capable browsers because of speed, security, or reliability concerns. Similarly, JavaScript is still changing; different browsers support different versions of it, and those versions are sometimes incompatible. Because of security concerns and JavaScript features that sometimes annoy users (like popup windows), many web users turn JavaScript off in browsers that support it.

Life Lines is a Java-based application developed in the HCIL lab at the University of Maryland to visually present computerized medical records. It organizes large numbers of records by using features such as zooming, filtering and color coding. Contents in an archive such as the Papers of Dr. Ben Shneiderman can also make use of these browsing methods and this data structure. For instance, the content can be filtered by time, title, content, etc. to allow faster access. But as mentioned before, Java technology is still new and has an uncertain future. For an archival web page, we need to make sure that the technology is going to last for years to come. For that reason, we decided to create the "Papers of Dr. Ben Shneiderman" web site using only standard HTML.

Current Solution

All contents of a category such as lectures or articles are placed in a single HTML file. The secondary indexes are then linked to their contents on the page with anchors. Midway through the development stage, we encountered a problem. Since some of the documents are very long, jumping to an anchor in the middle of a page can mislead users as to where they are. It also confuses users as to how to go back. But adding a back button for every anchor target may cause even more confusion and disrupt the page appearance. After careful consideration, we decided to introduce frames into the lengthy documents.

Frames were not considered initially because they are poorly implemented in old browsers and problematic when screen resolutions vary, and because users are often confused or annoyed by their presence. Frames can also cause bookmarking difficulties and back button confusion. Despite these disadvantages, frames allow designers to show multiple pages at once and to display menu systems effectively. Frames keep a menu page visible at all times while users scroll through the content, providing an overview of the structure. We incorporated frames in the lecture listing, the resume and the Preliminary UML collection pages. The main menu and the secondary indexes were separated from the contents of the page, and each was put into a separate frame. Finally, frame borders were hidden to make the pages look frameless.
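The three-frame arrangement described above can be illustrated with a short sketch that writes out an HTML 4.01 Frameset page. This is not the markup actually used on the site: the frame names, pixel sizes, and file names are assumptions, and `frameborder="0"` on each frame is one common way to hide borders.

```python
from html import escape


def frameset_page(title, menu_url, index_url, content_url):
    """Return an HTML 4.01 Frameset document with three frames: the main
    menu on top, a secondary index on the left, and content on the right.

    Frame names and sizes are illustrative assumptions.
    """
    title = escape(title)
    menu_url, index_url, content_url = (
        escape(url, quote=True) for url in (menu_url, index_url, content_url)
    )
    return f"""<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head><title>{title}</title></head>
<frameset rows="80,*">
  <frame name="menu" src="{menu_url}" frameborder="0" scrolling="no">
  <frameset cols="200,*">
    <frame name="index" src="{index_url}" frameborder="0">
    <frame name="content" src="{content_url}" frameborder="0">
  </frameset>
  <noframes><body><a href="{content_url}">View without frames</a></body></noframes>
</frameset>
</html>"""
```

Links in the index frame would then need `target="content"` so that the selected section opens in the content frame while the menu and index stay visible. The `noframes` fallback gives browsers without frame support a direct link to the content.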

Difficulties

During the development of this web site, several difficulties arose. We were able to overcome some of them, while leaving the rest for future improvements.

Most of the original documents were well organized chronologically. But due to university policies, Dr. Shneiderman’s lecture records were organized according to academic years instead of fiscal years. To ensure the consistency of all the pages on this site, those documents had to be broken down and rearranged before publication. In addition, the original lecture notes were chronologically ordered from early documents to most recent. This was inconsistent with the organization of the rest of the pages. To correct this problem, it will be necessary to go through the entire document, reverse the order for every entry, or write a short program to manipulate the pages. Since the limited timeframe is an issue in this project, that problem is left untouched as of today.
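The "short program" mentioned above could look something like the following sketch. It assumes that every entry in the page begins on a line starting with a known marker, here a hypothetical `<h3>` heading, and runs until the next such line; the real lecture pages may be structured differently, so the marker would need to be adjusted.

```python
def reverse_entries(html_text, entry_start="<h3>"):
    """Reverse the order of entries in a page body.

    Assumption: an entry begins on a line that starts with `entry_start`
    and runs until the next such line. Text before the first entry
    (the preamble) stays at the top.
    """
    preamble, entries = [], []
    for line in html_text.splitlines():
        if line.lstrip().startswith(entry_start):
            entries.append([line])      # start a new entry
        elif entries:
            entries[-1].append(line)    # continue the current entry
        else:
            preamble.append(line)       # text before the first entry
    reversed_lines = [line for entry in reversed(entries) for line in entry]
    return "\n".join(preamble + reversed_lines)
```

Note that anything following the last entry, such as a page footer, is treated as part of that entry and would move with it, so the sketch is best applied to the list of entries alone rather than to a whole page.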

All of the HTML pages were composed with Adobe PageMill 3.0 and Microsoft FrontPage 98. Both are excellent tools that provide direct manipulation and drag-and-drop features for creating HTML documents. Unfortunately, some of the HTML pages on the "Papers of Dr. Ben Shneiderman" site were much longer than average documents (e.g., umlbody.html is about 183KB). Both PageMill and FrontPage responded by cutting off the bottom of some of the pages in editing mode. Therefore, to edit those files, a word processor was used to edit the HTML code directly. This required extra effort in editing the pages to change fonts, colors, and layouts. Search and replace features in the word processor were frequently used to reduce some of the repetitive editing.

Acknowledgements

During the development of this web site, we worked closely with the University of Maryland Libraries Archive department. We would like to thank Mr. Tim Mahoney from the Archive department for providing the preliminary inventory documentation, giving comments and feedback on the web page and giving a guided tour of the library archive.

Also special thanks to Dr. Ben Shneiderman for reviewing and critiquing the web creation and research paper progress. His guidance helped the development in the right direction.

 

FUTURE DEVELOPMENTS

The purpose of this web site development project was to create a basic foundation of layouts for the "Papers of Dr. Ben Shneiderman". Many features remain to be implemented, and some errors remain to be corrected, in future development.

Including a Search Tool

Conceptually, a search tool is extremely useful for web sites that involve archives or are information-abundant in general. Unfortunately, implementing such a feature will require web-based programming and converting pages into data structures. It will involve using languages other than HTML, such as EAD or Java. The time frame of this semester-long project did not allow for implementation of search, but eventually it should be included in this web site.
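To illustrate what "converting pages into data structures" might involve, the following sketch builds a simple inverted index that maps each word to the pages containing it. It is an assumption-laden illustration, not a design for this site: it is written in Python rather than the languages named above, the page names are hypothetical, and it expects the page text to have already been stripped of HTML tags.

```python
import re
from collections import defaultdict


def build_search_index(pages):
    """Map each lowercase word to the set of page names containing it.

    `pages` is a dict of {page_name: plain_text}.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages containing every word in `query`."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)
```

A query is answered by intersecting the page sets of its words, so adding words narrows the result list. A production search tool would also need ranking, result previews, and a way to refine queries.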

Reverse ordering and consistency

As mentioned before, some documents are not organized consistently with others. An example is the forward chronological ordering of the Lectures section versus the reverse chronological ordering of the Articles section. Depending on the intention of the users, each method can offer its own advantages. But although either method can be defended as the right way to organize a document, a consistent format should be used to organize the contents throughout the web site.

Updating with the library

The library is constantly updating its records. To reflect the most recent version of the collections, the web site needs to be updated whenever the library's records change. For example, the preliminary inventory of the Papers of Dr. Ben Shneiderman will be organized into series during the next library update. In addition, some of the contents that contain confidential and private student information will be taken out and returned to Dr. Shneiderman. These changes will affect the box labeling and the organization of the papers. For the web site to be consistent with the library content, changes need to be made accordingly.

Providing actual content

As of today, it is still troublesome to digitize entire collections and have them available on the web. Only a few web site implementers are willing to spend the time and the money to provide entire documents online. Archival web sites as of today can only serve as a guide for researchers to view what is available. But to obtain the actual collections, researchers still need to physically travel to or contact the destination library. As technology advances, it will be easier and cheaper to store documents electronically. Technologies such as Optical Character Recognition should be used to help publish entire collections online.

CONCLUSION

Organizing information-abundant web sites is a challenging process. It requires consideration of general principles of web page design such as consistent and effective layout and accommodating diverse users and screen resolutions. Furthermore, document persistency and search capability are also important in the design process.

The "Papers of Dr. Ben Shneiderman" web project is still in the development stage. In the process of creating this web site, I had the opportunity to look into many interesting web design issues, compare existing web sites and study different guidelines. As an amateur web designer, I was astonished by the number of problems, issues, difficulties and resolutions involved in the process. The project was also a great opportunity for me to work with the members of the University of Maryland Libraries and to see the amazing record-keeping process for organizing historical documents. In the future, the Papers of Dr. Ben Shneiderman web site may serve as a template for the University of Maryland Libraries collections. If possible, the unfinished features described in the "Future Developments" section of this paper should be addressed and implemented.

Research should continue in the area of archival page design since the World Wide Web and the Internet are becoming more of a primary source of information. Many puzzles and problems remain unsolved in this field. Strategies for content organization, character recognition and electronic document preservation are beyond the scope of this paper, but crucial to developments in this area. Finally, web designers and researchers should work together to bring new standards to electronic information exchange.

REFERENCES

Bearman, David. "Reality and Chimeras in the Preservation of Electronic Records." Archives & Museum Informatics: Volume 5 Number 4 ISSN 1082-9873. Internet. Apr. 1999. Available: http://www.dlib.org/dlib/april99/bearman/04bearman.html

Burstein, Cari D. "Accessible Site Design Guide Design Elements." 7 Mar. 1999. <http://www.anybrowser.org/campaign/abdesign2.shtml>

"Depth vs Breadth in the Arrangement Web Links" Student HCI Online Research Experiments. Internet. Jun. 1997. Available: http://www.otal.umd.edu/SHORE/bs04

"Encoded Archival Description Finding aids." EAD Finding Aids at Library of Congress. Sep. 1995. <http://lcweb.loc.gov/rr/ead/eadhome.html>

Galloway, Edward A. "Heinz Electronic Library Interactive Online System (HELIOS)" n. pag. Online. Internet. 28 Apr. 1999. Available: http://heinz1.library.cmu.edu/HELIOS

Lynch, Patrick and Horton, Sarah. "Web Style Guide: Basic Design Principles for Creating Web Sites." Yale University Press 1997. <http://info.med.yale.edu/caim/manual/>

Shneiderman, Ben. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Berkeley: Addison-Wesley, 1998.

 
