The International Children’s Digital Library:
A Case Study in Designing for a Multi-Lingual, Multi-Cultural, Multi-Generational Audience
Hilary Browne Hutchinson1,2, Anne Rose1, Benjamin B. Bederson1,2,
Human-Computer Interaction Laboratory
1Institute for Advanced Computer Studies
2Department of Computer Science
3College of Information Studies
We describe the challenges encountered in building the International Children’s Digital Library, a freely available online library of children’s literature. These challenges include selecting and processing books from different countries, handling and presenting multiple languages simultaneously, and addressing cultural differences. Unlike other digital libraries that present content from one or a few languages and cultures, and focus on either adult or child audiences, the ICDL must serve a multi-lingual, multi-cultural, multi-generational audience. We present our research as a case study for addressing these design criteria and describe our current solutions and plans for future work.
The Internet is a multi-lingual, multi-cultural, multi-generational environment. While the Internet was once the domain of English-speaking, Western, adult males, its demographics have changed remarkably over the last decade. As of March 2004, English was the native language of only 36% of the total world online population (Global Reach, 2004). As of May 2004,
Creators of online digital libraries have recognized the value of making their content available to users around the world, not only for the obvious benefits of broader dissemination of information and cultural awareness, but also as tools for empowerment and for strengthening communities (Witten et al., 2001; Downie, 2003). Creating digital libraries for children has also become a popular research topic as more children access the Internet (Busey and Doerr, 1993; Külper et al., 1997; Druin et al., 2001). The International Children’s Digital Library project (Druin, in press) (www.icdlbooks.org) seeks to combine these areas of research to address the needs of both international and intergenerational users.
Background and Related Work
Creating international software is a complex process involving two steps: internationalization, where the core functionality of the software is separated from localized interface details, and localization, where the interface is customized for a particular audience (Marcus, 2002). The localization step is not simply a matter of language translation, but involves technical, national, and cultural aspects of the software (Fernandes, 1995). Technical details such as different operating systems, fonts, and file formats must be accommodated. National differences in language, punctuation, number formats, and text direction must be handled properly. Finally, and perhaps most challenging, cultural differences must be addressed.
Hofstede defines culture as “the collective mental programming of the mind which distinguishes the members of one group or category of people from another” (1991, p. 5). These groups might be defined by national, regional, ethnic, religious, gender, generation, social class, or occupation differences. By age 10, most children have learned the value system of their culture, and it is very difficult to change thereafter. Hofstede breaks culture into four components: values, rituals, heroes, and symbols. These components manifest themselves everywhere in software interfaces, from acceptable iconic representations of people, animals, and religious symbols to suitable colors, phrases, jokes, and scientific theories (Fernandes, 1995). However, as Hoft (1999) notes, culture is like an iceberg: only 10% of the characteristics of a culture are visible on the surface. The rest are subjective, unspoken, and unconscious. Only by evaluating an interface with users from the target culture can designers understand whether their software is acceptable (Nielsen, 1996).
Developers of online digital libraries have had to contend with international audiences for many years, and the MARC and OCLC systems have reflected this concern by including capabilities for transliteration and diacritical characters (accents) in various languages (Borgman, 1997). However, it is only more recently, with the development of international character set standards and web browsers that recognize these standards, that truly international digital libraries have emerged. Greenstone, an open-source software project
Researchers have also realized that beyond accessibility, digital libraries have enormous potential for empowerment and building community, especially in developing countries (Witten et al., 2001). Downie (2003) describes the importance of community involvement when creating a digital library for a particular culture, both to empower users and to make sure the culture is accurately reflected. Beyond accurately reflecting a culture, a digital library must also be understandable to members of that culture. Duncker (2002) notes that a digital library interface metaphor based on a traditional brick-and-mortar library was incomprehensible to Maori users in New Zealand, who were not familiar with the conventions of Western libraries.
In addition to international libraries, a number of researchers have focused on creating digital libraries for children. Recognizing that children have difficulty with spelling, reading, and typing (Moore & St. George, 1991; Solomon, 1993), as well as with traditional categorization methods such as the Dewey Decimal System (Busey and Doerr, 1993), these researchers have designed more child-friendly interfaces. Pejtersen (1989) created the BookHouse interface, which uses a metaphor of rooms in a house to support different types of searching. Külper et al. (1997) designed the Bücherschatz interface for 8- to 10-year-olds using a treasure hunt metaphor. Druin et al. (2001) designed the QueryKids interface to help young children find information about animals, and Theng et al. (2001) used the Greenstone software to create an environment in which older children can write and share stories.
The International Children’s Digital Library (ICDL) project seeks to build on and combine the research in both international and children’s digital libraries. As a result, the ICDL is more ambitious than other digital library projects in a number of respects. First, we are designing for a broader audience. While the digital libraries described above target one or a few cultures or languages, our audience potentially includes every culture and language in the world. Second, we do not localize our content. Part of the goal of the library is to expose users to books from different cultures, so it would be counterproductive to present books only in a user’s native language. As a result, our interface not only supports multiple languages and cultures, but supports them simultaneously, frequently on the same screen. Third, our audience includes not only a broad group of adults from around the world but also children ages 3-13.
To address these challenges, we created a multi-disciplinary, multi-lingual, multi-cultural, and multi-generational team and divided the development into several stages. In the first stage, completed in November 2002, we created a Java-based, English-only version of the library that addressed the searching and reading needs of children. In the second stage, completed in May 2003, we developed a more accessible HTML version of the software. In the third stage, completed in May 2004, we translated the metadata for the books in the library into their native languages and allowed users to view this metadata in the language of their choice. The final stage, currently in progress, involves translating the interface into different languages and adjusting some of its visual design according to the cultural norms of the language being presented. In this paper, we present our research as a case study, describing the solutions we have implemented to address some of these challenges and our plans for addressing ongoing ones.
ICDL Project Description
The ICDL project was initiated in 2002 by the
The project has two main audiences: children ages 3-13 and the adults who work with them, as well as international scholars who study children’s literature. The project draws together a multi-disciplinary team of researchers from computer science, library science, education, and art backgrounds. The research team is also multi-generational – team members include children ages 7-11 who work with the adult members of the team twice a week during the school year and for 2 weeks in the summer to help design and evaluate software. Using the methods of Cooperative Inquiry (Druin, 1999), including brainstorming, low-tech prototyping, and observational note-taking, the team has researched, designed, and built the library’s category structure, collection goals, and searching and reading interfaces.
The research team is also multi-lingual and multi-cultural. Adult team members are native or fluent speakers of a number of languages besides English. We are working with school children and their teachers and librarians in the
ICDL Interface Description
The design of the ICDL is driven by our audience, which includes users, contributors, and volunteers of all ages from around the world: over half a million unique visitors from nearly 200 countries at last count. As a result, we must collect, process, store, and present books written in many different languages for users of different ages and cultural backgrounds. In the remainder of this paper, we describe challenges we have encountered, and continue to encounter, in our development process, including selecting and processing a diverse collection of books, handling different character sets and fonts, and addressing differences in cultural, religious, social, and political interpretation.
Figure 1: ICDL Basic world browser
Figure 2: ICDL Basic category browser
Figure 3: ICDL Basic standard book reader
Figure 4: ICDL Basic comic book reader
Figure 5: ICDL Basic spiral book reader
Figure 6: ICDL Metadata interface with Spanish metadata
Figure 7: ICDL Metadata interface with Chinese metadata
Book Selection and Processing
The first challenge in the ICDL project is obtaining and managing our content. Collecting books from around the world is difficult because national libraries, publishers, and authors all have different rules regarding copyright. Our goal is to identify and obtain award-winning children’s books from around the world, for example books on the White Ravens list (International Youth Library, 2004), which we also make available to our users. However, we also receive unsolicited books, frequently in languages we cannot read. As a result, we rely on members of our advisory board and various children’s literature organizations in different countries to review these books. These groups help us determine whether books are relevant and acceptable in the culture they are from, and whether they are appropriate for children ages 3-13. We have found that these groups are eager to help, and that including them in the process is an effective way to build our project and the community surrounding it.
In addition to collecting and scanning books, we also collect bibliographical metadata (e.g., title, creator(s), publisher, abstract) in the native language of each book through a web-based metadata form filled out by the book contributors. We chose to base the ICDL metadata specification on the Dublin Core (Dublin Core Metadata Initiative, 2004) because of its international background, its ability to be understood by non-specialists, and the possibility of extending its basic elements to meet our specific needs (see http://www.icdlbooks.org/metadata/specification for more details). Contributors can optionally translate the metadata they provide into English and/or transliterate it into Latin characters, if necessary. Regardless of which language or languages they use, we ask that contributors write any information they create themselves, such as the abstract, in a form that children can easily understand. Simple, short sentences make the information easy for children to read and easier to translate into other languages.
The metadata provided allows us to catalog the books for browsing according to our various categories and to index the books for keyword searching. Even though translation to English is optional, our English-speaking metadata team needs the metadata in English in order to catalog the books. Since many contributors don’t have the time or ability to provide all of this information, we rely on volunteers who speak different languages to check the metadata that gets submitted, translate it, and/or transliterate it as necessary. This method allows us to collect information from our contributors without overwhelming them and also helps us build and maintain our volunteer community.
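To make the record structure concrete, the Python sketch below models a Dublin Core-inspired record in which each value keeps the contributor’s native text plus optional English translations and a Latin transliteration, and reports which fields still need volunteer attention. The class names, the element subset, and the `is_latin` heuristic are illustrative assumptions, not the actual ICDL specification or code.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def is_latin(text: str) -> bool:
    """True if every letter in `text` is a Latin-script letter (accents allowed)."""
    return all(unicodedata.name(ch, "").startswith("LATIN")
               for ch in text if ch.isalpha())


@dataclass
class LocalizedValue:
    """One metadata value: native text plus optional translations/transliteration."""
    native: str                          # as entered by the contributor
    native_lang: str                     # language tag, e.g. "ru" (illustrative)
    translations: Dict[str, str] = field(default_factory=dict)  # lang -> text
    transliteration: Optional[str] = None                       # Latin-script form


@dataclass
class BookRecord:
    """Hypothetical record using a small subset of Dublin Core-style elements."""
    identifier: str
    title: LocalizedValue
    creators: List[LocalizedValue]
    abstract: LocalizedValue

    def needs_english(self) -> List[str]:
        """Element names that still lack an English version for cataloging."""
        return [name for name in ("title", "abstract")
                if getattr(self, name).native_lang != "en"
                and "en" not in getattr(self, name).translations]

    def needs_transliteration(self) -> List[str]:
        """Creator names in a non-Latin script with no Latin form yet."""
        return [c.native for c in self.creators
                if not is_latin(c.native) and c.transliteration is None]


# Hypothetical Russian-language submission with a partial English translation.
record = BookRecord(
    identifier="book-001",
    title=LocalizedValue("Пример", "ru"),
    creators=[LocalizedValue("Иван Петров", "ru")],
    abstract=LocalizedValue("Короткий рассказ.", "ru",
                            translations={"en": "A short story."}),
)
print(record.needs_english())          # ['title']
print(record.needs_transliteration())  # ['Иван Петров']
```

A volunteer work queue could then list every record for which either method returns a non-empty list.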
Handling Different Character Sets
Our metadata form allows contributors to provide information from the comfort of an operating system and keyboard in their native language, but this flexibility requires software that can handle many different character sets. For example, English uses a Latin character set, Russian uses a Cyrillic character set, and Farsi uses an Arabic character set. Fortunately, a single international, cross-platform standard called Unicode assigns a unique code to nearly every character in every written language (Unicode Consortium, 2004). Unfortunately, not all software supports Unicode yet. In the first stage of implementation of the ICDL, we collected metadata only in English, so Unicode compliance was not a problem. However, when we moved to the next phase of development, which included collecting and presenting metadata in the native language of every book, we had to adjust our software to use Unicode because we potentially support every language in the world.
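The practical consequence of a single standard can be seen in a few lines of Python: a Unicode encoding such as UTF-8 round-trips text in Latin, Cyrillic, and Arabic script, while a legacy single-script encoding such as ISO-8859-1 (Latin-1) either rejects it or, when bytes are decoded with the wrong encoding, silently garbles it. The sample words are our own illustrations.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if `text` survives being encoded and decoded with `encoding`."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for some character in `text`.
        return False


samples = {"English": "book", "Russian": "книга", "Farsi": "کتاب"}

for language, word in samples.items():
    print(language,
          "utf-8:", round_trips(word, "utf-8"),
          "latin-1:", round_trips(word, "latin-1"))

# Decoding UTF-8 bytes as Latin-1 never raises, but produces "mojibake":
# each 2-byte Cyrillic character turns into two unrelated Latin-1 characters.
garbled = samples["Russian"].encode("utf-8").decode("latin-1")
print(len(samples["Russian"]), len(garbled))  # 5 10
```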
For storage of metadata, we were already using the open source MySQL database, which was recently upgraded to allow storage of Unicode data. Our web applications run on Apache HTTP and Tomcat web servers, both of which are freely available and Unicode compliant. However, we had to internationalize and localize both the website and the database to separate the template for metadata presentation from the content in different languages. For passing information between the database and the website, we also had to change to a Unicode-compliant database driver. Both the Basic and metadata collection applications are written using freely available Java servlet technology. The Java language is Unicode-compliant, but we had to make some adjustments to our servlet code to force it to handle our data using Unicode.
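The servlet adjustments mentioned above address a common failure: if form data is decoded with the platform’s default character set instead of the one the browser used, text is corrupted before it ever reaches the database. In the Java Servlet API the usual remedy is to call `request.setCharacterEncoding("UTF-8")` before reading any parameters; the exact changes made in the ICDL are not described here. The Python sketch below reproduces the same failure mode with the standard library’s `urllib.parse`; the field names are illustrative.

```python
from urllib.parse import parse_qs, urlencode


def read_form(body: str, charset: str = "utf-8") -> dict:
    """Decode a URL-encoded form body, interpreting %XX bytes as `charset`."""
    # parse_qs maps each field to a list of values; keep the first one.
    return {key: values[0]
            for key, values in parse_qs(body, encoding=charset).items()}


# A browser submitting a form from a UTF-8 page percent-encodes UTF-8 bytes.
body = urlencode({"title": "книга", "lang": "ru"}, encoding="utf-8")
print(body)  # title=%D0%BA%D0%BD%D0%B8%D0%B3%D0%B0&lang=ru

print(read_form(body))             # {'title': 'книга', 'lang': 'ru'}
print(read_form(body, "latin-1"))  # same bytes, wrong charset: garbled title
```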
To allow users to conduct keyword searches for books in the Basic interface, we use Apache’s freely available Lucene search engine to create searchable indices of book metadata. Lucene is Unicode compliant, but we had to create a separate index for each language and require users to select a search language. This requirement was necessary for two reasons. First, we needed to avoid confusion between identical words with different meanings in different languages (e.g., ‘bra’ means ‘good’ in Swedish). Second, different languages have different rules for stopwords to ignore (e.g., ‘the’, ‘of’, ‘a’ in English), for stemming, i.e., reducing related words to a common root (e.g., ‘cats’ has the same root as ‘cat’ in English), and for separating words (e.g., Chinese does not put white space between words). Lucene has text analyzers for a variety of languages that support these different conventions. For languages that Lucene does not support, we had our volunteers translate English stopwords, and we created our own simple text analyzers.
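The reasons for per-language indices can be illustrated with a toy inverted index in Python. This is not Lucene’s API: the stopword lists, the one-rule English stemmer, and the class and function names are illustrative assumptions. Chinese text is split into overlapping two-character terms, similar in spirit to the bigram approach of Lucene’s CJKAnalyzer.

```python
import re
from collections import defaultdict

# Illustrative stopword lists; a real system would use curated lists.
STOPWORDS = {
    "en": {"the", "of", "a"},
    "sv": {"en", "ett", "och"},
}


def analyze(text: str, lang: str) -> list:
    """Split `text` into index terms using rules for language `lang`."""
    if lang == "zh":
        # No spaces between words: index overlapping character bigrams.
        chars = [c for c in text if c.isalnum()]
        if len(chars) < 2:
            return chars
        return [a + b for a, b in zip(chars, chars[1:])]
    words = re.findall(r"\w+", text.lower())
    terms = [w for w in words if w not in STOPWORDS.get(lang, set())]
    if lang == "en":
        # Toy stemmer: strip a plural "s" so that "cats" matches "cat".
        terms = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in terms]
    return terms


class MultiLingualIndex:
    """One inverted index per language, so identical strings never collide."""

    def __init__(self):
        # language -> term -> set of book ids
        self.indices = defaultdict(lambda: defaultdict(set))

    def add(self, book_id: str, text: str, lang: str) -> None:
        for term in analyze(text, lang):
            self.indices[lang][term].add(book_id)

    def search(self, query: str, lang: str) -> set:
        """Ids of books whose `lang` metadata contains every query term."""
        terms = analyze(query, lang)
        index = self.indices.get(lang)
        if not terms or index is None:
            return set()
        return set.intersection(*(index.get(t, set()) for t in terms))


idx = MultiLingualIndex()
idx.add("sv-1", "En bra bok om katter", "sv")
idx.add("en-1", "The cats of the city", "en")
idx.add("zh-1", "小熊在哪里", "zh")

print(idx.search("bra", "sv"))   # {'sv-1'}
print(idx.search("bra", "en"))   # set(): the English index has no "bra"
print(idx.search("cat", "en"))   # {'en-1'}
print(idx.search("小熊", "zh"))  # {'zh-1'}
```

Requiring a search language, as described above, corresponds to the `lang` argument of `search`: the same query string is interpreted only under that language’s rules.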
Finally, we had to modify the HTML headers created by our Java servlets to indicate that the content being delivered to users’ browsers was in the Unicode character set. Most current browsers and operating systems recognize and handle web pages delivered in Unicode properly. For those that don’t, we created help pages that explain how to configure common browsers to use Unicode and to upgrade older browsers that don’t support Unicode.
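Declaring the character set amounts to a single header value. The Python WSGI sketch below sends `Content-Type: text/html; charset=UTF-8` and also repeats the declaration in an HTML `meta` element; in a Java servlet, the equivalent call is `response.setContentType("text/html; charset=UTF-8")`. The page content and the fake `start_response` used to inspect the headers are illustrative.

```python
def app(environ, start_response):
    """Minimal WSGI application that serves UTF-8 HTML and says so."""
    html = ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=UTF-8"></head>'
            '<body><p lang="ru">книга</p></body></html>')
    body = html.encode("utf-8")  # the bytes must match the declared charset
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly, capturing what it would send to the browser.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


payload = b"".join(app({}, fake_start_response))
print(captured["headers"]["Content-Type"])  # text/html; charset=UTF-8
print("книга" in payload.decode("utf-8"))   # True
```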
By making our systems fully Unicode compliant, we allow contributors from all over the world to enter metadata about books in an easily accessible HTML form using their native language, with the characters properly transmitted and stored in our database. Our volunteers can then use the same form to translate or transliterate the metadata as necessary. Finally, we can present this information to our users when they look at books. For example, the book Where’s the Bear? (Harris, 1997) is written in 5 different languages. Our original metadata came in English, but our volunteers translated it into Italian, Japanese, French, and German. Users looking at the preview page for this book in the library can change the display language of its metadata to any one of these languages using a pulldown menu (Figures 8 and 9).
Figure 9: Where’s the Bear? in Japanese
Currently, only the book metadata language can be changed, but in the next stage of development we will translate all of the surrounding interface text (e.g., navigation, labels) into different languages as well. To do this, we plan to take an approach similar to that of the CITIDEL project by creating a website where volunteers can translate words and phrases from our interface into their native languages (Perugini et al., 2004). Like the creators of CITIDEL, we believe that machine translation would not provide good enough results, and we simply don’t have the resources to do the translation ourselves. We also believe that encouraging volunteers to translate the site will help enlarge and enrich the ICDL community.
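A volunteer-maintained interface translation typically reduces to a catalog of message keys per language, with a fallback to English for strings nobody has translated yet. The Python sketch below shows that lookup and the list of outstanding keys a translation website could present to volunteers; the keys and catalog contents are hypothetical. Java’s `ResourceBundle` provides a similar fallback from a locale-specific bundle to a base bundle.

```python
# Hypothetical catalogs: English is complete, other languages fill in over time.
CATALOGS = {
    "en": {"nav.search": "Search", "nav.read": "Read",
           "label.language": "Language"},
    "es": {"nav.search": "Buscar", "nav.read": "Leer"},
}


def ui_text(key: str, lang: str) -> str:
    """Return the interface string for `lang`, or the English one if missing."""
    return CATALOGS.get(lang, {}).get(key) or CATALOGS["en"][key]


def untranslated(lang: str) -> list:
    """Keys still awaiting a volunteer translation into `lang`."""
    done = CATALOGS.get(lang, {})
    return sorted(key for key in CATALOGS["en"] if key not in done)


print(ui_text("nav.search", "es"))      # Buscar
print(ui_text("label.language", "es"))  # Language (English fallback)
print(untranslated("es"))               # ['label.language']
```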
Character Set Complications
Several issues have arisen as a result of collecting multi-lingual metadata in many character sets. First, different countries use different formats for dates and times, and even different calendars, so we allow contributors to specify the calendar used when they enter date information (e.g., Islamic, Julian). Second, not only do different countries use different formats for numbers, but the digits themselves also differ. For example, in Arabic script the digits 1, 2, 3 are written ١, ٢, ٣. Even though Java is Unicode compliant, it treats numbers as Latin characters. This means we must store Latin versions of any non-Latin numbers that our software uses internally for calculations, such as a book’s page count.
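Converting contributor-entered digits to Latin form before storage can rely on the Unicode character database, which records the decimal value of every digit character. The Python sketch below normalizes Arabic-Indic and Persian digits; the function names are illustrative, and calendar conversion is not shown.

```python
import unicodedata


def to_latin_digits(text: str) -> str:
    """Replace every Unicode decimal digit (Arabic-Indic, Persian, ...) with 0-9."""
    return "".join(str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
                   for ch in text)


def page_count(raw: str) -> int:
    """Parse a page count as entered by a contributor, in any script's digits."""
    return int(to_latin_digits(raw.strip()))


print(to_latin_digits("١, ٢, ٣"))  # 1, 2, 3   (Arabic-Indic digits)
print(page_count("۱۲۴"))          # 124       (Persian digits)
```

Storing the normalized value alongside the original text would keep the contributor’s form for display while giving the software plain Latin digits for arithmetic.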
A third issue is that we need some of the metadata, such as author and illustrator names, to be transliterated so their values can be displayed when the metadata is shown in a Latin-based language. Ideally, each language would be transliterated according to a single standard, so that the same value is always transliterated the same way. Unfortunately, we have found no practical way to enforce this other than stating the standard to be used in our metadata specification. When different standards are used, it becomes much harder to recognize that two values refer to the same item. For example, the same Farsi creator has been transliterated as both “Hormoz Riyaahi” and “Hormoz Riahi”. Even in a familiar script, we cannot assume that two people are the same just because their names match (e.g., John Smith); when a name is in a character set that we cannot read, the problem becomes even more challenging.
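Because transliteration standards cannot be enforced, one option is to flag likely variants for a human reviewer rather than merge them automatically. The Python sketch below uses the standard library’s `difflib.SequenceMatcher` similarity ratio on normalized Latin spellings; the 0.8 threshold is an arbitrary assumption, and a high score only suggests, never proves, that two spellings name the same person.

```python
import difflib
import unicodedata


def name_key(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().split())


def likely_same(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two transliterated names as possible variants for human review."""
    ratio = difflib.SequenceMatcher(None, name_key(a), name_key(b)).ratio()
    return ratio >= threshold


print(likely_same("Hormoz Riyaahi", "Hormoz Riahi"))  # True
print(likely_same("Hormoz Riahi", "Anne Rose"))       # False
```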
Finally, we had to handle differences in text length and direction in our interface. Different languages use different numbers of characters to present the same text, so we designed our screens so that metadata in languages with longer or shorter representations than the English version would still fit. We anticipate having to make additional interface changes to accommodate longer labels and navigational aids when we translate the remainder of the interface. We also had to consider that while most languages are read left to right, a few are read right to left (e.g., Arabic, Hebrew). As a result, we designed our screens so that book metadata can be reasonably presented in either direction. Currently, only the metadata text itself is displayed right to left, but eventually our goal is to mirror the entire interface so that it is oriented right to left when content is shown in right-to-left languages.
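Choosing a display direction per metadata value can follow the rule the Unicode Bidirectional Algorithm (UAX #9) uses for paragraphs: the first strongly directional character decides. The Python sketch below applies a simplified version of that rule via `unicodedata.bidirectional` and wraps the value in an HTML element with a matching `dir` attribute; it is not a full bidi implementation, and the helper names are illustrative.

```python
import html
import unicodedata


def base_direction(text: str, default: str = "ltr") -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character.

    Simplified from rules P2-P3 of UAX #9 (isolates are not handled).
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    return default  # only digits, punctuation, or spaces


def metadata_html(text: str) -> str:
    """Wrap a metadata value so the browser lays it out in its own direction."""
    return f'<div dir="{base_direction(text)}">{html.escape(text)}</div>'


print(base_direction("Where’s the Bear?"))  # ltr
print(base_direction("123 سلام"))           # rtl (digits are weak, so skipped)
print(metadata_html("שלום"))                # <div dir="rtl">שלום</div>
```

Mirroring the entire interface, as planned, would instead set `dir` on the page’s root element.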
While most current browsers and operating systems recognize Unicode characters, whether the characters are displayed properly depends on whether users have appropriate fonts installed on their computers. For instance, if a user looks at Where’s the Bear? and chooses to display the metadata in Japanese, the Japanese metadata will appear only if the computer has a font installed that includes Japanese characters. Otherwise, depending on the browser and operating system, the user may see question marks, square boxes, or nothing at all in place of the Japanese characters.
The good news is that many users will never face this problem. The ICDL interface is presented in English (until we translate it into other languages); we have English metadata for nearly all of our books, which is always shown first by default; and most operating systems come with fonts that can display English characters. Users who choose to display book metadata in another language likely do so because they can read that language, and therefore are likely to have fonts installed for it. Furthermore, many commonly used software packages, such as Microsoft Office, come with fonts for many languages. As a result, many users will have fonts installed for more languages than just the native language of their operating system.
Of course, fonts will still be a problem for other users, such as those with new computers that they have not yet configured with additional fonts or those using a public machine at a library. These users will need to install fonts to view book metadata, and eventually the entire interface, in other languages. To assist them, we created help pages that explain how to install fonts on various operating systems.
Issues of Interpretation
While technical issues have been a major challenge for the ICDL, we have also encountered a number of non-technical issues relating to interpretation. First, visual icons are crucial for communicating information to young children who can’t read, and to users who don’t speak English until we have translated the interface into other languages. However, certain pictorial representations may not be understood by all cultures or, worse, may offend some cultures. We have already redesigned one icon showing a boy sticking out his tongue because we learned that this gesture is offensive in Chinese culture. We are in the process of redesigning other icons, such as the stars in our rating system for popular books. The original icons used five-pointed stars, which carry religious significance, so we are switching to more neutral seven- or eight-pointed stars.
As we continue to internationalize our interface, we will likely need to change other icons that are difficult to represent in a culturally neutral way when the interface is displayed in different languages. For instance, it is a real challenge to create icons for categories such as “Mythology” or “Super Heroes”, since the symbols and stories for these concepts differ by culture. Icons for categories such as “Funny”, “Happy”, and “Sad” are also complicated because certain common American facial expressions and hand gestures have different, sometimes offensive, meanings in other cultures. What is considered funny in one culture (e.g., a clown) may not be understood in another. We may have to create different versions of such icons depending on the language and cultural preferences of our users. We rely on our multi-cultural team members, volunteers, and advisory board to inform us about these concerns.
We have also encountered religious, social, and political problems of interpretation. Our collection develops unevenly as we build relationships with various publishers and libraries. As a result, we currently have many Arabic books and only one Hebrew book, which has generated multiple emails from users concerned that we are taking a political stance on the Arab-Israeli conflict. To address this concern, we are currently working to develop a more balanced collection. We have received multiple books
Finally, we have received some books with potentially objectionable content. Some of these are historical books that present content now considered derogatory. Some include subject matter that may be deemed appropriate by some cultures but not by others. Some include information that may be too sophisticated for children ages 3-13 in any culture. While we are careful not to include books that are inappropriate for children ages 3-13, we do not want to censor books whose content is only subjectively offensive. Instead, we check with the contributor to make sure they are aware of our collection development guidelines. If they believe that a book is historically or culturally appropriate, we include it. We also provide a statement at the bottom of all of our book pages indicating that the books in the library come from diverse cultures and historical periods and may not be appropriate for all users of the library.
Conclusions and Lessons Learned
Designing a digital library for an international, intergenerational audience is a challenging process, but a hugely rewarding one. We are continually amazed by feedback from users all over the world: people thanking us for making books available from their countries, teachers who use the library as a resource for lesson planning, parents who have discovered a new way to read with their children, and children who are thrilled to discover new favorite books that they can’t get in their local library.
One of the most important lessons we have learned is that an international, intergenerational team is an absolute necessity. Simply having users and testers from other countries is not enough; their input is valuable, but it comes too late in the design process to influence major design decisions. Team members from different cultural backgrounds offer perspectives that an American-only team simply would not think to consider. Similarly, team members who are children understand how children like to look for and read books, and which interface tools are difficult or easy, and fun or not fun. Our enthusiastic advisors and volunteers are also crucial: we do not have the time or money to address every issue that surfaces on our own, and their help is key to our development process.
Beyond these human resources, making the ICDL an international environment has required us to examine and adjust our software and interfaces at every level. Unlike many digital libraries that focus on only one or a few languages, the ICDL must be simultaneously multi-lingual, multi-cultural, and multi-generational. A second lesson we learned is that freely available and open-source technologies now exist for building infrastructure that meets these criteria. With varying degrees of complexity, we were able to get all the pieces to work together properly. Unfortunately, the more difficult challenge falls on our users, who may need to install new fonts to view metadata in different languages. However, as computer and browser technology advance to support more global applications, we expect this problem to lessen and eventually disappear.
The more subjective issue of cultural interpretation has proven to be the most interesting challenge, and one that will likely not disappear as our collection grows and we embark on the next stage of development, translating our interface to support other languages and cultures. The final lesson we have learned is that culture pervades every aspect of both the visual design and the content of our interface, and that we have to examine our own cultural assumptions and biases to ensure that we are respectful of others. However, with the enthusiasm we continue to see in our team members, advisors, volunteers, and users, we expect to be able to address future design challenges with their help.
The ICDL is a large project with many people who make it the wonderful resource that it has become. We thank them all for their continued hard work, as well as our many volunteers and our generous contributors. We would especially like to thank the National Science Foundation for our ITR grant, and the
1. Borgman, C. (1997). Multi-Media, Multi-Cultural, and Multi-Lingual Digital Libraries: Or, How Do We Exchange Data in 400 Languages? D-Lib Magazine, June 1997.
2. Busey, P. and Doerr, T. (1993). Kid’s Catalog: An Information Retrieval System for Children. Youth Services in Libraries, 7 (1), pp. 77-84.
3. Downie, J. (2003). Realization of Four Important Principles in Cross-Cultural Digital Library Development. Workshop Paper for JCDL 2003.
4. Druin, A. (1999). Cooperative Inquiry: Developing New Technologies for Children with Children. Proceedings of Human Factors in Computing, pp. 592-599.
5. Druin, A. (In Press). What Children Can Teach Us: Developing Digital Libraries for Children with Children. Library Quarterly.
6. Druin, A., Bederson, B., Hourcade, J., et al. (2001). Designing a Digital Library for Young Children: An Intergenerational Partnership. Proceedings of the Joint Conference on Digital Libraries, pp. 398-405.
7. Dublin Core Metadata Initiative (2004), http://www.dublincore.org
8. Duncker, E. (2002). Cross-Cultural Usability of the Library Metaphor. Proceedings of JCDL '02, pp. 223-230.
9. Fernandes, T. (1995). Global Interface Design.
10. Global Reach (2004).
11. Harris, J. (1997). Where’s the Bear?
12. Hofstede, G. (1991). Cultures and Organizations: Software of the Mind.
13. Hoft, N. (1998). Developing a Cultural Model. In
14. Hourcade, J., Bederson, B., Druin, A., Rose, A., Farber, A., & Takayama, Y. (2003). The International Children's Digital Library: Viewing Digital Books Online. Interacting with Computers, 15, pp. 151-167.
15. International Youth Library (2004). The White Ravens 2004. Available for purchase at http://www.ijb.de/index2.html.
16. Külper, U., Schulz, U., and Will, G. (1997). Bücherschatz – A Prototype of a Children’s OPAC. Information Services and Use, (17), pp. 201-214.
17. Marcus, A. (2002). Global and Intercultural User-Interface Design. In Jacko, J. and Sears, A. (Eds), The Human-Computer Interaction Handbook.
18. Moore, P. and St. George, A. (1991). Children as Information Seekers: The Cognitive Demands of Books and Library Systems. School Library Media Quarterly, 19, pp. 161-168.
19. National Telecommunications and Information Administration (NTIA) (2002). A Nation Online: How Americans are Expanding Their Use of the Internet. http://www.ntia.doc.gov/ntiahome/dn/.
20. Nielsen, J. (1996). International Usability Engineering. In
21. Pejtersen, A. (1989). A Library System for Information Retrieval Based on a Cognitive Task Analysis and Supported by an Icon-Based Interface. ACM Conference on Information Retrieval, ACM Press, pp. 40-47.
22. Perlman, G. (2000). The FirstSearch User Interface Architecture: Universal Access for any User, in many Languages, on any Platform. Proceedings of CUU 2000, pp. 1-8.
23. Perugini, S., McDevitt, K., et al. (2004). Enhancing Usability in CITIDEL: Multimodal, Multilingual, and Interactive Visualization Interfaces. Proceedings of JCDL '04, pp. 315-324.
24. Reuter, K. and Druin, A. (In Press). Bringing Together Children and Books: An Initial Descriptive Study of Children’s Book Searching and Selection Behavior in a Digital Library. Proceedings of American Society for Information Science and Technology Conference.
25. Solomon, P. (1993). Children’s Information Retrieval Behavior: A Case Analysis of an OPAC. Journal of the American Society for Information Science, 44 (5), pp. 245-264.
26. Theng, Y., Nasir, N., Buchanan, G., et al. (2001). Dynamic Digital Libraries for Children. Proceedings of the Joint Conference on Digital Libraries, pp. 406-415.