0.1. Foreword, by Jessica Milstead.
Users of information retrieval databases generally have no idea of the complexity of these databases, or of the effort that goes into development of their structure. This is as it should be; the design of the database should not have to be the concern of the user.
Once upon a time, in the days when the most sophisticated information retrieval databases were library card catalogs (which many users of this book probably don't remember), the issue of just how much users should be expected to learn in order to satisfy their needs from the database was actually a lively one. Today all that has changed. Willingly or not, designers of information retrieval databases and the access systems that support them have come to recognize that users just want the information they need. The less effort required of the users, the happier they will be, and the more likely they are actually to use the database — and therefore the more likely they are to support its continuation and growth.
There is another side to the question, of course. The effort has to be undertaken somewhere. The easier a given database is to use, the more effort is required behind the scenes to assure that its use will be transparent. And the number of decisions to be made and issues to be resolved in the design of an information retrieval database is certainly not obvious to anyone who has not actually undertaken such a task. It's not just a matter of dumping in some text and citations and turning a search engine loose — at least not if the information is valuable and you desire quality results.
Twenty years ago I wrote a small book (Subject Access Systems: Alternatives in Design, published by Academic Press) that discussed some of the major issues in information retrieval database design, but I didn't even try to cover every aspect. Jim Anderson has now undertaken this daunting task, and in the online environment, which barely existed when I wrote my book. We had online databases, but they were hard to use, requiring the skills of trained searchers to exploit them fully.
Prof. Anderson has distilled his decades of experience in teaching and design of information retrieval databases to produce a work that covers every aspect of design. He spent a number of years as the chair of a National Information Standards Organization committee charged with revising the standard for indexes. His experience with that committee has enriched this, his magnum opus.
Sound textbooks for information retrieval database design have been very few, and none have been as comprehensive as this. I particularly enjoyed the practice of defining some examples, and following them as example cases in every chapter. With these example cases it becomes possible to see how the principles discussed in the chapter would be applied in an actual database.
Prof. Anderson's work will serve a variety of users. It synthesizes much that is known, while bringing to bear its author's insights on issues. The case studies make the work particularly useful as a textbook, but it will also serve as a refresher for those already in the field, and as a reference for all audiences.
0.2. Preface, by James D. Anderson.
● purpose of this book; definition of IR databases : 1
This book is for our students, and for others who aspire to design the best possible information retrieval (IR) databases for every type of clientele and every type of message, in whatever medium or format. The overall objective is maximum effective retrieval of useful messages for each particular user.
● scope of this book : 2
The scope of this book is determined by the features of modern IR databases. The term "information retrieval database" or "IR database" is used in the broadest sense. Increasingly IR databases are designed for and implemented in digital media, but the design principles addressed in this book apply just as much to all media, including print on paper, microfilm or fiche, or even card catalogs. They certainly apply to modern digital libraries. The basic definition for the term "IR database" as used in this book is any database in any medium designed or created for the purpose of discovering and retrieving messages, texts and documents. Thus, it includes the whole gamut of IR databases presented to users via online connections, the world-wide web, CD-ROMs, or in print on paper: indexing and abstracting services (regardless of medium), library catalogs (including OPACs: online public access catalogs), bibliographies, and indexes, including back-of-the-book indexes (which can now be presented electronically with electronic books!). This and related definitions are expanded in the first part of the book.
● organization of this book; fundamental issues of IR database design : 3
This book is organized around twenty fundamental design issues in IR database design. See the table of contents for a summary of these issues. These are issues that I have identified during twenty-five years of investigation, teaching, design, evaluation, and creation of IR databases. These issues were further refined between 1991 and 1997 by Committee YY of the National Information Standards Organization (NISO). This committee, which I chaired, focused primarily on indexing and indexes, which are fundamental components for all IR databases. But in fact, the committee addressed all of the fundamental design issues for IR databases.
● NISO Committee YY: new standard for indexes : 4
NISO had charged Committee YY with developing a new standard for indexes that would cover every type of index:
● NISO technical report on design of indexes : 5
The results of the work of this committee, based on input from the NISO member organizations most concerned about the design of IR databases as well as hundreds of individual information professionals, were published in a technical report: Guidelines for indexes and related information retrieval devices (Anderson 1997a).
● publications related to this book : 6
This NISO technical report covers the scope of this book in a highly compressed format. Another even briefer presentation of the issues covered in this book, but in a less technical and more discursive style, is contained in my encyclopedia article: "Organization of knowledge," in the International encyclopedia of information and library science (Anderson 1997b, revised 2002).
● views of Milstead (Jessica L.) on IR database design : 7
This book rests on the foundations of research and practice in information retrieval, IR database design, indexing and cataloging by persons too numerous to mention. Some of their publications are cited in the bibliography at the end of this book. Here I want to acknowledge one book in particular, which has served as a precursor and exemplar for this book: Subject access systems: alternatives in design by Jessica L. Milstead (1984). As with this book, Milstead's emphasis and purpose were to foster intelligent design of IR databases. Although organized differently, Milstead covers many of the same points as we do, and we hardly ever disagree! I highly recommend her book. It has aged very well!
● views of Bates (Marcia J.) on IR database design : 8
I also want to pay special tribute to the work of Marcia Bates. One of her earliest articles, "Rigorous systematic bibliography" (1976), got me started in my analysis of IR database design principles. In that article, she focused on attributes of scope and documentary domain, and also on explicit criteria for selectivity. Since then, she has been a leader in research and commentary on essential aspects of the design and use of IR databases. She summarizes much of her work, and that of others, in "Indexing and access for digital libraries and the internet: human, database, and domain factors" (1998). Here she discusses such essential issues as human factors, indexing and searching terminology and structures, statistical distribution properties of documents in collections and databases, and subject domain-oriented indexing. In this book, her work has been especially influential in the chapters on scope and domain, vocabulary management, and interface design.
● views of Chowdhury (Gobinda G.) on information retrieval : 9
Most books on information retrieval focus on particular approaches to this broad field, such as human indexing, library cataloging and classification, or automatic indexing. A recent book that adopts the same kind of broad view as this book is G. G. Chowdhury's Introduction to modern information retrieval (1999). The organization and focus of our two books are different, but readers who desire another viewpoint can benefit from Chowdhury's book.
● Pérez-Carballo (José) as co-author: 10
Dr. José Pérez-Carballo, my former colleague in the School of Communication, Information, and Library Studies at Rutgers University, has joined me as co-author of this book. He has helped with the entire book, but he has taken special responsibility for sections on automatic indexing and interface design.
● purpose of this book: 11
I hope that this detailed concentration on the fundamental decision points of IR database design will help members of the information professions to consider all the options, and then to design and create better IR databases. Our society needs the best possible IR databases to cope with the ever growing explosion of information on the internet and the world-wide web, as well as in older print formats, video, film, audio, and electronic formats.
0.3. Acknowledgments, by James D. Anderson.
● acknowledgment to students : 12
Many thanks go to my students who have used and critiqued drafts of this book: Debbie Abrams, Jane Achola, Michael Angeles, Shawn Armington, Robert Barbanell, Frances Berman, Michele Lisoski Bond, Elana Broch, Linda Brown, John Burchard, Dorothea E. Clark, Susan Clark, Lisa Coats, Kathleen Creegan, Thomas M. Dolan, Lisa Ellis, Olga Evanusa-Rowland, Loisann Griglak, Ted E. Hamer, Lonnie Johnson, Mary Kearns-Kaplan, Richard K. Kearney, Michael Knies, Scott Kushner, Mariann E. Lucas, Marygrace Luderitz, Ruth Eleanor Lufkin, Mary Marks, Sal Mazzola, Mary McMahon, Daniel Noonan, Megan Palasciano, Antonio M. Pasqualoni, Beth Patterson, José Fernando Peña, Fran Pfeffer, Frances Pinto, Laura Poll, Jill Ratzan, Robert Rittman, Vivian Thiele, Regan L. Tuerff, Susan Turkel, David Utz, Mary O. Walker, Renee Watson, Karen Wenk, Melissa Yonteck, and Zhu Xuening (Sean). Terry Edwards was an especially careful reader, checking not only text for sense and typos, but also the embedded index strings! Jill Ratzan gave parts of the final text a rigorous perusal.
● 13
I thank them for their valuable editorial assistance. As they worked with earlier drafts of this book, they were very good at pointing out defects.
● acknowledgment to Milstead (Jessica L.), Wellisch (Hans H.), members of NISO Committee YY, and executive director of NISO : 14
Special gratitude goes to my long-term colleagues and primary mentors in the world of indexing, information retrieval, and information science, Dr. Jessica Milstead and Dr. Hans Wellisch, who read intermediate drafts of this book and made many excellent suggestions, most of which I endeavored to implement. I also thank the members of NISO Committee YY, who worked closely with me for many years (1991-1997) on the issues addressed in this book: Barbara Anderson, Knight-Ridder Information, Inc.; Catherine Grissom, U.S. Department of Energy, Office of Scientific and Technical Information; Nancy Mulvany, Bayside Indexing Service; Barbara Preschel, Public Affairs Information Service; Deborah Swain, IBM and Society for Technical Communication; and (again) Hans Wellisch, University of Maryland; and also (again) Jessica Milstead, our liaison from the NISO Standards Development Committee, and Patricia Harris, NISO executive director, both of whom shepherded our work with expertise, care and compassion!
0.4. Special Thanks to Scholars and Practitioners of IR for the Use of Their Work.
● acknowledgment to scholars and practitioners of special importance : 15
This book rests squarely on the work of hundreds of colleagues in the world of IR and IR database design. Every author and every published work is listed in the bibliography, but here we want to give special thanks to those colleagues and organizations whose work we have used most extensively, sometimes with extensive quotes. I trust that our use has been within the bounds of scholarly "fair use," but beyond the legalities of use and attribution, we want to express our sincere appreciation for and dependence on their work. Indeed, these authors and organizations are co-authors of this work with us:
ABC-Clio, American Library Association, Marcia J. Bates, Bliss Bibliographic Classification, John P. Comaromi, William S. Cooper, Timothy C. Craven, Dewey Decimal Classification, Tamas E. Doszkocs, Karen Markey Drabenstott, Dublin Core, Eurovoc Thesaurus, Jason Farradane, FOLDOC: The Free On-Line Dictionary of Computing, Bernd Frohmann, Rebecca Green, Stephan Greene, Donna Harman, David Harper, Marti Hearst, Birger Hjørland, Susan Hockey, Robert R. Korfhage, Library of Congress, Gary Marchionini, Jessica L. Milstead, Modern Language Association of America, National Information Standards Organization, Miranda Pao, A. Steven Pollitt, S. R. Ranganathan, Ronald E. Rice, Rutgers University Libraries, Gerard Salton, Ben Shneiderman, Dagobert Soergel, Karen Sparck Jones, Elaine Svenonius, Unesco Thesaurus, Brian C. Vickery, Diane Vizine-Goetz, Bella Hass Weinberg, Hans H. Wellisch, Patrick Wilson.
● acknowledgment to students : 16
We also give special thanks to the students of James D. Anderson who shared their design work to help illustrate concepts in chapter 19:
Matthew Brown, Melissa Hoffman, Eric J. Johnson, Veronica Meyer, Minsoo Park, J. Fernando Peña, Elizabeth Pregill, Robert Rittman, Enola Romano, Lori A. Rowland, Jennifer Schroth.
0.5. Bibliographic Citations.
● style for bibliographic citations : 17
All publications cited in this book are listed in alphabetical order in the bibliography at the end of the book. Citations are presented in accordance with the U.S.A. national standard ANSI/NISO Z39.29-1979, Bibliographic references (American National Standards Institute 1979). A revision of this standard was approved in 2003. The only significant change for our purposes was moving the date of a periodical article from the end of the citation, after volume and issue numbering and pagination, to a position before the volume and issue numbering (National Information Standards Organization 2004?). We did not adopt this small change.
0.6. Dedication.
We dedicate this work to Rafael and Dwayne, the wind beneath our wings.