Moving Image Collection Evaluation

 
Compiled by Ying Zhang

 

Abbas, J., Norris, C. & Soloway, E. (2002). Middle school children's use of the ARTEMIS digital library. Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, Portland, Oregon, pp. 98-105.
    In the study, transaction log analysis was employed to evaluate users' interaction with a genuine digital library. Instances of use and time spent per instance were the key measures.

Adams, A., & Blandford, A. (2001). Digital libraries in a clinical setting: friend or foe? ECDL'01: Proceedings of the 5th European Conference on Digital Libraries, 214-224.

To understand the social and organizational impacts of digital libraries on clinical users, the authors conducted focus group discussions and in-depth interviews involving 73 hospital clinicians (nurses, doctors, surgeons, consultants, etc.). The results, based on grounded theory, indicate that users' perceived information needs, dissemination processes, and the impact of newly introduced technology are associated with organizational, social, and political structures. In addition, organizational hierarchies, technological misconceptions, and problems of technology and information accessibility impeded the use of digital libraries.

Bainbridge, D., Dewsnip, M., & Witten, I.H. (In press). Searching digital music libraries. Information Processing and Management, Retrieved on September 24, 2004 from http://www.sciencedirect.com

To compare the performance of three different melody retrieval models (dynamic, static, and n-gram-based matching) and the algorithms devised under these models, the authors conducted a series of effectiveness and efficiency evaluations using traditional IR metrics, including precision, recall, number of relevant items retrieved, and query processing time. The results show that the state-based retrieval model offers a good balance of efficiency and effectiveness, while the n-gram model is more efficient. Nevertheless, the hybrid approach (3-grams followed by state-based matching) has "the best overall combination of efficiency and effectiveness." Additionally, using both the pitch and the rhythm information of melodies yields better performance than using either alone. Although the findings are suggestive for music digital library design, the sole reliance on topical relevance judgments and the exclusion of real users may limit their applicability.
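The n-gram idea mentioned in the annotation above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes melodies are given as MIDI pitch numbers, converts them to pitch intervals so that matching is independent of key, and scores a melody by the number of 3-grams it shares with the query. The function names and the toy collection are hypothetical.

```python
from collections import Counter

def intervals(pitches):
    """Convert pitch numbers to successive pitch intervals (key-independent)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def ngrams(seq, n=3):
    """Return the multiset of length-n windows over seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def ngram_score(query, melody, n=3):
    """Count the n-grams shared by query and melody (multiset intersection)."""
    shared = ngrams(intervals(query), n) & ngrams(intervals(melody), n)
    return sum(shared.values())

def rank(query, collection, n=3):
    """Rank melody names by descending shared n-gram count; ties broken by name."""
    scores = {name: ngram_score(query, mel, n) for name, mel in collection.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))

# Hypothetical toy collection (MIDI pitch numbers), for illustration only.
collection = {
    "tune_a": [60, 62, 64, 65, 67, 65, 64, 62],
    "tune_b": [60, 60, 67, 67, 69, 69, 67],
    "tune_c": [72, 71, 69, 67, 65, 64, 62, 60],
}
# The opening of tune_a, transposed up a whole tone.
query = [62, 64, 66, 67, 69]

print(rank(query, collection))
```

In a hybrid setup of the kind the annotation describes, a cheap filter like this would shortlist candidates before a more expensive matcher is run on them; that second stage is not shown here.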

Baldonado, M.Q.W. (2000). A user-centered interface for information exploration in a heterogeneous digital library. Journal of the American Society for Information Science, 51(3): 297-310.

    The series of SenseMaker evaluations at Stanford University has three main objectives: comparing three result interfaces, evaluating the iterative clustering interface, and examining the fluidity between searching and browsing (structure-based searching and filtering). A number of criteria were employed, such as time, error rate, perceived speed, usefulness, and usability.

Bergmark, D., Lagoze, C., & Sbityakov, A. (2002). Focused crawls, tunneling, and digital libraries. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference, ECDL'02, September 16-18, 2002, Paris, France. 91-106.

To build large digital library collections, the authors propose a hybrid approach combining focused crawls (best-first) and tunneling (prioritizing a given page based on a value other than its relevance score). To test the effectiveness and efficiency of this approach, the authors ran an automatic harvest on 500,000 unique documents downloaded from the Web and conducted statistical analysis on the observed data. Three constructs are proposed to measure effectiveness and efficiency: nugget (a Web document whose cosine correlation with at least one of the collection centroids is higher than some given threshold), dud (a Web document that does not match any of the centroids very highly), and path length (2 minus the number of duds in the path <the sequence of pages and links going from one nugget to the next>). The results show that the expanded tunneling concept can achieve highly efficient and effective focused crawling. It should be noted that the harvest was done off the real Web.
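The nugget/dud distinction described above can be sketched as a cosine-similarity test against the collection centroids. This is a minimal illustration, not the authors' implementation: it assumes documents and centroids are sparse term-weight dictionaries, and it collapses the distinction into a single hypothetical threshold of 0.5 (the annotation does not give the value used).

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(doc, centroids, threshold=0.5):
    """Return 'nugget' if doc exceeds the threshold for any centroid, else 'dud'.

    The threshold value is an assumption made for this sketch.
    """
    best = max((cosine(doc, c) for c in centroids), default=0.0)
    return "nugget" if best > threshold else "dud"

# Hypothetical centroid and documents, for illustration only.
centroids = [{"soil": 1.0, "rock": 1.0}]
on_topic = {"soil": 2.0, "rock": 1.0}
off_topic = {"music": 1.0}

print(classify(on_topic, centroids), classify(off_topic, centroids))
```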

Bertot, J.C., & McClure, C.R. (2003). Outcome assessment in the networked environment: research questions, issues, considerations, and moving forward. Library Trends, 51(4): 590-613.

This article identifies a number of research questions related broadly to library outcomes assessment in a networked environment and discusses issues affecting these research topics. It also proposes a framework to relate traditional evaluation components and terminology to the networked environment and identifies a number of factors in the networked environment that affect outcomes and other assessment methods. Finally, it outlines a multi-dimensional library service outcome assessment model.

Bishop, A. P. (2002). Measuring access, use and success in digital libraries. Retrieved April 18, 2003, from http://www.press.umich.edu/jep/04-02/bishop.html.

    This paper describes and evaluates the accessibility of DeLIver, a testbed, and then discusses how to remedy the access barriers. Convenience and ease of use are viewed as especially important factors in this paper. The metrics for the evaluation include the number of registered system users, the number of hits logged on a digital library's home page, the number of documents viewed or printed, the degree of penetration, and so on. Faculty and graduate students are the major audiences of DeLIver. Focus groups, interviews, and observation are the major methods for this evaluation. The paper suggests that both subjective and objective access factors greatly influence use. The results also indicate that some subjective factors contribute more to encouraging access than some objective factors.

Bishop, A.P. (1998). Measuring access, use, and success in digital libraries. The Journal of Electronic Publishing, v.3 (December 1998), Retrieved on September 25 from http://www.press.umich.edu/jep

The paper outlines the evaluation efforts during the implementation of DeLIver (a testbed collection of journal articles) across the University of Illinois campus. The author highlights the significance of measuring and interpreting digital library use. From the design and evaluation, the author found that access barriers do influence system use. Whereas accessibility can be measured by both subjective (e.g. initial expectation of convenience, system awareness) and objective (e.g. difficulty in accessing the system) criteria, system use can be assessed by adapting the criteria for evaluating the use of a physical library (e.g. library use, material use, material access, degree of penetration). Accordingly, the combination of different methods (e.g. log analysis, interviews) and benchmark evaluation measures is highly recommended.

Bishop, A.P. (1999). Making digital libraries go: comparing use across genres. Proceedings of the Fifth ACM Conference on Digital Libraries, 94-103.

The author combined multiple research methods (e.g. in-depth interviews, surveys, focus groups, log analysis, etc.) to compare the information use of two different groups of users (i.e. academic and low-income communities). She concluded that digital library use is an assemblage activity associated with social practice, beliefs and goals, community norms, knowledge, technology access and proficiency, and resource constraints, and the interplay between them. New users not only need to learn how to use IR system functions, but must also figure out how to fit the system into their daily lives.

Bishop, A.P., Neumann, L.J., Star, S.L., Merkel, C., et al. (2000). Digital libraries: situating use in changing information infrastructure. Journal of the American Society for Information Science, 51 (4): 394-413.

Aiming to examine "how potential users approach new systems" (DeLIver, a DLI project at Univ. of Illinois), the authors employed various research methods, such as focus groups, surveys, interviews, log analysis, user registration, and lab usability tests. Several criteria, such as use, satisfaction, information convergence, and access barriers, were used to achieve the research goal. Their usage statistics fell within the same range as those generated by similar full-text journal systems. Additionally, seemingly insignificant barriers (e.g. trivial technical problems) "became magnified in the effect of use". Moreover, the findings suggest that it is essential to understand the different processes of searching and use germane to different information worlds.

Bishop, A.P., Van House, N.A., & Buttenfield, B.P. (2003). Digital Library Use: Social Practice in Design and Evaluation. Cambridge, Massachusetts: The MIT Press.

The book provides rich and in-depth arguments and evidence about the digital library as a socio-technical system, which is composed of technology, information, carriers of information, people, and their practice. "It is about digital libraries' interaction with the larger world of work, institutions, knowledge, and society, as well as with the production of knowledge." (p.1) Accordingly, another theme of the book is to perform "technically informed social analysis" for DL design and evaluation. The book contains three parts and twelve chapters by a group of DL activists.

Blandford, A. & Buchanan, G. (2002). Workshop report: Usability of Digital Libraries @ JCDL'02, ACM SIGIR Forum, 36(2): 83-89.

The authors summarize usability-related issues from the Usability of Digital Libraries workshop at JCDL'02. The usability test findings reported show: (1) a lack of clarity about what usability is, although there was general agreement that the term includes many aspects; (2) "immature understanding of what techniques are appropriate for addressing particular aspects of design and evaluation" (p.85), although a number of research techniques were proposed; (3) difficulty in identifying potential users and their tasks; (4) little understanding of users' assumptions about and familiarity with DLs' content, interface, process, and features; (5) divergence in the main concerns of different groups of people: whereas users are more concerned about results, LIS professionals emphasize procedures; (6) users' limited understanding of the role and purpose of different metadata fields. Other findings demonstrate that users don't use systems as expected and usually have poor information handling skills.

Blandford, A., Keith, S., Connell, I., & Edwards, H. (2004). Analytical usability evaluation for digital libraries: A case study. Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, pp.27-36.

Having argued for the necessity and feasibility of analytical usability evaluation (conducted by experts using established theories and methods) as a complement to the empirical approach, the authors compare four different analytical approaches, namely Heuristic Evaluation (HE), Cognitive Walkthrough (CW), Claims Analysis (CA), and Concept-based Analysis of Surface and Structural Misfits (CASSM). For each of the four approaches, they demonstrate its pros and cons by conducting a case analysis using a single Web site. The comparison demonstrates that HE and CW generate only superficial data and are inadequate for the DL domain, whereas the other two can help in identifying "conceptual difficulties" and are more useful in DL contexts. In particular, the proposed CASSM approach is promising in terms of integrating empirical data with usability principles. Nevertheless, none of the approaches fits "seamlessly with existing digital library development practices."

Blixrud, J.C. (2002). Measures for electronic use: the ARL E-Metrics project. Retrieved on April 12, 2004 from http://www.lboro.ac.uk/departments/dis/lisu/Blixrud.pdf

To determine whether the money spent on electronic resource subscriptions was a worthwhile investment, 24 ARL members self-funded this two-year project to "develop measures for describing the resources, expenditures, and usage of electronic resources." The project had three phases: an initial phase (an inventory of current practices at ARL libraries regarding statistics, measures, processes, and activities that pertain to networked resources and services); a second phase (identification and field testing of statistics and measures and recommendation of measures, using surveys and on-site tests); and a final phase (identification of linkages to educational outcomes and impacts, to research, and to technical infrastructure, using content analysis of standards from education commissions).

Blocks, D., Binding, C., & Cunliffe, D. et al. (2002). Qualitative evaluation of thesaurus-based retrieval. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference, ECDL'02, September 16-18, 2002, Paris, France. 346-361.

The evaluation is part of the FACET project, which uses the collection of the National Museum of Science and Industry in the UK. The main thesaurus is the faceted Art & Architecture Thesaurus. To illuminate problems and inform interface design, the authors conducted this formative evaluation (log analysis, think-aloud, screen-capture videotaping, observation notes) to analyze users' interaction with the interface components (e.g. thesaurus display) of the experimental system. In particular, the evaluation focused on how the displayed thesaurus could assist the search process, the formation of faceted queries, and query reformulation. There were eight participants in total: six museum professionals, one IT professional, and one library professional. The search sessions took place in the participants' familiar surroundings rather than in a lab. The results show that "although the prototype interface supports basic level operations, it does not provide non-expert searchers with sufficient guidance on query structure and when to use the thesaurus."

Bollen, J. & Luce, R. (2002). Evaluation of digital library impact and user communities by analysis of usage patterns. D-Lib Magazine, 8 (6), Retrieved on July 1, 2003 from http://www.dlib.org/

Believing that user preferences and satisfaction tend to be highly transient and specific, the authors argue for the significance of quantitative analysis of the more implicit, user-community-determined preference relationships among documents derived from server logs. In their study, document impact was established using the subjective Journal Consultation Frequency (JCF) rather than the traditional Impact Factor (IF). Whereas the IF is based on explicit citations, the JCF is determined by the search patterns of a given user community. Unfortunately, the article does not provide empirical data showing how well the correlation between different journals' JCF can serve as an indicator of the quality of a DL collection and of the user community's preferences.

Borgman, C.L., Leazer, G.H., & Gilliland-Swetland, A. et al. (2004). How geography professors select materials for classroom lectures: Implication for the design of digital libraries. Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 179-185.

This is a human information behavior (HIB) study aiming to "have close understanding of process by which faculty search for and use information in support of their teaching." To this end, the authors interviewed nine professors of physical and human geography. Although the research per se does not target the DL context, several key findings are suggestive for DL (e.g. ADEPT) design, particularly the need to provide more features for concept-based and teaching-oriented searching, the creation and management of personalized DLs, resource sharing, and so forth.

Borgman, C.L. (2002). Challenges in building digital libraries for the 21st century. Digital libraries: people, knowledge, and technology: Proceedings of 5th International Conference on Asian Digital Libraries, ICADL 2002, Singapore, December 11-14, 2002, 1-13.

The paper summarizes some challenges facing the transition from DL research and development to practice. One of the challenges is the need for evaluation to know what works and in what contexts. "Appropriate evaluation methods and metrics are requirements for sustainable digital libraries that have received little attention until recently." "Evaluation has many aspects and can address a variety of goals, such as usability, maintainability, interoperability, scalability, and economic viability." Moreover, "consistent evaluation methods will enable comparison between systems and services." (p.8) However, the reality is discouraging. "Despite the advances in digital library technology, we have insufficient understanding of their utility for most applications, and we lack appropriate evaluation methods, metrics, and test beds for determining their effectiveness relative to various benchmarks."

Borgman, C.L., & Gilliland-Swetland, A.J. (2000). Evaluating digital libraries for teaching and learning in undergraduate education: a case study of the Alexandria Digital Earth Prototype (ADEPT). Library Trends, 49 (2): 228-250.

    This study employed ethnographic observation of course content and teaching style to examine genuine users' information needs and behavior.

Borgman, C.L., Leazer, G.H., et al. (2001). Iterative design and evaluation of a geographic digital library for university students: a case study of the Alexandria Digital Earth Prototype (ADEPT). Proceedings of the Fifth European Conference on Research and Advanced Technology for Digital Libraries, Darmstadt, Germany, pp. 390-401.

    The paper reports the use of multiple methods (e.g. survey, structured interviews, classroom observation, laboratory-based usability studies, etc.) in evaluating the ADEPT digital library. User needs, content, modes of searching and using resources, usability, transparency, and diversity/extensibility of metadata were examined.

Borgman, C. L. (2002). Final report to National Science Foundation computer and information science directorate information and intelligent systems division: Workshop goals, outcomes, and recommendations. Fourth DELOS Workshop: Evaluation of digital libraries: Testbed, Measurement, and Metrics. Retrieved June 1, 2003, from http://www.dli2.nsf.gov/internationalprojects/working_group_reports/evaulation.html.

    This report is a summary of the fourth DELOS workshop. Basic issues, such as the importance and the content of digital library evaluation, are discussed. U.S. and E.U. research activities on DL evaluation are summarized, and the themes of the workshop are presented. The report closes with recommendations for future DL evaluation.

Bosman, F.J.M., Bruza, P.D., & van de Weide, Th.P. et al. (1998). Documentation, cataloging, and query by navigation: A practical and sound approach. Research and Advanced Technology for Digital Libraries: Proceedings of the Second European Conference, ECDL'98, September 21-23, 1998, Heraklion, Crete, Greece. 459-478.

To test how effectively the HyperIndex (an index organized in the form of hypertext) can help users search a collection of visual reproductions of art subjects, the authors conducted several experiments comparing search performance with HyperIndex and ICONCLASS (a non-hyperlinked but standardized, well-documented classification system). The participants were asked to find answers to predefined questions, and their interactions with the system were logged and analyzed. Effectiveness is measured using the traditional pair of precision and recall. Additionally, the number of logical decisions (instances in the log file, such as each class selected and each document viewed) and the number of shows (views of the current selection) were also used as indicators of effectiveness. The results show the "advantage of HyperIndex over ICONCLASS."

Browne, P., Gurrin, C., et al. (2001). Dublin City University Video Track experiments for TREC 2001

    The study compared three video browsers (timeline, slide show, and hierarchical) using shot boundary detection. Traditional precision/recall (P/R), reference transition, deletion rate, and insertion rate were the key measures.

Budhu, M. & Coleman, A. (2002). The design and evaluation of interactivities in a digital library. D-Lib Magazine, 8 (11):

The paper summarizes an educational evaluation of learning effects when users interact with the GROW (Geotechnical (soil), rock and water engineering) collection at the Univ. of Arizona. The concept of interactivity focuses on structured representations of interactive multimedia resources. The learning effects were measured by students' performance on a lab test of their understanding of the concepts learned. The evaluation results are encouraging. In addition to this preliminary evaluation effort, the plan for a further usability test is outlined.

Carter, D., & Janes, J. (2001). Unobtrusive data analysis of digital reference questions and service at the Internet Public Library: an exploratory study. Library Trends, 49 (2): 251-265.

    The authors conducted a transaction log analysis to examine information needs, the nature of the questions asked, and the reference service at the Internet Public Library.

Champeny, L., Borgman, C.L., & Leazer, G.H. et al. (2004). Developing a digital learning environment: An evaluation of design and implementation processes. Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries. 37-46.

This is a case study of designing and implementing ADEPT (Alexandria Digital Earth Prototype) for undergraduate geography education in a university setting (University of California, Santa Barbara) over one academic year. To ensure that the implementation reached the goal of DLE (digital learning environment) development, iterative evaluation, including interviews with professors, TAs, and students, and classroom observation, was conducted with a focus on flexibility, openness, ease of use, and learning outcomes. Overall, the evaluation shows "modest improvements" between the Fall and Spring semesters, primarily with respect to reliability of access and learning outcomes (tests of graph understanding and hypothesis generation). However, students still perceived learning difficulties. Furthermore, there is a noticeable functional dichotomy between designers and users, as well as between teachers and students. For instance, whereas designers showed great interest in applying advanced technology and expected users to explore functionality by themselves, users expected to see explanatory documents and tutorials. Similarly, whereas teachers were "generally tolerant of system flaws," students were uncomfortable with their lack of understanding and control of the system.

Choudhury, S., Hobbs, B., & Lorie, M. (2002). A framework for evaluating digital library services. D-Lib Magazine, 8 (7/8):

The authors applied a multi-attribute, stated-preference technique to the cost-benefit evaluation of a hypothetical digital library service, namely Comprehensive Access to Printed Materials (CAPM) at Johns Hopkins University. The target users (students, faculty, and staff) were asked to participate in an online survey, which was designed using the results of focus group discussions. Cost-benefit was measured by users' preferred pair of service level and price from 36 possible options. The survey results show the users' hypothetical willingness to pay for the remotely accessible full-text service provided by CAPM.

Covey, D.T. (2002). Usage and usability assessment: library practices and concerns. Washington, D.C.: Digital Library Federation Council on Library and Information Resources. Retrieved on 4/12/2004 from http://www.clir.org/pubs/reports/pub105/contents.html

The report provides research findings from interviews with 71 individuals at 24 of the 26 DLF (Digital Library Federation) member institutions (representing an 86 percent response rate at the 24 institutions) conducted from November 2000 through February 2001. A standard set of open-ended questions was used to examine the DL collection and service evaluations they had done with respect to methods, results, experiences and lessons, and so forth. Follow-up questions varied, based on the work being done at the institution; in effect, the interviews tracked the efforts and experiences of those being interviewed. In general, focus groups, surveys, user protocols, heuristic usability tests, paper prototypes/scenarios, and card sorting tests are used for user studies, and log analysis is used for usage studies.

Cox, I.J., Miller, M.L., Minka, T.P., Papathomas, T.V. & Yianilos, P.N. (2000). The Bayesian image retrieval system, PicHunter: Theory, Implementation and Psychophysical experiments. IEEE Transactions on Image Processing, 9(1): 20-37.

The authors propose a Bayesian framework for content-based image retrieval that uses users' similarity judgments to direct a search. To assess the performance of PicHunter, which employs the framework, the authors conducted a series of experiments. Performance is measured by "the average number of images required to converge to the desired specific target". The results indicate that users "attend to the semantic content of images in judging similarity".

Cullen, R. (2003). Evaluating digital libraries in the health sector. Part 1: measuring inputs and outputs. Health Information and Libraries Journal, 20(4): 195-204

Cullen, R. (2004). Evaluating digital libraries in the health sector. Part 2: measuring impacts and outcomes. Health Information and Libraries Journal, 21(1): 3-13.

Dillon, A. (1999?). Evaluating on time: a framework for the expert evaluation of digital interface usability. Retrieved on 4/12/2004 from http://www.ischool.utexas.edu/~adillon/publications/evaluating.html

Based on nine years of investigations of human information usage from an HCI viewpoint, the author proposes the TIME model aiming to address key human factors for digital library evaluation. "The intention of the framework is to provide those developing digital information resources a way to conceptualize the human factors influencing the usability of the created artifact." The multi-leveled framework highlights the interplay among the key human factors, namely Task (reflect user's need and use), Information model (user's mental representation of the information space), Manipulation (support physical use of materials), and Ergonomics (variables influencing the perceptual processing of words and images).

Dorward, J., Reinke, D. & Recker, M. (2002). An evaluation model for a digital library service tool. Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries, 322-323.

This paper summarizes a series of educational evaluation activities for an NSF educational DL project, Instructional Architect (IA), which enables users to discover, select, reuse, sequence, and annotate digital learning objects. The highlighted evaluation strategies are iterative, user-centered, and based on rapid prototyping. The evaluation model focuses on both project process and outcome. As such, the authors employed multiple research methods at different implementation stages for various purposes: (1) a pilot survey of the audience (for needs assessment in terms of teachers' level of online teaching resource use and their perception of the utility of the IA system); (2) expert review of the interface (a group of graduate students and external professionals in instructional technology critiqued the interface design and content plan based on a scenario walk-through); (3) prototype testing (26 pre-service elementary teachers were asked to locate learning objects for 2 search scenarios from the SMETE Open Federation Digital Library using the software developed; observation, post-session focus groups, and post-session surveys were used to collect data on their perception of utility).

Entlich, R., Garson, L., Lesk, M., & Normore, L. et al. (1996). Testing a digital library: user response to the CORE project. Library Hi Tech, 14 (4): 99-118.

The paper focuses on user response (expectation, perception, and usage) to CORE (The Chemical Online Retrieval Experiment), which aimed to deliver scholarly journals in electronic form. The authors employed various evaluation techniques (detailed transaction logs, an online questionnaire, online comments, in-person interviews, and anecdotes) for the user study. The interview and transaction log results were compared to contrast users' perceptions with their actual behavior. The results show that about 47% of users were repeat users. Other analyses of CORE usage cover article viewing, printing, reading habits, searching, and so forth.

Fox, E.A., Hix, D., et al. (1993). Users, user interfaces, and objects: Envision, a digital library. Journal of the American Society for Information Science, 44(8): 480-491.

    Envision is a digital library project aiming to provide users with computer science literature. The authors interviewed potential Envision users, librarians, and computer/information scientists. The collected data were used to derive nine principles for digital library development under three categories, namely representation, architecture, and interfacing. A usability test illustrated that the application of the nine principles to the Envision implementation was successful.

Fuhr, N., Hansen, P., Mabe, M., Micsik, A., & Solvberg, I. (2001). Digital libraries: A generic classification and evaluation scheme. Research and Advanced Technology for Digital Libraries. Proceedings of the 5th European Conference on Digital Libraries, ECDL 2001 (Lecture Notes in Computer Science Vol.2163), Darmstadt, Germany, pp. 187-199.

    This article describes a holistic approach to digital library evaluation. A scheme is presented that includes four major dimensions: data/collection, system/technology, users, and usage. The evaluation criteria and metrics for each dimension are addressed. To test this scheme, an online survey was conducted. Only 3%-4% of the targeted audience responded, and 70% of the respondents were from the research domain. The results indicate that the proposed classification scheme seems to be appropriate for DL characterization.

Fuhr, N., Klas, C.P., Schaefer, A. & Mutschke, P. (2002). DAFFODIL: an integrated desktop for supporting high-level search activities in federated digital libraries. Research and advanced technology for digital libraries: Proceedings of 6th European conference, ECDL'02, Paris, France, September 16-18, 2002, 597-612.

DAFFODIL (Distributed Agents for user-friendly Access of Digital Libraries) is a federated DL system that offers a rich set of functions across a heterogeneous set of DLs. To examine how usable this system is, the authors conducted a heuristic evaluation and questionnaire interviews. During the heuristic evaluation, perceptions and problems reported during the participants' exploration of the interface were recorded. The findings include irritation due to long waiting times, uncertainty about the cause of empty results, the need for assistance in employing new concepts (e.g. the author network browser), and so forth. Meanwhile, the questionnaire interviews, conducted after participants had tried DAFFODIL, show that the imprecise and very large result sets made it difficult to interpret results and make relevance judgements.

Greenberg, J., Bullard, K.A., & James, M.L. et al. (2002). Student comprehension of classification applications in a science education digital library. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference, ECDL'02, September 16-18, 2002, Paris, France. 560-567.

To explore whether children have the same comprehension of scientific classification in educational digital libraries as they have in physical libraries, the authors conducted an experiment comparing sixth-grade students' understanding of the botany classification scheme in UNC's Plant Information Center (PIC) and in a physical library. The students were asked to assign a taxonomic name to a given plant and to complete a survey with a series of questions about how objects are grouped in the physical and digital environments. Whereas the classification tasks were successfully completed in both settings, the understanding of the classification structures in the digital setting seemed to be diminished compared with physical libraries.

Han, J.W. & Guo, L. (2003). A shape-based image retrieval method using salient edges. Signal Processing: Image Communication, 18: 141-156.

The authors present a novel five-stage image retrieval approach based on automatic salient edge detection and similarity matching among the detected edges. To examine the effectiveness and efficiency of the proposed approach, a preliminary experiment was conducted to compare a system using the approach with two other image retrieval models. The results show that the new approach has the highest accuracy but the longest retrieval time.

Hartland-Fox, B., & Dalton, P. (2003). EVALUEd-an evaluation model for e-library developments, Ariadne, 31. Retrieved June 1, 2003, from http://www.ariadne.ac.uk/issue31/evalued/.

    This article describes the eVALUEd project and discusses issues in e-library evaluation. These issues include: "what techniques are being employed, who uses the data collected, how evaluation can inform decisions, and what evaluation could be conducted given more time, resources, staffing etc." An online survey was conducted for this study.

Hauptmann, A., Jin, R., et al. (2001). Video retrieval with the Informedia Digital Video Library System. TREC'01, Retrieved on June 1, 2002 from http://trec.nist.gov/pubs/

    The study evaluated the effectiveness of the information extraction techniques employed in the Informedia Digital Video Library at Carnegie Mellon. Average Reciprocal Rank (ARR) and recall were used to measure effectiveness. Meanwhile, a usability test was conducted to evaluate the interface.

Hee, M., Ik, Y.Y., & Kim, K.C. (1999). Unified video retrieval system supporting similarity retrieval. Proceedings of Tenth International Workshop on Database and Expert Systems Applications, pp.884-888.

    The study compared the effectiveness of integrated feature-based and annotation-based similarity retrieval. The traditional P/R measures were modified as follows: recall (R) = |user-defined relevant scenes ∩ retrieved relevant scenes| / |user-defined relevant scenes|, and precision (P) = |user-defined relevant scenes ∩ retrieved relevant scenes| / |retrieved scenes|.
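    The set-based form of these two ratios can be sketched as follows. This is only an illustration: the function name and the scene identifiers are hypothetical and do not come from the paper, and scenes are assumed to be representable as hashable identifiers.

```python
def scene_precision_recall(relevant_scenes, retrieved_scenes):
    """Set-based precision and recall over scenes.

    relevant_scenes: scenes the user defined as relevant (hypothetical IDs).
    retrieved_scenes: scenes the system returned (hypothetical IDs).
    """
    relevant = set(relevant_scenes)
    retrieved = set(retrieved_scenes)
    hits = relevant & retrieved  # relevant scenes that were actually retrieved

    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: four relevant scenes, three retrieved, two in common.
precision, recall = scene_precision_recall(
    {"s1", "s2", "s3", "s4"}, {"s2", "s3", "s9"}
)
print(f"P = {precision:.3f}, R = {recall:.3f}")
```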

Hidaka, T., Abe, T., & Kokogawa, T. (2001). NetLibra: an advanced digital library system based on CORBA. TREC'01, Retrieved on June 1, 2002 from http://trec.nist.gov/pubs/

    The study compared the efficiency of searching distributed digital libraries via CORBA networking technology. Retrieval time was the key measure.

Hill, L.L., Carver, L. et al. (2000). Alexandria Digital Library: user evaluation studies and system design. Journal of the American Society for Information Science, 51 (3): 246-259.

    An online survey, ethnographic observations, and target user groups were employed to examine information needs and information seeking patterns. The authors also conducted a usability test.

Hill, L.L., Dolin, R., et al. (1997). User evaluation: summary of the methodologies and results for the Alexandria Digital Library, University of California at Santa Barbara. In C. Schwartz et al. (Eds.) Proceedings of the American Society for Information Science (ASIS) Annual Meeting, Washington DC, November 1997 (http://www.asis.org/annual-97/alexia.htm) (pp.225-243, 369). Medford, NJ: Information Today.

Huang, Z., Chung, W.Y., Ong, T.H. & Cheng, H.C. (2002). A graph-based recommendation system for digital library. Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, Oregon: Portland, pp.65-73.

    The study evaluated the effectiveness (how accurately the prediction reflects the real tendency) of hybrid search (i.e. content + association) for a book recommendation system. A modified version of the traditional P/R was the key measure.

Huxley, L. (2002). Renardus: following the Fox from project to service. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference on Digital Libraries, ECDL'02, September 16-18, 2002, Paris, France. 218-229.

Renardus is a collaborative pan-European project, which provides "a single, multilingual user interface for cross-searching and cross-browsing distributed metadata collections held by 12 participating subject gateways." To examine how end users perceive Renardus, an online survey in five languages was delivered to potential users via the project Web site, email newsletter and relevant mailing lists across Europe. Whereas the response distributions for most questions are concentrated, the ratings on the ease of use of the browsing functionality diverge. Regarding the quality and adequacy of metadata information, the results show that users outside the LIS domain tended to find it difficult to understand the order in which metadata elements are displayed.

Janssen, O. (2004). The European Library user survey of Gabriel, Gateway to Europe's National Libraries. Retrieved on September 2, 2004, from http://www.bl.uk/gabriel/index.html

The online survey report states: "This report gives insight into the background of the respondents, their use of the internet, their use of the Gabriel website and their opinions about this site. The respondents were also asked if they would use a shared catalogue of all the national libraries in Europe if that was to be created." The short and long PDF versions of the report are archived at http://www.kb.nl/gabriel/surveys/results2003/gabriel_survey_short.pdf and http://www.kb.nl/gabriel/surveys/results2003/gabriel_survey_long.pdf respectively. A screenshot of the online survey form is available at http://www.bl.uk/gabriel/surveys/results2003/screenshots.html

Jewell, T. D. (1998). The ARL "investment in electronic resources" study: Final report to the council on library and information resources. Retrieved on April 13, 2003, from http://www.arl.org/stats/specproj/jewell.html.

    This article examines expenditures on electronic resources, the organization and accessibility of those resources, and outcomes, that is, how the availability of electronic resources affects users. Surveys are the major method in this study, and the results are summarized in the article. The major conclusion is that information about expenditures on electronic resources needs to be made clearer and more reliable.

Jones, G.J.F., & Lam-Adesina, A.M. (2002). An investigation of mixed-media information retrieval. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference, ECDL'02, September 16-18, 2002, Paris, France. 463-478.

To investigate how effective mixed-media retrieval (text, document image, and spoken document in this study) is compared with mono-media search, the authors conducted an experiment using the existing TREC text, spoken and scanned image collections along with the spoken document retrieval task. Meanwhile, the systems with and without pseudo relevance feedback (PRF) were also compared at the mono and mixed collection levels. The results show that "the query expansion via summary-based PRF provided large improvements in performance for spoken documents, good improvements for TR, but surprisingly could lead to significant reduction in performance for document image retrieval." In general, "mixed-media retrieval performs well without compensation for media specific indexing problems."

Jones, M.L.W., Gay, G.K. & Rieger, R.H. (1999). Project soup: comparing evaluations of digital collection efforts. D-Lib Magazine, 5 (11) Retrieved on 3/12/2003 from http://www.dlib.org/

The authors, from the Human-Computer Interaction Group at Cornell University, investigated digital collection evaluation efforts across five different DL prototype projects with emphasis on "backstage" concerns (e.g., metadata, copyright and intellectual property issues), collection maintenance and access (e.g., decisions regarding collection scope and the maintenance of a consistent quality and fidelity of digital records) and usability findings. The authors argue for the necessity of "Establishing an Effective Content Base," that is, finding a balance between quantity ("Achieving Critical Mass"), quality ("Ensuring Fidelity and Accuracy") and usability ("Meeting the Goals and Objectives of the End-User"). Additionally, the significance of usability in digital collection evaluation is emphasized with the statement that "a digital collection can contain a critical mass of high quality, copyright-cleared content all organized around a solid metadata foundation, and still prove to be a failure."

Jones, S., Cunningham, S.J. et al. (2000). A transaction log analysis of a digital library. International Journal of Digital Libraries, 3:152-169.

    The authors conducted a transaction log analysis to examine human information behavior regarding query formulation and reformulation and the use of the Computer Science Technical Reports Collection in New Zealand.

Jones, S. & Paynter, G.W. (2002). Automatic extraction of document keyphrases for use in digital libraries: evaluation and application. Journal of the American Society for Information Science & Technology, 53 (8): 653-677.

The authors evaluated the Kea automatic keyphrase extraction techniques used in the New Zealand Digital Library. There are essentially two evaluation purposes: (1) to compare three different Kea techniques in terms of how well the automatically extracted keyphrases match human-identified keyphrases and (2) to compare the evaluation results based on authors' keyphrases with those based on human users' selections. Twenty-eight human users were asked to rate the suitability of each phrase (automatically extracted, author assigned, or user assigned) as a keyphrase of the document on an 11-point Likert scale. A modified pair of precision/recall measures and other statistical analysis methods (e.g. the Kappa statistic K and the Kendall Coefficient of Concordance W for the level of inter-person agreement) were employed for the comparison. The results show that "in general Kea produces keyphrases that are rated positively by human assessors."

Kapidakis, S., Terzis, S., Sairameshi, J., & Nikolaou, C. et al. (1998). A management architecture for measuring and monitoring the behavior of digital libraries. Research and Advanced Technology for Digital Libraries: Proceedings of the Second European Conference on Digital Libraries, ECDL'98, Heraklion, Crete, Greece, September 21-23, 1998, 1513: 95-114.

The authors propose a management architecture that can be used for monitoring and balancing query load in distributed digital libraries so as to improve efficiency. Efficiency is measured in the following four aspects: local search response time, local index database processing time, remote processing time of the index database, and remote search response time (network delay + remote processing time). The lab experiment demonstrates that the architecture is promising with respect to the expected effect, although further testing in a real setting is required.

Kassim, A. R.C., & Kochtanek, T. R. (2003). Designing, implementing, and evaluating an educational digital library resource. Online Information Review, 27(3): 160-168.

The authors report their iterative and interwoven design/evaluation efforts on Project i-DLR, a Web based educational resource on digital libraries (http://www.coe.missouri.edu/~rafee/idigital_libraryR). To ensure the site is well implemented for its target users (both beginners and experts in DL research/professional fields), they conducted a series of studies using different qualitative and quantitative research methods: focus group interviews, Web log analysis, a Web survey, and remote usability evaluation. Whereas the group interviews and remote usability evaluation identified a set of problems users encountered while interacting with the system, the log analysis shows some noticeable HIB (human information behavior), such as a preference for browsing over searching and for simple search queries over the use of Boolean operators. In particular, the authors advocate remote usability evaluation for its strengths: convenience for users; a more natural, authentic and unobtrusive setting; a greater variety of system environments; and cost-effectiveness.

Kenney, A.R., Sharpe, L.H., & Berger, B. (1998). Illustrated book study: digital conversion requirements of printed illustration. Research and Advanced Technology for Digital Libraries: Proceedings of the Second European Conference on Digital Libraries, ECDL'98, September 21-23, 1998, Heraklion, Crete, Greece. 279-293.

This collaborative project between the LC and the Cornell U. Dept. of Preservation and Conservation explores the "best means for digitizing the vast array of illustrations used in 19th and early 20th century publications." The authors evaluated the quality of digitized works at the following three levels: essence (how well the digital version has captured the essence of the original as viewed with the unaided eye at normal distance), detail (how well the digital version can represent the smallest significant part of the original as viewed closer or with slight magnification), and structure (how well the digital version can convey the information necessary to distinguish one illustration process type from another under various levels of magnification).

Khoo, C.S.G., Poo, D.C.C., & Toh, T.K. et al. (1998). E-referencer: a prototype expert system Web interface to online catalog. Research and Advanced Technology for Digital Libraries: Proceedings of the Second European Conference on Digital Libraries, ECDL'98, September 21-23, 1998, Heraklion, Crete, Greece. 316-333.

The authors developed E-referencer to help users search the OPAC effectively. E-referencer essentially has two components: initial search strategies (keyword/phrase search in all fields as strategy I and subject heading search as strategy II) and a relevance feedback strategy. Based on 12 search topics selected from those submitted by university staff and students, they compared the search performance of the expert system with that of an experienced librarian (one of the authors). With respect to precision and the average number of relevant records retrieved, performance did not improve, and in some cases worsened, when E-referencer was used.

Khoo, M. (2001). Ethnography, evaluation, and design as integrated strategies: a case study from WES. ECDL'01; Proceedings of the 5th European Conference on Digital Libraries, 263-274.

The case study illustrates how ethnographic observation and documentation analysis can be used to assess The Water in the Earth System (WES) collection, a sub-project of DLESE. The theory of user-centered design and technological frames theory were employed to guide the analysis. From the observation of design meetings, a correlation was detected between institutional features of WES community and project centers. However, no investigation was conducted regarding how good the WES is and how it can help the user community to learn and locate desired information.

Kwak, B.H., Jun, W., Gruenwald, L. & Hong, S.K. (2002). A study on the evaluation model for university libraries in digital environments. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference on Digital Libraries, ECDL'02, September 16-18, 2002, Paris, France. 204-217.

Having argued for the necessity of a new evaluation model for university libraries in the digital age, the authors develop a model in two phases. In the first phase, an initial model was constructed based on the opinions of library experts and on an analysis of previous work on the evaluation of both traditional and digital libraries. In the second phase, three rounds of Delphi surveys with a total of 50 digital library-related professors, researchers, and university librarians were used to develop a valid evaluation model. As a result, a new model, which consists of 7 categories (goal setting/vision, library specialization, information resources, information usability environment, information sharing, information services, and human resources & budget), 35 items, and 92 indicators, was finalized (pp.213-215).

Ma, Y.F., Sheng, J. et al. (2001). MSR-Asia at TREC-10 Video Track: shot boundary detection task, Retrieved on June 1, 2003 from http://trec.nist.gov/pubs/

    The study evaluated the effectiveness & efficiency of boundary detection techniques. Reference transit, deletion rate, insertion rate, traditional P/R, and test/normal time were used as measures.

MacCall, S. L., Cleveland, A. D., & Gibson, I. E. (1999). Outline and preliminary evaluation of the Classical Digital Library Model. Proceedings of the American Society for Information Science. Retrieved on June 1, 2003, from http://www.bama.ua.edu/~smaccall/cdlm.html.

    This article describes and evaluates an alternative to the database retrieval model, namely, the classical digital library model (CDLM). The number of "clickthroughs" per month is used as a critical metric. Library and information professionals and end users involved with primary care medicine were recruited as subjects to answer a series of questions. The results indicate that use of the digital library saves users time in retrieving information.

Marchionini, G. (2001). Evaluating digital libraries: a longitudinal & multifaceted view. Library Trends, 49 (2): 304-333.

    The author employed multiple methods (e.g. observations, semi-structured and group interviews, surveys, document analysis, learning effect analysis, etc.) to examine the information needs, information use, system performance, and educational effect of the Perseus Digital Library.

Marchionini, G., Plaisant, C., & Komlodi, A. (2003). The people in digital libraries: multifaceted approaches to assessing needs and impact. In Ann P. Bishop et al. (Eds.) Digital Library Use: Social Practice in Design and Evaluation. Massachusetts, Cambridge: The MIT Press. pp.119-160.

    The authors used observation, interviews, document analysis (syllabi, reading room handouts, reference emails), and questionnaires to compare the learning and teaching effects, system performance, and information needs in two digital libraries, namely, Perseus DL and Baltimore Learning Community.

Melucci, M. (2004). Making digital libraries effective: Automatic generation of links for similarity search across hyper-textbooks. Journal of the American Society for Information Science and Technology, 55(5): 414-430.

The author devised an approach, based on statistical clustering algorithms, for automatically generating and inserting links for similarity search across hyper-textbooks (HTBs). To assess the performance of the approach, he conducted a small test to see whether crosslinks among clusters from different HTBs are effective. Specifically, he used two textbooks on information retrieval published in different years as test HTBs, randomly drew 10 clusters from one, and compared them with clusters from the other. The results demonstrate a high intra-homogeneity between the two sets of clusters. It should be noted that the evaluation was conducted without the involvement of real users.

Meyyappan, N., Foo, S., & Chowdhury, G.G. (2004). Design and evaluation of a task-based digital library for the academic community. Journal of Documentation, 60(4): 449-475.

The authors report their evaluation of how effective the proposed task-based information organization technique is, in comparison with two other conventional techniques (i.e. alphabetical and subject-based), in helping university community members locate task-related information in the DWE (digital work environment) at Nanyang Technological University. In addition to the resources included in conventional DLs, the DWE also collects other informal resources, such as the course calendar, university statutes, etc. To address the evaluation question, an experiment was conducted with the participation of 60 information science students. The students were asked to search for information for two sets of tasks on interfaces with different information organization techniques. Time to complete a single task and perceived usefulness were employed as criteria. The results show that (1) the "task-based approach took the least time in identifying information resources"; and (2) the hybrid approach and the task-based approach "were considered better than the other approaches for almost all the tasks."

Monopoli, M., Nicholas, D., Georgiou, P. & Korfiati, M. (2002). A user-oriented evaluation of digital libraries: case study the "electronic journals" service of the library and information service of the University of Patras, Greece. Aslib Proceedings-New Information Perspectives, 54(2): 103-117.

To examine the use of e-journal services at a university in Greece, the authors conducted an online survey focusing on who the primary users are, how often they use the service and for what purpose, and what their search strategies are. A total of 246 community members (out of 13,000 members) completed the survey. In general, the e-journal service was used by a wide age range, and the majority of respondents used the service on a weekly or daily basis for a variety of reasons, such as writing papers for class, publication, or a degree, supporting lectures, "keeping up with the progress in the relevant subject area," and so forth. Additionally, keywords were the most popular search method, followed by author names, and the online help function was used by all age and occupation groups. Compared with their print counterparts, e-journals were considered to be easy to search and use, quick to access, and readily manipulated.

Orio, N. (2002). Alignment of performance with scores aimed at content-based music access and retrieval. Research and advanced technology for digital libraries: Proceedings of 6th European Conference on Digital Libraries, ECDL'02, Paris, France, September 16-18, 2002, 479-492.

The paper reports an approach that allows music performances to be retrieved through automatic alignment of acoustic recordings with the corresponding scores stored in a musical DL. To test the proposed approach, the author conducted a preliminary test using a small collection of acoustic and synthetic performances, during which an expert musician was asked to monitor the screen and report any mismatch found. The results are encouraging, with a reasonably low mismatch rate (8.2%).

Paliouras, G., Papatheodorou, C., & Karkaletsis, V. et al. (1998). Learning user communities for improving the services of information providers. Research and Advanced Technology for Digital Libraries: Proceedings of the Second European Conference on Digital Libraries, ECDL'98, September 21-23, 1998, Heraklion, Crete, Greece. 316-333.

To examine how well the proposed automatic grouping algorithms perform in constructing meaningful user communities, the authors employed two criteria, namely overlap and coverage. Overlap is measured by "the amount of overlap between the constructed descriptions" (the ratio between the total number of categories in the descriptions and the number of distinct categories that are covered), whereas coverage is indicated by the proportion of news categories covered by the constructed user community descriptions. The authors report the results as very encouraging.
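The two criteria, as worded in the annotation above, can be sketched as follows. This is only an illustration under stated assumptions: each community description is modeled as a set of category names, and the function name and sample categories are hypothetical rather than taken from the paper.

```python
def overlap_and_coverage(descriptions, all_categories):
    """Overlap and coverage of a set of community descriptions.

    descriptions: list of sets, one set of categories per community
                  (hypothetical representation).
    all_categories: the full set of news categories.
    """
    all_categories = set(all_categories)
    total = sum(len(d) for d in descriptions)      # categories counted with repeats
    distinct = set().union(*descriptions)          # categories counted once

    # Overlap: total categories over distinct categories (1.0 means no overlap).
    overlap = total / len(distinct) if distinct else 0.0
    # Coverage: share of all news categories that appear in some description.
    coverage = (
        len(distinct & all_categories) / len(all_categories)
        if all_categories else 0.0
    )
    return overlap, coverage


# Illustrative data only.
communities = [{"sports", "politics"}, {"politics", "finance"}]
categories = {"sports", "politics", "finance", "science"}
overlap, coverage = overlap_and_coverage(communities, categories)
print(f"overlap = {overlap:.3f}, coverage = {coverage:.2f}")
```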

Park, S. (2000). Usability, user preferences, effectiveness, and user behaviors when searching individual and integrated full-text databases: implications for digital libraries. Journal of the American Society for Information Science, 51(5): 456-468.

    The author modified the TREC-5 interactive task and collection to compare users' interaction with integrated and common interfaces in a DL. Aspectual recall was the key measure.

Purcell, G.P., Rennels, G.D. & Shortliffe, E.H. (1997). Development and evaluation of a context-based document representation for searching the medical literature. International Journal on Digital Libraries, 1 (3): 288-296.

To test the proposed context model, the authors compared inter-subject consistency when the subjects (medical students) were asked to assign a document structure name to each sentence/paragraph in medical articles from 4 leading medical journals. Inter-subject consistency was measured by the kappa coefficient of agreement for nominal scales. The results show that there is substantial agreement among the subjects, and hence the model seems to provide an effective indexing scheme. Unfortunately, the evaluation was based on paper documents rather than documents in electronic formats.
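For orientation, a kappa coefficient for nominal labels can be sketched for the simplest case of two raters. This is an assumption-laden illustration: the annotation does not say how agreement was aggregated across more than two subjects, and the function name and section labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Kappa agreement for two raters assigning nominal labels to the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Illustrative data only: two raters labeling four sentences.
rater_1 = ["intro", "methods", "methods", "results"]
rater_2 = ["intro", "methods", "results", "results"]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.3f}")
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.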

Rui, Y., Gupta, A., & Acero, A. (2000). Automatically extracting highlights for TV baseball programs. ACM Multimedia 2000, 105-115.

    This study evaluated the effectiveness and efficiency of the automatic segment extraction techniques. Segment overlap and access time were the key measures employed.

Salampasis, M., Tait, J. & Bloor, C. (1998). Evaluation of information-seeking performance in hypermedia digital libraries. Interacting With Computers, 10: 269-284.

    The study evaluated the effectiveness of information seeking performance by employing relative distance relevance (RDR) as the measure in hypermedia network contexts.

Sanderson, M. & Crestani, F. (1998). Mixing and merging for spoken document retrieval. Research and Advanced Technology for Digital Libraries: Proceedings of the Second European Conference on Digital Libraries, ECDL'98, September 21-23, 1998, Heraklion, Crete, Greece. 397-407.

To identify the most effective technique for spoken document retrieval, two main bodies of experiments were conducted on the TREC-6 SDR collection (1451 broadcast news stories along with 49 known item topics) together with the Abbot generated transcript. One used a merged collection and the other a mixed collection (manually vs. automatically generated transcripts). Effectiveness is the criterion, measured by mean rank, mean reciprocal rank, and the number of queries for which the relevant document is found in the top n ranks, where n is 1, 5, or 10. The results reveal the shortcomings of the tf-idf weighting approach for this type of retrieval.
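The three known-item measures named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name and the sample ranks are hypothetical, and a query whose target document was never retrieved is assumed to be recorded as None and to contribute zero to the mean reciprocal rank.

```python
def known_item_metrics(ranks, cutoffs=(1, 5, 10)):
    """Mean rank, mean reciprocal rank, and hits within the top n.

    ranks: one entry per query, the 1-based rank of the known item,
           or None if the item was not retrieved (assumed convention).
    """
    found = [r for r in ranks if r is not None]

    # Mean rank is computed only over queries where the item was found.
    mean_rank = sum(found) / len(found) if found else None
    # Missing items contribute 0 to the reciprocal-rank sum.
    mrr = sum(1.0 / r for r in found) / len(ranks) if ranks else 0.0
    # Number of queries whose known item appears in the top n.
    hits = {n: sum(1 for r in found if r <= n) for n in cutoffs}
    return mean_rank, mrr, hits


# Illustrative data only: five queries, one with the item not retrieved.
mean_rank, mrr, hits = known_item_metrics([1, 3, 12, None, 5])
print(mean_rank, round(mrr, 4), hits)
```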

Saracevic, T. (2000). Digital library evaluation: toward an evolution of concepts. Library Trends, 49 (2): 350-369.

    Having acknowledged the lack, the difficulty, and the necessity of digital library evaluation, the author proposes a conceptual framework that outlines the constructs and contexts of digital library development. The author also suggests an adaptation approach that borrows existing criteria from conventional library, information retrieval, and interface evaluation.

Seadle, M. & Peters, T.A. (2000). Project ethnography: an anthropological approach to assessing digital library services. Library Trends, 49 (2): 370-385.

In the paper, the authors argue that for digital library evaluation, "anthropology can provide the initial understanding, the intellectual basis, on which informed choices about population, survey design, or focus group selection can reasonably be made." In light of this argument, nine target user samples of the National Gallery of the Spoken Word (NGSW) were identified, and the micro-cultures and characteristics of the samples were examined. The implications of the cultural interpretation results for further evaluation are discussed.

Smeaton, A.F., Paul, O. et al. (2001). The TREC-2001 Video Track report. Retrieved on June 1, 2003 from http://trec.nist.gov/pubs/

    The study evaluated the effectiveness of shot boundary detection for known topics & general topics. Traditional P/R ratio and the amount of relevance disagreement among assessors were the key measures.

Solvberg, I. T. (2002). Report of breakout group on metrics and testbeds. 4th DELOS workshop on DL evaluation. Retrieved on June 1, 2003, from http://www.sztaki.hu/conferences/deval/presentations/Breakout_metrics.doc.

    This report discusses the metrics and testbeds for digital library evaluation. The author proposes four questions that should be asked at the beginning of the evaluation process: who needs this? what shall be evaluated? why is this needed? and how can it be done? The basic components of a testbed are listed, which include the collections of documents, DL-system components, and user components. Directions for future research are proposed.

Sumner, T. & Dawe, M. (2001). Looking at digital library usability from a reuse perspective. Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries. June 24-28, 2001, Roanoke, VA. New York: ACM, pp.416-425.

    The authors conducted interviews and observations to assess the usability of a digital library. The key measures included information needs, reuse intent, resource location, comprehension, modification and sharing effects.

Sumner, T., Khoo, M., & Recker, M. et al. (2003). Digital libraries in the classroom: Understanding educator perceptions of "quality" in digital libraries. Proceedings of the third ACM/IEEE-CS joint conference on Digital libraries, 269-279.

To examine educators' expectations and perceptions of DL (resource) quality, the authors conducted a series of five focus groups with 37 practicing teachers, pre-service teachers, and science librarians, drawn from different educational contexts (i.e., K-5, 6-12, College). The informants were presented with diverse representative digital educational resources (see p.273 for the list). The findings show that the informants need "high quality" teaching and learning resources as well as additional contextual information beyond that in the resource. Additionally, perceptions of scientific accuracy, bias, advertising, design and usability, and the potential for student distraction should also be taken into account.

Tolle, K.M., & Chen, H. (2000). Comparing noun phrasing techniques for use with medical digital library tools. Journal of the American Society for Information Science, 51 (4): 352-370.

To examine the effectiveness of their proposed noun phrasing NLP (Natural Language Processing) technique (AZ Noun Phraser), the authors compared the relative recall/precision of the technique with that of three other counterpart techniques. They conducted an experiment involving 19 domain expert participants (medical librarians, doctors, medical students/researchers), 10 abstracts randomly selected from CANCERLIT, and the corresponding 10 alphabetically arranged noun phrase lists derived by the four algorithms from the 10 abstracts. The results show that the proposed technique with the SPECIALIST Lexicon "resulted in improved recall and precision".

Wesson, J., & Greunen, D.V. (2002). Visualization of usability data: measuring task efficiency. Proceedings of the Conference of South African Institute of Computer Scientists and Information Technologists, SAICSIT 2002, 11-18.

    The authors employed a number of measures to assess the usability of a digital library interface, such as diversity, complementarity, decomposition, parsimony, space/time resource optimization, self-evidence, consistency, and attention management.

White, M.D. (2001). Digital reference services: framework for analysis and evaluation. Library & Information Science Research, 23 (3): 211-231.

The author proposes a descriptive model for analyzing and evaluating digital reference services. The model "consists of about 100 questions related to 18 categories in four broad areas," namely mission/purpose, structure/responsibility to client, core function, and quality control. To test the model, the author analyzed the following 6 aspects of 20 digital reference services: archive, content, selectivity, privacy protection, access, and browsability/searchability. The results show that the model is useful for illustrating the strengths and weaknesses of each service in the examined aspects.

Whitlatch, J.B. (2001). Evaluating reference services in the electronic age. Library Trends, 50 (2): 207-212.

The paper examines how evaluation criteria and methods for traditional library reference services can be used in digital reference settings. Accordingly, different evaluation methods, such as surveys/questionnaires, observation, individual and focus group interviews, and case studies, are discussed. Based on a review of the strengths and weaknesses of these methods and of representative studies applying them, the author concludes that these methods "can be used very effectively" in the new environment. However, dramatic changes in the reference process challenge researchers in assessing how users and librarians/reference systems interact in non-face-to-face environments.

Wildemuth, B., Marchionini, G., Yang, M., Geisler, G., Wilkens, T., Hughes, A., & Gruss, R. (2003). How fast is too fast? Evaluating fast forward surrogates for digital video. Proceedings of the ACM/IEEE Joint Conference on Research on Digital Libraries, TX: Houston, May 27-31, 2003, Los Alamitos, CA: IEEE. pp. 221-230.

The authors conducted an experiment with 45 subjects comparing the effectiveness of four speed levels of fast forward surrogates extracted from 5 digital videos in OpenVideo, using six measures (corresponding to six performance tasks). The measures/tasks include object recognition (textual), object recognition (graphical), action recognition, linguistic gist comprehension (full text), linguistic gist comprehension (multiple choice), and visual gist comprehension. The measures are regarded as having face validity because they consider multiple facets of video browsing behavior, both conceptual and perceptual. The findings suggest that for video browsing, it is essential to have "a range of user control mechanisms and underlying representations for video."

Wilson, R. & Landoni, M. (2001). Evaluating electronic textbooks: a methodology. Proceedings of 5th European Conference on Digital Libraries, ECDL 2001, Darmstadt, Germany, September 4-9, 2001, 1-12.

The methodology was initially proposed for performing usability tests in the EBONI (Electronic Books ON-screen Interface) project. It sets out options for selecting materials, participants, tasks and techniques, which vary in cost and level of sophistication. No real study was conducted to examine the quality of the proposed methodology.

Wilson, R., Landoni, M., & Gibb, F. (2002). Guidelines for designing electronic books. Research and Advanced Technology for Digital Libraries: Proceedings of the 6th European Conference on Digital Libraries, ECDL'02, September 16-18, 2002, Paris, France, 47-60.

The authors mainly outline "the guidelines emerging from the EBONI (Electronic Books ON-Screen Interface) Project's evaluations of electronic books." They also summarize their evaluation work, which followed the specifically developed "Ebook Evaluation Model" outlined at the preceding year's conference. The model essentially contains four selection-related aspects: object (three e-textbooks in psychology differing markedly in appearance), actors (participants [100 students, lecturers, and researchers from a range of disciplines in UK higher education], evaluators, task developers, and task assessors), tasks (finding specific facts and performing post-tests), and evaluation techniques (subjective satisfaction questionnaires, think-aloud usability sessions, and interviews). Four categories of e-book design guidelines emerged from the evaluation: adhering to the book metaphor, adapting to the electronic medium, hardware considerations, and accessibility. These guidelines might serve as e-book evaluation criteria.

Yang, S.C. (2001). An interpretive and situated approach to an evaluation of Perseus Digital Libraries. Journal of the American Society for Information Science and Technology, 53 (14): 1210-1223.

    An in-class observation method was employed to examine how the Perseus Hypermedia Digital Library helps its users with their regular class assignments.

Zhang, H.J., Yong, L.C., et al. (1995). Video parsing, retrieval and browsing: an integrated and content-based solution. ACM Multimedia 95, Nov. 5-9, 1995, San Francisco, CA.

    The study compared the effectiveness of human and automatic key frame extraction techniques. Overlap and accuracy were the key measures.

Zhu, B., & Schatz, B. (1999). Support concept-based multimedia information retrieval: a knowledge management approach. Proceedings of the 20th International Conference on Information Systems, North Carolina: Charlotte, pp.1-14.

    The study compared the effectiveness of integrated techniques (keyword and concept space-based search). Traditional precision/recall (P/R) was the key measure.
 

Last update 09/26/2004
Comments or Questions? Email To: miceval@scils.rutgers.edu