Moving Image Collection Evaluation

 
Compiled by Ying Zhang

 

Browne, P. & Gurrin, C. (2001). Dublin City University Video Track experiments for TREC 2001. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html

The authors conducted an experiment to examine users' interaction with three video keyframe browsers (timeline, slide show, and hierarchical) for online television programs from the Fischlar system. Thirty participants used these browsers to locate video clips for 12 general and known-item TREC topics. Precision and recall (P/R), computed from an automatically detected overlap between each result item and the known item, were used to measure the effectiveness of the interactions. Onscreen questionnaires were used to examine participants' perceived efficiency and subjective preferences. The results show that P/R values vary between general topics and known-item topics. The three browsers are not compared with one another.
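As a rough illustration of overlap-based scoring, the sketch below computes precision and recall for a set of returned clips against a set of known items. It is not the authors' procedure: the interval representation, the rule that any shared time counts as a match, and the names `overlaps` and `precision_recall` are assumptions made for this example, and all clips are assumed to come from the same video.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), assumed format


def overlaps(a: Interval, b: Interval, min_overlap: float = 0.0) -> bool:
    """Return True when the two intervals share more than min_overlap seconds."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return shared > min_overlap


def precision_recall(results: List[Interval],
                     known: List[Interval]) -> Tuple[float, float]:
    """Score returned clips against known items using temporal overlap."""
    # A returned clip is relevant if it overlaps at least one known item.
    relevant_results = [r for r in results
                        if any(overlaps(r, k) for k in known)]
    # A known item is found if at least one returned clip overlaps it.
    found_known = [k for k in known
                   if any(overlaps(r, k) for r in results)]
    precision = len(relevant_results) / len(results) if results else 0.0
    recall = len(found_known) / len(known) if known else 0.0
    return precision, recall


if __name__ == "__main__":
    returned = [(0, 10), (20, 30), (50, 60)]
    known_items = [(5, 8), (55, 70), (100, 110), (200, 210)]
    print(precision_recall(returned, known_items))
```

A stricter matching rule (for example, requiring a minimum number of shared seconds) can be tried by passing a larger `min_overlap` to `overlaps`.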

Darwish, K., Doermann, D., Jones, R., Oard, D. & Rautiainen, M. (2001). TREC-10 experiments at University of Maryland CLIR and Video. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html

The TREC-10 Video Track experiment compares the effectiveness of two video shot boundary detection techniques: the proposed temporal color correlogram (TCC) and the HSV color correlogram (CC). For general search topics, average precision on the automatic topics ranks TCC higher. For known-item search, the recall results show that the TCC configuration performs better than the CC configuration.

Eronen, L. & Vuorimaa, P. (2000). User interfaces for digital television: a navigator case study. Proceedings of the Working Conference on Advanced Visual Interfaces, May 2000.

Digital television user interfaces are composed of text, graphics and video. Usability issues that arise include information visualization, searching and navigation. This paper introduces two user interface prototypes for digital television. Both prototypes were tested with real users and the test results are discussed.

Goodrum, A.A. (2001). Multidimensional scaling of video surrogates. Journal of the American Society for Information Science and Technology, 52(2): 174-182.

This experimental study compares four surrogate display types for moving image collections. Multidimensional scaling was used to map the dimensional dispersions of users' judgments of similarity between videos and of similarity between surrogates. "Congruence" among these maps was used to evaluate how representative each surrogate display type is. The author finds that, overall, congruence for image-based surrogates is greater than that for text-based surrogates.

Hee, M., Ik, Y.Y. & Kim, K.C. (1999). Unified video retrieval system supporting similarity retrieval. Proceedings of the Tenth International Workshop on Database and Expert Systems Applications, 884-888.

The authors propose a hybrid video retrieval system (UVRS) that grounds similarity search simultaneously in annotation-based queries and feature-based (color, spatial, temporal, etc.) queries. Traditional precision and recall are modified to measure the effectiveness of the similarity search. Users were asked to define relevant scenes and objects from the collection. The evaluation results show that combining various texture and content features increases both recall and precision.

Huang, J., Umamaheswaran, D. & Palakal, M. (2002). Video indexing and retrieval for Archeological Digital Library, ACM CLIOH. CIVR 2002, 289-298.

The authors propose a new video key frame detection technique for CLIOH (Cultural Digital Library Indexing our Heritage) that couples a wavelet best-basis tree structure with a Kohonen SOM neural network. They compare the technique with a conventional global-feature technique. The results show that the new technique performs better, with fewer redundant and missing frames, when used to extract key frames.

Lippman, A. & Butera, W. (1989). Coding image sequences for interactive retrieval. Communications of the ACM, 32(7): 852-860.

The authors present an image coding technique for digital storage of motion picture information that is optimized for use in interactive systems where high-quality still frames, random access, and database linkages are required.

Ma, Y.F., Sheng, J., Chen, Y. & Zhang, H.J. (2001). MSR-Asia TREC-10 Video Track: shot boundary detection task. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html

The TREC-10 Video Track experiment at Microsoft Research Asia uses accuracy and speed as criteria to evaluate the proposed automatic shot boundary detection approach, which consists of four functional modules: decoding, feature extraction, inter-frame comparison, and decision. The experimental results, based on 42 video sequences, indicate that the detection approach is "effective" and much faster than normal speed.

Picard, R.W. (1996). A society of models for video and image libraries. IBM Systems Journal, 35(3/4).

The article reviews a society of video and image models at the MIT Media Laboratory in terms of their strengths and weaknesses in performing specific texture analysis tasks. The author also introduces two systems that employ the multi-model approach. The "society of models approach" gives a system the flexibility to choose the best solution. Unfortunately, the article does not report any evaluation of how well the proposed approach performs.

Picard, R.W., Minka, T.P. & Szummer, M. (1996). Modeling user subjectivity in image libraries. Proceedings of the International Conference on Image Processing, 777-780.

The authors implement a system (FourEyes) that features optimized use of texture and color analysis models, together with machine learning based on relevance feedback from users' interaction with the system. The evaluation focuses on how the "learning mechanism performs on problems it hasn't seen before". Learning time was used as the performance indicator, in terms of "how many examples had to be provided before the problem was solved". The results show that the machine learner "gains performance on the later problems".

Rui, Y., Gupta, A. & Acero, A. (2000). Automatically extracting highlights for TV baseball programs. Proceedings of ACM Multimedia, 105-115.

An experimental system was implemented to automatically extract highlights from TV baseball programs based on audio features. To examine how well the automatic highlight detection system performs compared with human highlight detection, the authors devised two effectiveness measures: segment overlap and excess-time. On average, three fourths of the automatically detected highlight segments are the same as those marked by humans. Meanwhile, "duration of the algorithmically generated segments is not significantly more than that of human segments."

Smeaton, A.F. (2001). Challenges for content-based navigation of digital video in the Fischlar Digital Library. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html

Using the Fischlar Digital Library, a TV broadcast browsing and retrieval system in a university campus environment, as an example, the author demonstrates three groups of content-based techniques for digital video navigation: searching, clustering (similarity linking), and summarization. Unfortunately, no evaluation was carried out.

Smeaton, A.F., Over, P. & Taban, R. (2002). The TREC-2001 Video Track report. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html

The report summarizes the TREC-2001 Video Track initiative. The evaluations were carried out by 12 institutions worldwide and covered shot boundary detection and automatic or interactive searching using general or known-item topics. Whereas both precision and recall were used to assess the effectiveness of known-item topic searches, only precision was used for the general topics. For the evaluation of shot boundary detection, in addition to P/R, the following metrics were devised: deletion rate, insertion rate, error rate, quality index, correction probability, correction rate, inserted transition count, and deleted transition count.
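To make the count-based shot boundary measures concrete, the sketch below scores a list of detected transitions against a reference list. It is an illustration only, not the track's official scoring: the frame-number representation, the `tolerance` matching rule, the function name `shot_boundary_scores`, and the normalization of insertion and deletion rates by the number of reference transitions are all assumptions made for this example.

```python
from typing import Dict, List


def shot_boundary_scores(reference: List[int],
                         detected: List[int],
                         tolerance: int = 0) -> Dict[str, float]:
    """Compare detected transition frames with reference transition frames.

    A detection matches a reference transition when it lies within
    `tolerance` frames of it; each detection can match only once.
    """
    unmatched = sorted(detected)
    correct = 0
    for ref in sorted(reference):
        candidates = [d for d in unmatched if abs(d - ref) <= tolerance]
        if candidates:
            # Use the closest unmatched detection for this reference.
            best = min(candidates, key=lambda d: abs(d - ref))
            unmatched.remove(best)
            correct += 1

    deleted = len(reference) - correct   # reference transitions missed
    inserted = len(detected) - correct   # detections with no reference
    n_ref = len(reference)
    n_det = len(detected)
    return {
        "deleted_count": deleted,
        "inserted_count": inserted,
        "precision": correct / n_det if n_det else 0.0,
        "recall": correct / n_ref if n_ref else 0.0,
        # Assumed normalization: both rates are relative to the reference.
        "deletion_rate": deleted / n_ref if n_ref else 0.0,
        "insertion_rate": inserted / n_ref if n_ref else 0.0,
    }


if __name__ == "__main__":
    scores = shot_boundary_scores([100, 250, 400, 900], [101, 250, 600],
                                  tolerance=2)
    for name, value in scores.items():
        print(name, value)
```

The remaining metrics named in the report (error rate, quality index, correction probability, correction rate) are not sketched here because the summary above does not define them.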

Wildemuth, B.M., Marchionini, G., Wilkens, T., et al. (2002). Alternative surrogates for video objects in a digital library: users' perspectives on their relative usability. Research and advanced technology for digital libraries: Proceedings of 6th European conference, ECDL'02, Paris, France, September 16-18, 2002, 493-507.

Having acknowledged the feasibility and potential strengths of non-textual surrogates for representing digital videos, the authors conducted an exploratory user study to investigate whether users prefer a particular surrogate type, and whether one type of surrogate supports users' performance better than the others. The results show that "though no surrogate triumphed as the "best", the fast forward surrogate garnered substantial support from the study participants, particularly from experienced users." Meanwhile, "the participants were successful in using the surrogates to determine gist, recognize objects and actions they has seen in surrogates, and identify frames that "belonged" in a particular video."

Zhang, H.J., Low, C.Y., Smoliar, S.W. & Wu, J.H. (1995). Video parsing, retrieval and browsing: an integrated and content-based solution. Proceedings of ACM Multimedia, November 5-9, San Francisco, California.

The authors propose an approach to digital video parsing, browsing, and retrieval that is based on integrated content information. The approach comprises three processing levels: temporal segmentation, key frame abstraction, and key frame content representation. The effectiveness of the key frame abstraction was measured by the number of shots correctly detected, the number of shots missed, the number of false detections, and the number of key frames extracted. The experimental results show that the automatic extraction system did not miss any of the key frames identified by the SBC Film/Videotape Library staff. Moreover, the library staff perceived the automatic extraction as more objective.

 

Last update 01/03/2004
Comments or Questions? Email To: miceval@scils.rutgers.edu