Browne, P. & Gurrin, C. (2001). Dublin City University Video
Track experiments for TREC 2001. Retrieved April 03, 2003, from
http://trec.nist.gov/pubs/trec10/t10_proceedings.html
The authors conducted an experiment to examine users' interaction
with three video keyframe browsers for online television programs
from the Fischlar system, namely timeline, slide show, and hierarchical.
Thirty participants used these browsers to locate video clip
results for 12 general and known-item TREC topics. Based on automatic
detection of overlap between the result item and the known item,
precision and recall (P/R) were used to measure the effectiveness
of the interactions. Onscreen questionnaire surveys were also
used to examine participants' perceived efficiency and subjective
preference. The results show that P/R values vary between general
topics and known-item topics. The three browsers are not compared
with one another.
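As a rough illustration of the overlap-based scoring described in this entry, the sketch below counts a result clip as correct when its time span overlaps a known item. The `(start, end)` clip representation, the function names, and the any-overlap matching rule are assumptions made for illustration; the paper's actual matching criteria may differ.

```python
from typing import List, Tuple

# Hypothetical clip representation: (start_seconds, end_seconds).
Clip = Tuple[float, float]


def overlaps(a: Clip, b: Clip) -> bool:
    """Return True when the two time spans share any interval."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(results: List[Clip], known: List[Clip]) -> Tuple[float, float]:
    """Precision: share of result clips that overlap some known item.
    Recall: share of known items overlapped by some result clip."""
    matched_results = sum(1 for r in results if any(overlaps(r, k) for k in known))
    matched_known = sum(1 for k in known if any(overlaps(r, k) for r in results))
    precision = matched_results / len(results) if results else 0.0
    recall = matched_known / len(known) if known else 0.0
    return precision, recall


if __name__ == "__main__":
    results = [(0.0, 10.0), (20.0, 30.0), (50.0, 60.0)]
    known = [(5.0, 8.0), (55.0, 70.0), (100.0, 110.0), (200.0, 210.0)]
    print(precision_recall(results, known))
```

With the sample data above, two of three result clips match and two of four known items are found.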
Darwish, K., Doermann, D., Jones, R., Oard, D. & Rautiainen,
M. (2001). TREC-10 experiments at University of Maryland CLIR and
Video. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html
The TREC-10 Video Track experiment compares the effectiveness
of two video shot boundary detection techniques. One is the proposed
temporal color correlogram (TCC); the other is the HSV color
correlogram (CC). For general search topics, average precision
on the automatic topics ranks TCC higher. For known-item search, the
recall results show that the TCC configuration performs better
than CC.
Eronen, L. & Vuorimaa, P. (2000). User interfaces for digital
television: a navigator case study. Proceedings of the Working
Conference on Advanced Visual Interfaces, May 2000.
Digital television user interfaces are composed of text, graphics
and video. Usability issues that arise include information visualization,
searching and navigation. This paper introduces two user interface
prototypes for digital television. Both prototypes were tested
with real users and the test results are discussed.
Goodrum, A.A. (2001). Multidimensional scaling of video surrogates.
Journal of the American Society for Information Science and Technology,
52(2): 174-182.
This experimental study compares four different surrogate display
types for moving image collections. Multidimensional scaling was
used to map the dimensional dispersions of users' judgments of
similarity between videos and similarity between surrogates. "Congruence"
among these maps was employed for the evaluation in terms of the
representativeness of each surrogate display type. The author
finds that, overall, congruence for image-based surrogates is greater
than that for text-based surrogates.
Hee, M., Ik, Y.Y. & Kim, K.C. (1999). Unified video retrieval
system supporting similarity retrieval. Proceedings of the Tenth
International Workshop on Database and Expert Systems Applications,
884-888.
The authors propose a hybrid video retrieval system (UVRS) that grounds
similarity search simultaneously in annotation-based queries
and feature-based (color, spatial, temporal, etc.) queries. Traditional
precision and recall are modified to measure the effectiveness
of the similarity search. Users were asked to define relevant
scenes and objects from the collection. The evaluation results
show that the combination of various texture and content features
increases both recall and precision.
Huang, J., Umamaheswaran, D. & Palakal, M. (2002). Video indexing
and retrieval for Archeological Digital Library, ACM CLIOH. CIVR
2002, 289-298.
The authors propose a new technique of video key frame detection
for CLIOH (Cultural Digital Library Indexing our Heritage) that
couples a wavelet best-basis tree structure with a Kohonen SOM
neural network. They compare the technique with a conventional
global-feature technique. The results show that the new technique
performs better, with fewer redundant and missing frames, when
used to extract key frames.
Lippman, A. & Butera, W. (1989). Coding image sequences for interactive
retrieval. Communications of the ACM, 32(7): 852-860.
An image coding technique for digital storage of motion picture
information is presented that is optimized for use in interactive
systems where high quality still frames, random access, and database
linkages are required.
Ma, Y.F., Sheng, J., Chen, Y. & Zhang, H.J. (2001). MSR-Asia
TREC-10 Video Track: shot boundary detection task. Retrieved April
03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html
The TREC-10 Video Track experiment at Microsoft Research Asia
employs accuracy and speed as criteria to evaluate the proposed
automatic shot boundary detection approach, which consists of four
functional modules: decoding, feature extraction, inter-frame
comparison, and decision. The experimental results, based on 42 video
sequences, indicate that the detection approach is "effective"
and runs much faster than normal speed.
Picard, R.W. (1996). A society of models for video and image
libraries. IBM Systems Journal, 35(3/4).
The article reviews a society of video and image models at the
MIT Media Laboratory in terms of their strengths and weaknesses
in performing specific texture analysis tasks. The author also
introduces two systems that employ the multi-model approach.
The "society of models approach" gives a system the flexibility
to choose the best solution. Unfortunately, the article does
not evaluate how well the proposed approach performs.
Picard, R.W., Minka, T.P., & Szummer, M. (1996). Modeling user
subjectivity in image libraries. Proceedings of the International
Conference on Image Processing, 777-780.
The authors implement a system (FourEyes) that features
optimized use of texture and color analysis models and machine
learning based on relevance feedback from users' interaction with
the system. The evaluation focuses on how the "learning
mechanism performs on problems it hasn't seen before". Learning
time was employed as the performance indicator in terms of "how
many examples had to be provided before the problem was solved".
The results show that the machine learner "gains performance on
the later problems".
Rui, Y., Gupta, A. & Acero, A. (2000). Automatically extracting
highlights for TV baseball programs. Proceedings of ACM Multimedia,
105-115.
An experimental system was implemented to automatically extract
highlights from TV baseball programs based on audio features.
To examine how well the automatic highlight detection
system performs compared with human highlight detection, the authors
devised two effectiveness measures, namely segment overlap and
excess-time. On average, three fourths of the automatically detected
highlight segments are the same as those marked by the human.
Meanwhile, "duration of the algorithmically generated segments
is not significantly more than that of human segments."
Smeaton, A.F. (2001). Challenges for content-based navigation
of digital video in the Fischlar Digital Library. Retrieved April
03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html
Using the Fischlar Digital Library, a TV broadcast browse/retrieval
system in a university campus environment, as an example, the
author demonstrates three groups of content-based techniques for
digital video navigation: searching, clustering (similarity
linking), and summarization. Unfortunately, no evaluation
was carried out.
Smeaton, A.F., Over, P. & Taban, R. (2002). The TREC-2001
Video Track report. Retrieved April 03, 2003, from http://trec.nist.gov/pubs/trec10/t10_proceedings.html
The report summarizes the initial effort of the TREC-2001 Video Track.
The evaluations were done by 12 institutions worldwide with respect
to shot boundary detection and automatic or interactive searching
using general or known-item topics. Whereas both precision and recall
were used to assess the effectiveness of known-item topic search,
only precision was employed for the general topics. For the evaluation
of shot boundary detection, in addition to P/R, deletion rate,
insertion rate, error rate, quality index, correction probability,
correction rate, inserted transition count, and deleted transition
count were devised as metrics.
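To make the relationship between the transition counts and P/R concrete, the following sketch derives precision and recall from correct, inserted, and deleted transition counts. It follows a commonly used convention (inserted = false detections, deleted = missed transitions) and is an assumption, not the report's stated definition; the function name and the sample numbers are hypothetical, and the other listed metrics are not reproduced here.

```python
from typing import Tuple


def boundary_precision_recall(correct: int, inserted: int, deleted: int) -> Tuple[float, float]:
    """Compute shot boundary precision and recall from transition counts.

    correct  -- detected transitions that match a reference transition
    inserted -- detected transitions with no reference match (false alarms)
    deleted  -- reference transitions that were not detected (misses)
    """
    detected = correct + inserted   # everything the system reported
    reference = correct + deleted   # everything in the ground truth
    precision = correct / detected if detected else 0.0
    recall = correct / reference if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(boundary_precision_recall(correct=80, inserted=20, deleted=10))
```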
Wildemuth, B.M., Marchionini, G., Wilkens, T., et al. (2002).
Alternative surrogates for video objects in a digital library: users'
perspectives on their relative usability. Research and advanced
technology for digital libraries: Proceedings of 6th European conference,
ECDL'02, Paris, France, September 16-18, 2002, 493-507.
Having acknowledged the feasibility and potential strengths of
non-textual surrogates representing digital videos, the authors
conducted an exploratory user study to investigate whether users
prefer a particular surrogate type, and whether one type of
surrogate is better than the others in terms of supporting
users' performance. The results show that "though no surrogate
triumphed as the 'best', the fast forward surrogate
garnered substantial support from the study participants, particularly
from experienced users." Meanwhile, "the participants
were successful in using the surrogates to determine gist, recognize
objects and actions they had seen in surrogates, and identify
frames that 'belonged' in a particular video."
Zhang, H.J., Low, C.Y., Smoliar, S.W. & Wu, J.H. (1995). Video
parsing, retrieval and browsing: an integrated and content-based
solution. Proceedings of ACM Multimedia, November 5-9, San
Francisco, California.
The authors propose an approach for digital video parsing, browsing
and retrieval that is based on integrated content information. The
approach contains three processing levels, namely temporal segmentation,
key frame abstraction, and representation of key frame
content. The experimental results for the effectiveness
(measured by number of shots correctly detected, number of shots
missed, false detections, and number of key frames extracted) of
the key frame abstraction show that the automatic extraction
system did not miss any of the key frames identified by the SBC
Film/Videotape Library staff. Moreover, the library staff perceived
the automatic extraction as more objective.