Information Retrieval Design, a book by James D. Anderson and Jose Perez-Carballo



Chapter 8. Analysis and Indexing Methods

Contents of Chapter 8

8.1. Research Comparing Automatic and Human Indexing.
8.2. Human Analysis and Indexing.
8.2.1. Cognition Versus Social Construction in Human Analysis and Indexing
8.2.2. Human Indexing Rules.
8.2.2.1. Human Indexing Rules for Image Text.
8.2.2.2. Human Indexing Rules Based on Probabilistic Analysis.
8.3. Automatic Analysis and Indexing.
8.3.1. In the Beginning Was the Word.
8.3.2. Simple Keyword Indexing.
8.3.3. Negative Vocabulary Control: Stop Lists.
8.3.4. Counting Words.
8.3.5. Comparative Counting and Weighting.
8.3.6. Improving the Count: Stemming.
8.3.7. Natural Word Distributions.
8.3.8. Words Versus Phrases.
8.3.9. Managing Vocabulary in Automatic Indexing.
8.3.10. Automatic Vocabulary Management.
8.3.11. Clustering.
8.3.11.1. Latent Semantic Indexing.
8.3.12. Citation Indexes.
8.3.12.1. Bibliographic Coupling.
8.3.12.2. Co-Citation.
8.3.13. Relevance Feedback.
8.4. Subject Analysis and Indexing in Indexing and Abstracting Services.
8.5. Growing Role of Automatic Analysis and Indexing.
8.5.1. Censorship or Guidance?
8.6. Our Examples.
8.6.1. A Book Index.
8.6.2. An Indexing and Abstracting Service.
8.6.3. A Full-Text Encyclopedia/Digital Library.


1

2 human indexing versus automatic indexing

3 results of human indexing versus automatic indexing

4 multiple approaches to indexing in IR databases

5 automatic indexing of language texts versus image texts and other non-language texts

6 recommended resources on indexing processes

8.1. Research Comparing Automatic and Human Indexing.

7

8 role of users in IR research

9 variables in IR research

10 size of documentary units among variables in IR research

11 extent of indexable matter among variables in IR research

12 exhaustivity among variables in IR research

12a specificity among variables in IR research

13 browsability among variables in IR research

14 syntax among variables in IR research

15

16 vocabulary management among variables in IR research

17 surrogation among variables in IR research

18 conflation of variables in IR research

19 views of Cooper (William S.) on variables in IR research

20 conflation of variables in IR research

21 role of users in IR research at TREC

22 evidence from use of automatic indexing versus human indexing

23 user preferences for automatic indexing versus human indexing

24 effectiveness of automatic indexing

25 cost-benefit analysis of human indexing versus automatic indexing

26

8.2. Human Analysis and Indexing.

27 methods of human analysis for human indexing

28 cognitive processes in human indexing

29

30 role of documentary features in human indexing

31 cognitive processes in human indexing

32 analysis steps in human indexing

33 views of Mulvany (Nancy) on human indexing

34

35

36

37 cultural factors in human indexing versus automatic indexing

38 cultural factors in automatic indexing

39 views of Chan (Lois Mai) on human indexing

40

41

42 views of Chicago Manual of Style on human indexing

43 views of Fugmann (Robert) on human indexing

44 views of Soergel (Dagobert) on human indexing

45 views of Lancaster (F. W.) on human indexing

46 views of Fairthorne (Robert) on human indexing

47 views of O'Connor (Brian) on human indexing

48 views of Wellisch (Hans) on human indexing

49

50

51 views of Wilson (Patrick) on human indexing

52

53

54

55 concrete entity and event databases versus IR databases

56 views of Taylor (Arlene) on human indexing

57 views of Hjørland (Birger) on human indexing

58 activity theory: treatment of knowledge organization

59 paradigms of information science

60 role of domain analysis in information understanding

61 views of Hjørland (Birger) on nature of subjects

62

63 variability in human indexing

64 consistency in human indexing

65

66 inconsistency in searching

8.2.1. Cognition Versus Social Construction in Human Analysis and Indexing.

67 views of Frohmann (Bernd) on human indexing

68

69 views of Foskett (A. C.) on human indexing

70 views of Farradane (Jason) on human indexing

71

72 views of Beghtol (Clare) on human indexing

73 views of Anderson (James D.) on human indexing

74

75 views of Artandi (Susan) on human indexing

76 human indexing as model for automatic indexing

77 positive attributes of human indexing

78 application of views of Wittgenstein (Ludwig) to human indexing

79

80 application of views of Wittgenstein (Ludwig) to social construction of indexing rules

81 queer theory compared to indexing theory

82 queer theory

83 essentialism versus social constructionism in gender studies

84

85 role of gender in human indexing

86 social construction of gender

87 culture versus cognition in human indexing

88 views of Frohmann (Bernd) on social context of human indexing

89

8.2.2. Human Indexing Rules.

90 human indexing as two step process

91 rules for analysis in human indexing

92 standards for analysis in human indexing: British and international

93 guidelines for analysis in cataloging and classification at Rutgers University

94 subjective nature of guidelines for indexing

95 views of Hjørland (Birger) on guidelines for indexing

96 relation of subject scope and documentary scope to rules for human indexing

97 specialized rules for human indexing

98 rules for indexing for MLA international bibliography

99 rules for indexing about diesel engines by Ranganathan

100 role of specialized categories in human indexing

101 limitations of rules for human indexing

102 qualitative judgments in request-oriented human indexing

103 views of Frohmann (Bernd) on rules for human indexing

104 purposes of information retrieval for diverse users

105 domain analysis as basis for rules for human indexing

106 wants versus needs in information retrieval

107 political aspects of information retrieval

108 identification of non-topical features in human indexing; bibliographic coupling and co-citation as basis for indexing

109

8.2.2.1. Human Indexing Rules for Image Text.

110 views of Jorgensen (Corinne) on indexing of image texts

111 views of Pérez-López (Kathleen Golitko) on automatic indexing of image texts

112 recommended resources on human indexing of image texts

113 terminology for image texts and sound texts

8.2.2.2. Human Indexing Rules Based on Probabilistic Analysis.

114 views of Frohmann (Bernd) on rules of Cooper (William S.) for human indexing

115 views of Cooper (William S.) on human indexing

116 decision theory, utility theory, and gedanken experimentation in rules for human indexing

117

118

119 odds-payoff indexing chart

Figure 8.1. Cooper's odds-payoff indexing chart

"Possible format for a graphic aid to gedanken indexers.
The data are fictitious" (Cooper 1978, p. 117)

120

121

122

123

124 numerical values for decision making in human indexing

8.3. Automatic Analysis and Indexing.

125 automatic indexing versus human searching

126 automatic indexing of language texts versus image texts and sound texts

127 indexing of image texts by AltaVista web search engine

128 theoretical models for automatic indexing: vector-space model, probabilistic model

129 language model for automatic indexing

130 recommended resources on automatic indexing

8.3.1. In the Beginning Was the Word.

131 definitions of words in automatic indexing

132 definitions of words in Chinese language

133 treatment of punctuation in automatic indexing

134 treatment of hyphens in automatic indexing

135 treatment of slashes in automatic indexing

136 treatment of underscores and full stops (periods) in automatic indexing

137 treatment of parentheses in automatic indexing

138 treatment of apostrophes in automatic indexing

139 treatment of numbers in automatic indexing

140

141

142 treatment of single characters in automatic indexing

143 definition of words in automatic indexing

144 treatment of upper- and lower-case letters in automatic indexing
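The word-definition decisions listed above (punctuation, hyphens, slashes, apostrophes, numbers, single characters, case) can be made concrete in a tokenizer. The following is a minimal sketch under one assumed rule set, not the rules the book recommends. Text is case-folded, and internal hyphens and apostrophes stay inside a word. All other punctuation separates words, and single letters are dropped while single digits are kept.

```python
import re

# Assumed rule set for illustration only:
# - case-fold everything
# - a hyphen or apostrophe between letters/digits stays inside the word
# - slashes, parentheses, periods, underscores, etc. act as separators
WORD_PATTERN = re.compile(r"[a-z0-9]+(?:['\-][a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Split text into index words under the assumed rules above."""
    tokens = WORD_PATTERN.findall(text.lower())
    # Drop one-character tokens (e.g. the "e" and "g" of "e.g."),
    # but keep single digits, which may be meaningful index terms.
    return [t for t in tokens if len(t) > 1 or t.isdigit()]
```

For example, `tokenize("Word-based IR/DB (O'Connor, 1996).")` returns `['word-based', 'ir', 'db', "o'connor", '1996']`.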

8.3.2. Simple Keyword Indexing.

145
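Simple keyword indexing amounts to recording, for every word, the documents in which it occurs (an inverted file). The following is a minimal sketch. It assumes documents are held in a dictionary keyed by a hypothetical document id, and that a word is any run of letters or digits.

```python
import re
from collections import defaultdict


def build_keyword_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of document ids in which it occurs."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Simplifying assumption: a word is a case-folded run of letters/digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], word: str) -> list[str]:
    """Return the ids of documents containing the keyword, in sorted order."""
    return sorted(index.get(word.lower(), set()))
```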

8.3.3. Negative Vocabulary Control: Stop Lists.

146 stop lists for reducing size of indexes

147 choice of words for stop lists

148 number of words in stop lists

149 negative vocabulary control
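A stop list is applied after tokenizing. Any token on the list is simply not posted to the index, which is why stop lists reduce index size. The short list below is a hypothetical example for illustration. Real stop lists vary in length and in the words chosen.

```python
# Hypothetical, deliberately short stop list; not taken from the book.
STOP_LIST = frozenset(
    {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}
)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only tokens that are not on the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_LIST]
```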

8.3.4. Counting Words.

150 use of frequency of words for ranking texts

8.3.5. Comparative Counting and Weighting.

151 inverse document frequency of words

152 calculation of document weights

153

154
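Inverse document frequency and document weights are often combined as tf × idf: w(t, d) = tf(t, d) × log(N / df(t)). Here N is the number of documents and df(t) is the number of documents containing term t. The sketch below uses that common variant as an assumption; the book's own weighting formula may differ. Under this formula, a term that occurs in every document gets weight zero, however often it appears.

```python
import math
from collections import Counter


def tf_idf_weights(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Weight each term in each document by tf * log(N / df) (one common variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df: Counter = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights: dict[str, dict[str, float]] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)  # raw within-document frequency
        weights[doc_id] = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    return weights
```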

8.3.6. Improving the Count: Stemming.

155 impact of stemming on frequency of words

156 identification of word roots in stemming

157 stemming of plural "s" suffixes

158 stemming of multiple suffixes

159 impact of stemming
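Plural stemming can be as simple as the "S" stemmer associated with Donna Harman, which applies only the first of three suffix rules that matches. The sketch below follows that rule set as commonly described. It is an illustration, not necessarily the stemmer the book discusses.

```python
def s_stem(word: str) -> str:
    """'S' stemmer: apply only the first matching rule."""
    w = word.lower()
    # Rule 1: "-ies" -> "-y", unless the word ends in "-eies" or "-aies".
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    # Rule 2: "-es" -> "-e", unless the word ends in "-aes", "-ees" or "-oes".
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-2] + "e"
    # Rule 3: drop a final "-s", unless the word ends in "-us" or "-ss".
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]
    return w
```

Conflating "queries" and "query" into one term raises that term's count. Such simple rules also make mistakes: "series" becomes "sery".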

8.3.7. Natural Word Distributions.

160 Zipf's law on distributions of words in texts

161 identification of keywords based on transition points in Zipfian distributions

162 automatic indexing compared to human indexing

163 Zipfian distribution of words in article by Booth (A. D.)

164 transition point in Zipfian distribution of words

165 identification of keywords based on transition points in Zipfian distributions

166 effectiveness of keywords

167 keywords based on Zipfian distributions compared to human indexing

168 incompatibility of human indexing and automatic indexing

169 automatic indexing compared to human indexing

170

171

172

173

174

175

176

177
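The transition point can be estimated from the number of words that occur exactly once. The formula used below is T = (-1 + sqrt(1 + 8 * I1)) / 2, with I1 the count of words of frequency 1; it is the one usually attributed to Booth's analysis. The band of frequencies accepted around T (the hypothetical `window` parameter) is an assumption for illustration, not a value from the book.

```python
import math
from collections import Counter


def transition_point(tokens: list[str]) -> float:
    """Estimate T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 = number of words occurring once."""
    freqs = Counter(tokens)
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def keywords_near_transition(tokens: list[str], window: float = 0.25) -> list[str]:
    """Select words whose frequency falls within T*(1 - window) .. T*(1 + window).

    `window` is a hypothetical tuning parameter chosen for illustration.
    """
    freqs = Counter(tokens)
    t = transition_point(tokens)
    low, high = t * (1 - window), t * (1 + window)
    return sorted(w for w, f in freqs.items() if low <= f <= high)
```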

8.3.8. Words Versus Phrases.

178 importance of phrases in automatic indexing

179 proper nouns in indexing

180 cost versus benefits in identification of phrases in automatic indexing

181 identification of phrases in automatic indexing and in searching

182

183

184

185 identification of phrases in automatic indexing

186 role of phrases in browsing

187

188

189

8.3.9. Managing Vocabulary in Automatic Indexing.

190

191 positive vocabulary management in automatic indexing

192 vocabulary management of equivalent and synonymous terms

193 vocabulary management of minor terms

194 vocabulary management in automatic indexing

195 vocabulary management for displayed indexes

196 vocabulary management for electronic searching

197 addition of terms to thesauri in automatic indexing

198 bypassing vocabulary management in electronic searching

8.3.10. Automatic Vocabulary Management.

199

200 Associative Interactive Dictionary as example of automatic vocabulary management

201 identification of related terms by co-occurrence

202 ranking of related terms by frequency of co-occurrence

203

204

205

206

207

208 impact of automatic vocabulary management
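Identifying and ranking related terms by co-occurrence can be sketched as counting, for a given term, how many documents also contain each other term. This is a minimal illustration of the co-occurrence idea, not the actual algorithm of the Associative Interactive Dictionary. Documents are assumed to be already reduced to sets of index terms.

```python
from collections import Counter


def related_terms(docs: list[set[str]], term: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Rank other terms by the number of documents in which they co-occur with `term`."""
    counts: Counter = Counter()
    for terms in docs:
        if term in terms:
            counts.update(t for t in terms if t != term)
    # Highest co-occurrence first; ties broken alphabetically for a stable display.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```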

8.3.11. Clustering.

209 definitions of classing and clustering

210 criteria for clusters

211 clusters in searching

212 document similarity as basis for clustering

213 types of clusters: string clusters

214 star clusters

215 clique clusters

216 clump clusters

Figure 8.2. Types of clusters, based on Salton (1975a).


217 thresholds in automatic clustering

218 automatic clustering techniques: static clustering, dynamic clustering, scatter-gather clustering

219 static clustering

220

8.3.11.1. Latent Semantic Indexing.

221

222 vocabulary management in latent semantic indexing

8.3.12. Citation Indexes.

223 citation links to older documents

224 citation indexes to newer documents

8.3.12.1. Bibliographic Coupling.

225 definition of bibliographic coupling

226 bibliographic coupling compared to co-citation

8.3.12.2. Co-Citation.

227 definition of co-citation; identification of research fronts by co-citation
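Both citation-based measures reduce to counting shared links. In bibliographic coupling, two documents are related by the references they have in common. In co-citation, two documents are related by the number of later documents that cite them both. The following is a minimal sketch, assuming each citing document's reference list is stored as a set of hypothetical document ids.

```python
def coupling_strength(refs: dict[str, set[str]], a: str, b: str) -> int:
    """Bibliographic coupling: number of references shared by documents a and b."""
    return len(refs.get(a, set()) & refs.get(b, set()))


def cocitation_strength(refs: dict[str, set[str]], a: str, b: str) -> int:
    """Co-citation: number of citing documents whose reference lists include both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)
```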

8.3.13. Relevance Feedback.

228 feedback in automatic indexing and in searching

229 purpose of relevance feedback

230

231 procedures in relevance feedback

232

233 relevance feedback in selective dissemination of information and filtering

234 role of human searching behavior in automatic indexing

235 pseudo relevance feedback
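One classic way to carry out relevance feedback is Rocchio's formula. It moves the query vector toward the centroid of documents judged relevant and away from the centroid of those judged non-relevant. The book may describe other procedures. The weights alpha, beta and gamma below are commonly quoted defaults, used here as assumptions, and terms whose weight drops to zero or below are removed.

```python
def rocchio(
    query: dict[str, float],
    relevant: list[dict[str, float]],
    nonrelevant: list[dict[str, float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> dict[str, float]:
    """Return a reformulated query vector (Rocchio-style feedback)."""
    terms = set(query)
    for vec in relevant + nonrelevant:
        terms |= set(vec)
    new_query: dict[str, float] = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:  # add the centroid of relevant documents
            w += beta * sum(v.get(t, 0.0) for v in relevant) / len(relevant)
        if nonrelevant:  # subtract the centroid of non-relevant documents
            w -= gamma * sum(v.get(t, 0.0) for v in nonrelevant) / len(nonrelevant)
        if w > 0:  # keep only positive weights
            new_query[t] = w
    return new_query
```

For pseudo relevance feedback, the top-ranked documents from the first search are passed as `relevant` without consulting the user, and `nonrelevant` is left empty.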

8.4. Subject Analysis and Indexing in Indexing and Abstracting Services.

236

237

238 MedIndEx as example of expert system for subject analysis and indexing

239 use of checktags in subject analysis and indexing

240 computer-aided subject analysis and indexing for indexing and abstracting services

8.5. Growing Role of Automatic Analysis and Indexing.

241 allocation of automatic indexing versus human indexing

242 allocation of human indexing to important documents

243

244 use of human indexing for identification of useful documents

245 views of Bates (Marcia J.) on role of human indexing

246

247 criteria for allocation of human indexing

8.5.1. Censorship or Guidance?

248 measures of use versus censorship

249 expert judgment versus use in evaluation of importance

250 selection of useful documents by advisory groups and indexing staff

251 expert judgment versus user preferences in IR database design

252 expert judgment in indexing

253 role of human indexers in assessments of authority

254 identification of contributions by human indexers

255 discovery of controversial documents

256 inequality of documents

257 application of expert judgment to world-wide web and internet

258 machines versus humans in indexing

8.6. Our Examples.

8.6.1. A Book Index.

259

8.6.2. An Indexing and Abstracting Service.

260

8.6.3. A Full-Text Encyclopedia/Digital Library.

261


perez-carballo@acm.org Last modified: Tue Jun 6 18:02:09 CDT 2006
