Information Retrieval Design, a book by James D. Anderson and Jose Perez-Carballo

Chapter 13. Vocabulary Management

Contents of Chapter 13

13.1. The Vocabulary Problem.
13.2. Research on Vocabulary Issues.
13.3. Vocabulary Solutions.
13.3.1. Syndetic Structure in Displayed Alphabetical Indexes.
13.3.2. Indexing Thesauri.
13.3.2.1. Examples of Indexing Thesauri.
13.3.3. End-User Thesauri.
13.3.3.1. Compiling an End-User Thesaurus.
13.3.3.1.1. Sources of Terms.
13.3.3.1.2. Selecting Terms.
13.3.3.1.3. Categorizing Terms.
13.3.3.1.4. Bound Terms Versus Elemental Descriptors.
13.3.3.1.5. Term Relationships.
13.3.3.1.6. Variant Forms and Equivalent Terms.
13.3.3.1.7. Homographs.
13.3.3.1.8. Thesaurus Displays.
13.3.4. Co-Occurrence Term Clustering.
13.3.5. Ontologies.
13.4. Our Examples.
13.4.1. A Book Index.
13.4.2. An Indexing and Abstracting Service.
13.4.3. A Full-Text Encyclopedia/Digital Library.


13.1. The Vocabulary Problem.

1

2 richness of human language

3 categories of information needs and desires

4 searches for known items with known vocabulary

5 searches for known items with unknown vocabulary

6

7

8 searches for unknown items with known vocabulary

9 searches for unknown items with unknown vocabulary

10 searches for unknown items with unknown vocabulary and vague concepts

11 searches of exploration

12 continua of information seeking situations

13.2. Research on Vocabulary Issues.

13 research on information seeking; views of Belkin (Nicholas J.) on anomalous states of knowledge

14 views of Furnas (George W. et al.) on variability of vocabulary

15 vocabulary of users compared to Library of Congress subject headings

16 views of Bates (Marcia) on variability of vocabulary

17 variability of vocabulary among searchers and indexers

18

19 variability of vocabulary in full-text sources

13.3. Vocabulary Solutions.

20

21 research on solutions for vocabulary problems

22 views of Bates (Marcia) on variability of vocabulary

23 experimental research on end-user thesauri

24 field research on use of thesauri

25

26 integration of thesauri with search interfaces

27 combining thesauri and co-occurrence lists

28 mapping of search terms to controlled vocabulary

29 interaction with multiple controlled vocabularies

30 display of thesauri for searching

31 work of Pollitt (A. Steven, et al.) on display of thesauri for searching

32 facets in EMTREE thesaurus

33 dynamic postings in faceted relational classified displayed indexes

34

35

36

37

38

39

13.3.1. Syndetic Structure in Displayed Alphabetical Indexes.

40 definition of syndetic structure

41 role of syndetic structure

42 subject headings versus terms in syndetic structure

43 types of syndetic structure; types of cross references

44 equivalent-term cross references

45 see-also references

46 narrower-term cross references; related-term cross references

47 omission of see-also references

48 purpose of syndetic structure

49 cross references in OPACs

50 postings data in cross references

51 cross references and syndetic structure in thesauri

52 UF as notation for un-used terms

53

54 UF as instruction for creation of equivalent-term cross references

55 form of equivalent-term cross references in OPACs

56 NT as notation for narrower terms; narrower-term cross references

57 narrower terms versus related terms in syndetic structure, in thesauri

58

59 translation of notation for thesauri into natural human language

60

61 BT as notation for broader terms; broader-term cross references

62

63

64 RT as notation for related terms; related-term cross references

65

66

67 general see-also references

68 cross references in library catalogs

69

70

71

72

73 omission of cross references in OPACs

74 impact of omission of cross references

75 proposal for research on syndetic structure

76

77

13.3.2. Indexing Thesauri.

78

79

80 source of term "thesaurus"

81 books on construction of thesauri

82 thesauri for full-text IR databases

83 views of Soergel (Dagobert) on construction of thesauri

84 card format for term records for thesauri

Soergel's Thesaurus Record Card

Notes for the record card:

85 computer programs for construction of thesauri

13.3.2.1. Examples of Indexing Thesauri.

86

87 Unesco thesaurus (1977)

88 term records in Unesco thesaurus (1977)

89 classification notation in Unesco thesaurus (1977)

90 notation in Unesco thesaurus (1977)

91 KWIC display in Unesco thesaurus (1977)

92 hierarchical displays in Unesco thesaurus (1977)

93

94

95 relational displays in Unesco thesaurus (1977)

96

97

98 Unesco thesaurus (1995)

99 microthesauri in Unesco thesaurus (1995)

100

101

102 display of multiple hierarchical levels in Unesco thesaurus (1995)

103 term records in Unesco thesaurus (1995)

104 Eurovoc thesaurus

105 term records in Eurovoc thesaurus

106

107

108 microthesauri in Eurovoc thesaurus

109

110

111 ASIS thesaurus

112 display of ASIS thesaurus

113 facets in ASIS thesaurus

114

115

116

117

118

119

13.3.3. End-User Thesauri.

120 end-user thesauri versus indexing thesauri

121 differences between indexing thesauri versus end-user thesauri

122 lead-in terms in end-user thesauri

123 gathering terms in end-user thesauri

124 examples of end-user thesauri

13.3.3.1. Compiling an End-User Thesaurus.

13.3.3.1.1. Sources of Terms.

125

126

127

128

129 procedures for compilation of end-user thesauri

13.3.3.1.2. Selecting Terms.

130

131 search statements as source of terms for end-user thesauri

132 views of Landauer (Thomas K.) on users as source of terms for end-user thesauri

133

134

135

136

137

138

139

140

141 selection of terms from texts for end-user thesauri

142

143 identification of phrases from full text for end-user thesauri

144

145

documentary unit. A "documentary unit" is the portion of a document that can be directly retrieved by an IR database. Documentary units may be complete documents, such as complete books or complete periodical articles, or they may be parts of complete documents: chapters in books, or paragraphs, charts, diagrams, or illustrations in periodical articles. This same variety in the size of documentary units applies to all media. An IR database for videotapes, for example, might retrieve only complete videotapes (so that the documentary unit is the complete tape), or it might be able to retrieve individual frames or short sequences of frames, in which case the individual frames or the short sequences of frames constitute the documentary units. In all cases, the documentary unit is the unit that is analyzed for indexing (either by machine algorithm or by human inspection). Consequently, the "documentary unit" is also called the "unit of analysis." "Bibliographic unit" has also been used for this concept, indicating the unit described and retrievable via a bibliography. Small documentary units have also been called "information units," but one should hope that all documentary units will be informative!
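The effect of choosing a larger or smaller documentary unit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `DocumentaryUnit`, its fields, the `build_index` helper, and the sample texts are all hypothetical and do not come from the book. The sketch indexes the same invented book twice, once as a single unit and once chapter by chapter, to show that the unit of analysis is also the unit that a search returns.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class DocumentaryUnit:
    """Hypothetical record for one directly retrievable unit."""
    unit_id: str
    kind: str                        # illustrative labels: "book", "chapter", ...
    text: str
    parent_id: Optional[str] = None  # enclosing document, if this unit is a part


def build_index(units: Iterable[DocumentaryUnit]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the ids of the units that contain it."""
    index: Dict[str, Set[str]] = {}
    for unit in units:
        for word in set(unit.text.lower().split()):
            index.setdefault(word, set()).add(unit.unit_id)
    return index


# The same (invented) book, analyzed at two different granularities.
book = DocumentaryUnit(
    "b1", "book", "thesauri manage vocabulary ontologies define categories"
)
chapters = [
    DocumentaryUnit("b1.c1", "chapter", "thesauri manage vocabulary", parent_id="b1"),
    DocumentaryUnit("b1.c2", "chapter", "ontologies define categories", parent_id="b1"),
]

whole_book_index = build_index([book])
chapter_index = build_index(chapters)

# A search for "ontologies" retrieves the whole book in the first index,
# but only the second chapter in the chapter-level index.
book_hits = whole_book_index["ontologies"]
chapter_hits = chapter_index["ontologies"]
```

With the book as the documentary unit, any matching word retrieves the entire book. With chapters as documentary units, the same search points to the chapter that actually contains the word.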

146 phrases from full text for end-user thesauri

147

148 indexers as source of terms for end-user thesauri

13.3.3.1.3. Categorizing Terms.

149

150

151

152 stop list terms in end-user thesauri

153

154 sorting of terms for end-user thesauri

155 facets for end-user thesauri

156 primary facets for end-user thesauri

157 term records for end-user thesauri

158 field tags for term records

159 initial categorization of terms for thesauri

160 size of categories in thesauri

161 categories of entities in end-user thesauri

162

163 categories of operations and processes in end-user thesauri

164 definition of categories in thesauri

165

166 categories in thesauri not mutually exclusive

167 merger of conceptually similar term records

168 sorting of terms in end-user thesauri

13.3.3.1.4. Bound Terms Versus Elemental Descriptors.

169

170 views of standards for thesauri on bound terms

171

172

173

174 impact of bound terms on size of thesauri

13.3.3.1.5. Term Relationships.

175

176 term relationships in thesauri

177 examples of term relationships in thesauri

178 equivalence relationships in thesauri

179 hierarchical relationships in thesauri

180 associative relationships in thesauri

181 more detailed term relationships in thesauri

182 views of Farradane (Jason) on term relationships

183 views of Diener (Richard) on term relationships

184 views of Wang, Vandendorpe, and Evens on term relationships

185 views of ALA ALCTS Subject Analysis Committee on term relationships

186 compilation of term relationships by Michel (Dee) and Kuhr (Pat)

187 research on term relationships in thesauri

188 attitudes of users toward term relationships in thesauri

189 hierarchical relationships versus associative relationships in thesauri

190 term relationships in hierarchical displays in thesauri

191 display of term relationships in thesauri

192 term relationships during compilation of thesauri

193

194 attitudes of users toward term relationships in thesauri

195 hierarchical relationships versus associative relationships in thesauri

196 views of Cutter (Charles Ammi) on role of principles in cataloging

197

198

199

200

13.3.3.1.6. Variant Forms and Equivalent Terms.

201

202 gathering terms in end-user thesauri

203 gathering terms versus preferred terms in thesauri

204 choice of gathering terms in end-user thesauri; choice of preferred terms in indexing thesauri

205 equivalent terms versus variant terms in end-user thesauri

206

207 cross references in hypertext

208

209

210 used for terms versus equivalent terms in end-user thesauri

13.3.3.1.7. Homographs.

211

13.3.3.1.8. Thesaurus Displays.

212 search options in end-user thesauri

213 browsable indexes for end-user thesauri

214 relational displays in end-user thesauri

215 searching with end-user thesauri

216

217

13.3.4. Co-Occurrence Term Clustering.

218

219 research on clustering of terms for vocabulary management

220 clustering terms for vocabulary management

13.3.5. Ontologies.

221 definitions of ontologies

222

223

224

"1. <philosophy> A systematic account of Existence.

"2. <artificial intelligence> (From philosophy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

"For AI systems, what "exists" is that which can be represented. When the knowledge about a domain is represented in a declarative language, the set of objects that can be represented is called the universe of discourse. We can describe the ontology of a program by defining a set of representational terms. Definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.

"A set of agents that share the same ontology will be able to communicate about a domain of discourse without necessarily operating on a globally shared theory. We say that an agent commits to an ontology if its observable actions are consistent with the definitions in the ontology. The idea of ontological commitment is based on the Knowledge-Level perspective.

"3. <information science> The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities. See subject index. This is an extension of the previous senses of "ontology" (above) which has become common in discussions about the difficulty of maintaining subject indices" (1997-04-09).
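The second sense quoted above describes an ontology as a set of representational terms, each with a human-readable definition, plus formal axioms that constrain how the terms may be used, and it says that an agent commits to an ontology when its observable actions are consistent with those definitions. The following Python sketch is one very small reading of that description, not a formalism taken from the quoted source. The `Ontology` class, the triple encoding of facts, the `is_a` relation, and the asymmetry axiom are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding of an assertion: (subject, relation, object).
Fact = Tuple[str, str, str]


@dataclass
class Ontology:
    """Relation names with human-readable definitions, plus axioms."""
    definitions: Dict[str, str] = field(default_factory=dict)
    axioms: List[Callable[[Set[Fact]], bool]] = field(default_factory=list)

    def commits(self, facts: Set[Fact]) -> bool:
        """True if every relation used is defined and every axiom holds."""
        uses_known_terms = all(rel in self.definitions for _, rel, _ in facts)
        return uses_known_terms and all(axiom(facts) for axiom in self.axioms)


def is_a_is_asymmetric(facts: Set[Fact]) -> bool:
    """Axiom: if x is_a y, then y is_a x must not also be asserted."""
    pairs = {(s, o) for s, rel, o in facts if rel == "is_a"}
    return not any((o, s) in pairs for s, o in pairs)


ontology = Ontology(
    definitions={"is_a": "the subject is a kind of the object"},
    axioms=[is_a_is_asymmetric],
)

# Assertions made by three hypothetical agents.
consistent = {("dog", "is_a", "mammal"), ("mammal", "is_a", "animal")}
contradictory = {("dog", "is_a", "mammal"), ("mammal", "is_a", "dog")}
undefined_term = {("wheel", "part_of", "car")}

results = (
    ontology.commits(consistent),
    ontology.commits(contradictory),
    ontology.commits(undefined_term),
)
```

Only the first set of assertions satisfies this toy ontology: the second violates the axiom, and the third uses a relation for which the ontology supplies no definition.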

225 ontologies versus thesauri

226 views of Hjerppe (Roland) on ontologies versus knowledge organization systems

227 views of Sowa (John) on categories in ontologies

228 views of Poli (Roberto) on categories in ontologies

229

230 categories and term relationships in thesauri versus ontologies

231 weak structures in ontologies

232 views of Vickery (Brian C.) on ontologies

233 ontologies for machine translation; conceptual levels in ontologies

234 ontologies for business

235 compilation of ontologies

236 views of Vickery (Brian C.) on ontologies

13.4. Our Examples.

13.4.1. A Book Index.

237

238 vocabulary management for book indexes in print media

239 integration of vocabulary management in book indexes

240 equivalent-term cross references for synonymous and equivalent terms in book indexes

241 double posting for synonymous and equivalent terms in book indexes

242

243 equivalent-term cross references for narrower terms in book indexes

244 terminology in equivalent-term cross references

245 see-also references in book indexes

246 application of thesauri to book indexes

247

248

249

250 vocabulary management for indexes in electronic books

251 presentation of see-also references in displayed indexes in electronic media

252 non-displayed indexes for electronic books

253 presentation of suggestions for vocabulary management for searches in non-displayed indexes

254

255

256

13.4.2. An Indexing and Abstracting Service.

257

258 vocabulary management for indexing and abstracting services in print media

259 see-also references for equivalent terms in automatic indexing

260

261

262 vocabulary management for non-displayed indexes for indexing and abstracting services in electronic media

263 suggestions for vocabulary management for multiple terms in search statements

264 optional status of suggestions for vocabulary management

265

13.4.3. A Full-Text Encyclopedia/Digital Library.

266

267


perez-carballo@acm.org Last modified: Tue Jun 6 18:02:09 CDT 2006
