FIS 2144: Subject Approach to Information
Winter 2007

 
Instructor: Jens-Erik Mai
Phone: 416.978.7097
Office: BL 641
Office hours: By appointment
Class meets: Wednesday 9:00am-12:00noon in BL 313



FIS's course catalog describes the purpose and content of this course as: “Knowledge organization with respect to subject analysis and retrieval. Theories and principles. Design, application, and evaluation of various methods for accessing documents and information, using controlled vocabularies and natural language (thesauri, hypertext, expert systems). Emphasis on computerized systems including the internet.”

After years, even decades, of focus on automating the storage, organization, and retrieval of documents, information, and knowledge, there is now a growing demand for people who know how to design and build controlled vocabularies and manually index documents using such controlled vocabularies.  The aim of this course is to prepare you to lead a team that can construct a controlled vocabulary and develop policies and practices for representing information.  The course will provide you with knowledge about the fundamental theories and principles for representation of information and design of controlled vocabularies.

Representation of the content, topic, subject matter, or aboutness of documents is central to the success of any information retrieval system.  This course surveys different approaches, techniques, and methods for representing the subject matter of documents, with a bias towards manual methods.  A main focus of the course is the design and construction of controlled vocabularies.  There are many types of controlled vocabularies and many terms for the same types.  While we will not go into detail about these differences, we will briefly explore them before focusing our attention on the thesaurus.  The thesaurus is the most complex and advanced type of controlled vocabulary, and by mastering the design of thesauri you will gain a good grasp of other types of controlled vocabularies.

This course will prepare you to take up the challenges of designing controlled vocabularies by giving you practical experience in constructing a thesaurus.  Much of the course is centered around a team project in which you will analyze a domain (of your own choice) and construct a small thesaurus for it.  For this part of the course, I will not go over the readings or lecture in class; instead, you should discuss the readings with your teammates and use the readings to guide you through the construction process.  I will meet with the teams each week and we can discuss the readings then if desired.

The objective of the course is to give a solid understanding of the principles, issues, and problems in subject representation and design and construction of controlled vocabularies. More specifically the students will:

  • be exposed to trends in theories of subject representation

  • learn about how to represent documents

  • become knowledgeable about different methods for subject representation

  • learn about how to develop indexing policies locally

  • learn how to design, construct, and maintain a thesaurus using standard manual methods

  • be able to plan and implement a project to design and construct a controlled vocabulary.

Please also see the general notes on my classes.


Study Tips

The outcome of this course depends to a large degree on your active participation. This means that it is important that you have read and understood the assigned texts for each class. You need to read the texts in such a manner that you can ask and answer questions about them. In other words, you control the outcome of this course. It is your responsibility to do what is necessary to understand the texts, e.g. read the texts multiple times, write an abstract or outline of the texts, participate in study groups.

I urge you to use a critical approach when reading the texts. This means that for each text you should think about what the main point of the text is. What is the author’s message? What is it that he/she wants to convey? Do you agree with the author? Why/why not? Place the text in a context. Consider for example how the text/author--consciously or unconsciously--relates to other texts in this course and other courses. How does it relate to other discussions we have had in this course? Etc., etc.

Always remember to ask and answer the most important question when facing scientific and scholarly literature: SO WHAT? This is a simple way to evaluate whether the text makes a difference to you, and whether it brings the field a step further.

Ideally, you should read all assigned readings for each class meeting. I have provided two mechanisms to guide you and help you prioritize your reading:
1) I have listed all the readings in a suggested reading order.  This order reflects a progression through the issues that I think will be helpful.
2) I have classified each reading as basic, recommended, or optional. 
Basic (B) material should be mastered by every student, recommended (R) material should be mastered by students seeking a good knowledge of the subject, and optional (O) material should be mastered by students with special interest and aptitudes.  You should note that sometimes the full, best, or most comprehensive understanding of a basic reading may not come before reading an optional reading.

It is your choice which readings to read if you cannot read all of them. However, most of our discussions, exercises and assignments will assume that you've read and understood all the material through the recommended level. 


Class Schedule

Wednesday, Jan. 10
  Theme: Introduction to the course

Wednesday, Jan. 17
  Theme: Aboutness and subject analysis
  Readings:
    Frohmann. 1990. (R)
    Hjørland. 1992. (R)
    Wilson. 1968. Intro (B)
    Wilson. 1968. Chap 1 (O)
    Wilson. 1968. Chap 2 (O)
    Wilson. 1968. Chap 3 (O)
    Wilson. 1968. Chap 5 (R)
    Wilson. 1968. Chap 6 (R)

Wednesday, Jan. 24
  Theme: Subject indexing, subject access, subject analysis
  Readings:
    ISO. 1985. (B)
    Hjørland. 1997. (R)
    Bates. 1998. (O)
    Mai. 2001. (O)
    Fidel. 1994. (B)
    Mai. 2005. (O)
    Swift. 1979. (R)
    Brown et al. 1996. (O)
    Feinberg. 2006. (R)

Wednesday, Jan. 31
  Theme: Subject languages
  Assignment 1 is due
  Readings:
    Rosenfeld & Morville. 2002. (O)
    Rowley. 1994. (R)
    Svenonius. 2000. Chap 8 (R)
    Svenonius. 2000. Chap 9 (R)
    Svenonius. 2000. Chap 10 (R)

Wednesday, Feb. 7
  Theme: Overview of the project and the thesaurus construction process
    Phase 0: Preparing
    Phase 1: Defining and analyzing the domain
  Team Meeting
  Readings:
    Aitchison. 2000. Sec. A, B, C and J.
    ISO. 1986.
    ANSI/NISO. 2005.
    Soergel. 1974. Chap. C.
    Svenonius. 2003.
    Hjørland. 2004.
    Hjørland. 2002.
    Nielsen. 2001.
    Tennis. 2003.

Wednesday, Feb. 14
  Theme: Phase 2: Collecting, sorting, and merging terms
  Team Meeting
  Readings:
    Soergel. 1974. F0, F1, and F2.

Wednesday, Feb. 21
  No class.  Study week.

Wednesday, Feb. 28
  Theme: Phase 3: Selecting descriptors and establishing relationships
  Team Meeting
  Assignment 2 is due
  Readings:
    Aitchison. 2000. D, E, & F1.
    Soergel. 1974. E & F3.

Wednesday, Mar. 7
  Theme: Phase 4: Constructing the classified schedules
  Team Meeting
  Readings:
    Aitchison. 2000. F2, F3, & G.
    Soergel. 1974. F4 & D4.0-D4.3.6.

Wednesday, Mar. 14
  Theme: Phase 5: Preparing the final product
  Team Meeting
  Readings:
    Aitchison. 2000. Sec. H.
    Soergel. 1974. F5 & D0-D3.4.

Wednesday, Mar. 21
  Theme: Catch-up

Wednesday, Mar. 28
  Theme: Presentations

Wednesday, Apr. 4
  Theme: Presentations

Wednesday, Apr. 11
  Theme: Presentations
    Catch up, summary, conclusions, and evaluation of the course
  Assignment 4 is due

 


Assignments

There are four required assignments in this course, three essays and the design and presentation of a thesaurus:

Assignment 1: Context in Indexing
  Due date: Wednesday, Jan. 31
  % of grade: 20%

Assignment 2: Essays on Indexing
  Due date: Wednesday, Feb. 28
  % of grade: 30%

Assignment 3: Design, Construction, and Presentation of Thesaurus
  Design and Construction: Feb. 7 - Mar. 21
  Presentation: Mar. 28 / Apr. 4 / Apr. 11
  % of grade: 15%

Assignment 4: Final Paper
  Due date: Wednesday, April 11
  % of grade: 35%

For all written assignments:  They cannot be turned in as e-mails or as attachments to e-mails.  The paper should be double-spaced, single paged, in 12-point Times Roman, stapled in the upper left corner, and follow a standard citation practice (such as Chicago, APA, or MLA).  Please review the material you covered in Cite it Right, familiarize yourself with this site and UofT's policy, and consult the writing centre, if necessary.   Here is a general statement on what I look for in a paper.

Assignment 1: Context in Indexing

Many of the authors we have read so far in the course have discussed the notion of context in various ways.  For this assignment I want you to write an essay about the notion of context and how it influences decisions that indexers have to make in the indexing process.  What, if anything, in the context should indexers consider?  How?  Why/why not?  You must draw on the readings and you can draw on your experiences with the exercises and other experiences for this essay.

The essay should be less than 500 words long.

Due date:  This assignment is due in class on Wednesday Jan. 31.

Criteria:  You will be graded according to the clarity, organization, and depth of your discussion and analysis, the clarity of evaluative and analytic comments, the demonstrated understanding of the issues involved in the evaluation of indexing, and the extent to which class readings and other literature are incorporated in the discussion and analysis.  See here for a few other tips.

Assignment 2: Essays on Indexing

For this assignment I want you to reflect on what we have discussed and read so far. Your task is to write either one or two essays (in response to question 1 and/or 2) that start with two specific questions -- you can elaborate and expand the discussion from there.

Question 1:  Many scholars of indexing theory talk about indexing as a process that consists of multiple steps. The ISO standard describes the indexing process as consisting of three steps:
  a) examining the document and establishing its subject content;
  b) identifying the principal concepts present in the subject;
  c) expressing these concepts in the terms of the indexing language

Please explain and clarify the decisions and problems that are related to each of these three steps. You may exemplify this with the indexing of a particular document.

Question 2:  The indexing literature describes a number of different approaches to indexing and we have discussed some of these in this course. Please explain and clarify some of these approaches to indexing and discuss their differences and their significance for indexing practice.

The combined length of the essay(s) should be no less than 1000 words and no more than 1500 words long.

Due date:  This assignment is due in class on Wednesday Feb. 28.

Criteria:  You will be graded according to your essays' organization, depth, clarity of evaluative and analytic comments, and the demonstrated understanding of the issues involved in indexing, as well as on the extent to which class readings and other literature are incorporated in the discussion and analysis.  See here for a few other tips.

Assignment 3: Design, Construction, and Presentation of Thesaurus

Thesaurus construction. (Teamwork)
Participate in the design and construction of a thesaurus.

Presentation.  (Teamwork)
Each team will present their thesaurus to the class.  You should present the thesaurus' features and discuss problems you encountered and how you solved them.  Make sure to talk about some of the following issues:

  • The domain.  How is the domain defined, what are some neighboring fields, what are some of the major developments and/or discussions in the field, etc.

  • The users of the thesaurus.  Who have you constructed the thesaurus for?

  • Major problems that you encountered and how you solved the problems.

  • The sources that you used for harvesting the terms and your 'method' for including terms in your thesaurus.

  • Your use of an expert and/or user group.

  • The thesaurus itself.  Including how it works and examples of how the thesaurus can be used for indexing and searching. 

  • Any special features that you want to point out.

This is a team effort.  Everyone in the team should participate in the presentation at some level, if only in preparation.  It is OK, however, to delegate primary speaking responsibilities to part of the group.

Please make your thesaurus available to the class at least 36 hours before your presentation.  You can send your thesaurus to the class list as an attachment or forward a URL where the thesaurus is located.

Criteria:  You will be graded according to the degree to which you engage in the exercise, the depth of your considerations for decisions you make throughout the project, and the clarity of your presentation of those decisions. 

Assignment 4: Final paper

For your final paper, I would like you to reflect upon your thesaurus construction experience and write an essay in which you a) discuss your experience, what you have learned, and how you would potentially do things differently next time around; or b) discuss the use of contextual information in planning and designing a thesaurus, in other words, how information about the domain, the use, the users, the literature, etc. can be used when constructing a thesaurus; or c) explore and discuss the use of controlled vocabularies in a particular domain or website.  You must cite and include the relevant literature.

The paper should be short and focused - about 1500 words. 

Due date:  This assignment is due in class on Wednesday, April 11.

Criteria:  You will be graded according to the clarity, organization, and depth of your discussion and analysis, the clarity of evaluative and analytic comments, the demonstrated understanding of the issues involved in the design and construction of thesauri, and the extent to which class readings and other literature are incorporated in the discussion and analysis.  See here for a few other tips.


Advisory Group

Each team in the thesaurus construction project is part of an advisory group, and each advisory group consists of two to three teams. The advisory groups should meet as often as the groups find necessary.

The purpose of the advisory group is to support each other through the thesaurus construction process.  The agenda for the group meetings should be set by the groups themselves.  However, I suggest that the agenda looks something like this:
  1.  Status report from each team
  2.  Major problems in the teams
  3.  Problems to bring to the class discussions
and for the last meeting, it is a good idea to review each other's final products.

Advisory Groups:

Group A Group B
   
   
   

Resources

There are many controlled vocabularies, thesauri, and classification systems and information about these systems, their use, application, design, and architecture available on the web.  Here is a list of sites with information about indexing, controlled vocabularies, and links to other resources:

Asilomar Institute for Information Architecture Library
"The IA Library is a selection of resources related to the field of information architecture. The collection includes articles, books, blogs, and more."
 
Karl Fast, Fred Leise and Mike Steckel
"Information architects are fascinated with faceted classification and its application to information architecture problems. However, facets remain difficult to understand and there are few options for learning about them."
 
Findability.org
Comprehensive list of information about information architecture and links to various IA stuff.
 
Traugott Koch
“Controlled vocabularies, thesauri and classification systems available in the WWW.”

Michael Middleton
"Links to examples of thesauri and to classification schemes that may be used for controlling database or WWW site subject content."
 
National Library of Canada
"The resources in this section are directed at persons interested in improving the organization and retrieval of information."
 
Alex Sigel
"Knowledge Organization Schema to which metadata refer to: Inofficial registry."
 
Dagobert Soergel [broken link]
Comprehensive list of various KO stuff.

Leonard Will, Willpower
"Software for building and editing thesauri."
 

Lifeboat for Knowledge Organization
Birger Hjørland's comprehensive dictionary of KO terms and ideas.

 


Readings

Aitchison, Jean, Alan Gilchrist & David Bawden. 2000. Thesaurus Construction and Use: a Practical Manual, 4th ed. Chicago; London: Fitzroy Dearborn. (025.49 A311T4 -- 2-hour loan) 

ANSI/NISO. 2005. Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. Z39.19-2005. American National Standards Committee, National Information Standards Organization. Bethesda, Maryland: NISO Press. [Available here].

Bates, Marcia. 1998. Indexing and Access for Digital Libraries and the Internet: Human, Database, and Domain Factors. Journal of the American Society for Information Science. 49 (13): 1185-1205. (E-journals at UofT)

Brown, Pauline, Rob Hidderley, Hugh Griffin, and Sarah Rollason. 1996. The Democratic Indexing of Images. The New Review of Hypermedia and Multimedia, 2: 107-120.

Feinberg, Melanie. 2006. An Examination of Authority in Social Classification Systems. ASIST SIG/CR Classification Research Workshop, 2006.  [Available here]

Fidel, Raya. 1994. User-Centered Indexing. Journal of the American Society for Information Science. 45 (8): 572-576. (E-journals at UofT)

Frohmann, Bernd. 1990.  Rules of Indexing: A Critique of Mentalism in Information Retrieval Theory. Journal of Documentation. 46 (2): 81-101. (Xerox File)

Hjørland, Birger. 1992. The Concept of “Subject” in Information Science. Journal of Documentation. 48 (2): 172-200. (Xerox File)

Hjørland, Birger. 1997. Information Seeking and Subject Representation: An Activity-Theoretical Approach to Information Science. Westport, CT: Greenwood.  Chapter 2: Subject Searching and Subject Representation Data. p. 11-37. (025.47 H677I -- 2-hour loan)

Hjørland, Birger. 2002. Domain analysis in Information Science. Eleven Approaches - Traditional as well as Innovative. Journal of Documentation, 58 (4): 422-462. (E-journals at UofT)

Hjørland, Birger. 2004. Domain Analysis in Information Science. In Encyclopedia of Library and Information Science. New York, NY: Marcel Dekker. p. 129-135. (R 020.3 E56E2 Suppl. 1)

International Organization for Standardization. 1985. Documentation. Methods for Examining Documents, Determining their Subjects and Selecting Indexing Terms. International Organization for Standardization. ISO 5963-1985. (On order/FIS) ; (Copy can be found in 025.3414 B811S 2-hour loan, after leaf 181)

International Organization for Standardization. 1986. Documentation. Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. Geneva: International Organization for Standardization.  ISO 2788-1986. (025.49 I616D2 -- 2-hour loan)

Mai, Jens-Erik. 2001. Semiotics and Indexing: An Analysis of the Subject Indexing Process. Journal of Documentation. 57 (5): 591-622. (E-journals at UofT)

Mai, Jens-Erik. 2005. Analysis in Indexing: Document and Domain Centered Approaches. Information Processing and Management, 41 (3): 599-611. (E-journals at UofT)

Nielsen, Marianne Lykke. 2001. A Framework for Work Task Based Thesaurus Design. Journal of Documentation, 57 (6): 774-797. (E-journals at UofT)

Rosenfeld, Louis & Peter Morville. 2002. Information Architecture for the World Wide Web. 2nd ed. Sebastopol, CA: O’Reilly. (Especially chap. 9, “Thesauri, Controlled Vocabularies, and Metadata,” p. 176-208.)  (005.72 R813I2 -- 2-hour loan)

Rowley, Jennifer. 1994. The Controlled Versus Natural Indexing Languages Debate Revisited: a Perspective on Information Retrieval Practice and Research. Journal of Information Science. 20 (2): 108-119.

Soergel, Dagobert. 1974. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville.  (025.4 S681I -- 2-hour)

Svenonius, Elaine. 2000. The Intellectual Foundation of Information Organization. Cambridge MA: MIT Press. (025.3 S968I -- 2-hour loan)

Svenonius, Elaine. 2003. Design of Controlled Vocabularies. In Encyclopedia of Library and Information Science. New York, NY: Marcel Dekker. p. 822-838. (R 020.3 E56E2 v.2)

Swift, Donald F., Viola A. Winn, & Dawn A. Bramer. 1979. A Sociological Approach to the Design of Information Systems. Journal of the American Society for Information Science, 30, 215-223. (Xerox File)

Tennis, Joseph. 2003. Two Axes of Domains for Domain Analysis. Knowledge Organization, 30 (3/4): 191-195. (Xerox File)

Wilson, Patrick. 1968. Two Kinds of Power. An Essay on Bibliographical Control. Berkeley, CA: University of California Press.  Introduction, and chapters 1, 2, 3, 5, and 6. (010 W752 -- 2-hour loan)