IST 2140

Information Storage and Retrieval

 

COURSE DESCRIPTION:

Introduction to storage and retrieval of textual, pictorial, graphic, and voice data. The focus is on effectively interpreting imprecise queries and providing a high-quality response to them from a database of incompletely described "documents."

(Prerequisites: introduction to logic and statistical analysis, familiarity with a high-level programming language)


 

COURSE OBJECTIVES:

1. to understand the dimensions of the information retrieval “problem”;

2. to understand the functions of an information retrieval system;

3. to analyse the components of an information retrieval system;

4. to consider the factors which optimize the information retrieval process;

5. to examine current issues in information retrieval.


 

RECOMMENDED TEXTBOOK:

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley.


 

ASSESSMENT:

Midterm Exam: 30%

Short Papers: 20%

Course Project: 40%

Participation: 10%


 

SCHEDULE:

Wednesday, 3:00 - 5:50 p.m., Room 404


INSTRUCTOR:

Edie Rasmussen

Office: 646 LIS Building

Tel: (412) 624-9459

Fax: (412) 648-7001

Email: erasmus@mail.sis.pitt.edu

Office Hours: Mon. 2:00-4:00 p.m.

Tues. 9:30-11:30 a.m.


 

GSA:

Shveta Goel

Office: A-206 IS Building

Email: shg26@pitt.edu

Office Hours: Mon. 10:30 a.m.-12:30 p.m.


 

Course Policies

Attendance

Class attendance is required for success in this course, as material that is not included in the textbook will be covered in class. A part of the final grade (10% of the total) will be based on your attendance and participation. If you must miss a class, please notify the teaching fellow and make arrangements to obtain course notes and handouts. Makeup exams for the midterm and final will not be offered except under extreme circumstances.

Plagiarism

It is expected that the work you submit in this course will be your own. While collaboration is allowed for the course project, it should be approved in advance and the nature of each contribution should be specified in the project proposal and the final submission.

The following statement is taken from The Teaching Assistant Experience: A Handbook for Teaching Assistants and Teaching Fellows at the University of Pittsburgh (A.P. Haley and J.M. Nicoll, eds.):

Plagiarism means submitting work as your own that is someone else’s. For example, copying material from a book or other source without acknowledging that the works or ideas are someone else’s and not your own is plagiarism. If you copy an author’s words exactly, treat the passage as a direct quotation and supply the appropriate citation. If you use someone else’s ideas, even if you paraphrase the wording, appropriate credit should be given. You have committed plagiarism if you purchase a term paper or submit a paper as your own that you did not write.

Plagiarism is a violation of the University of Pittsburgh’s standards on academic honesty, and violations of this policy are taken seriously. From the Guidelines on Academic Integrity: Student and Faculty Obligations and Hearing Procedures (effective September, 1995):

A student has an obligation to exhibit honesty, and to respect the ethical standards of the profession in carrying out his or her academic assignments. Without limiting the application of this principle, a student may be found to have violated this obligation if he or she:

1. Presents as one’s own, for academic evaluation, the ideas, representations, or words of another person or persons without customary and proper acknowledgment of sources.

2. Submits the work of another person in a manner which represents the work to be one’s own. [Quotation ellipsed.]

Special Needs

Students with disabilities who require special accommodations or other classroom modifications should notify the instructor and the University's Office of Disability Resources & Services (DRS) no later than the second week of the term. Students may be asked to provide documentation of their disability to determine the appropriateness of the request. DRS is located in 216 William Pitt Union and can be contacted at 648-7890 (Voice), 624-3346 (Fax), and 383-7355 (TTY). Students who must miss an exam or class due to religious observances must notify the instructor ahead of time and make alternative arrangements.


Course Outline

Week Date Topic

1 August 28, 2001 Introduction to Course

Information Retrieval Systems and their Design

Lecture 1 -- Powerpoint slides

 

2 September 4, 2001 Documents and Queries

Representing Document Content

Lecture 2 -- Powerpoint slides

 

3 September 11, 2001 Information Retrieval Models I

Boolean Model

Vector Model

Lecture 3 -- Powerpoint slides

 

4 September 18, 2001 Information Retrieval Models II

Probabilistic Models

Cluster-based Retrieval

Language Models

Lecture 4 -- Powerpoint slides

 

5 September 25, 2001 Implementing IR Systems

Storage

Search Algorithms

Software

Lecture 5 -- Powerpoint slides

 

6 October 2, 2001 Measuring Effectiveness of IR Systems

Lecture 6 -- Powerpoint slides

 

7 October 9, 2001 Improving Effectiveness of IR Systems

Relevance Feedback

Query Expansion

Review Session

Lecture 7 -- Powerpoint slides

 

8 October 16, 2001 Mid-term Exam

 

9 October 23, 2001 Alternative Retrieval Techniques

Latent Semantic Indexing

Citation-based Retrieval

Hypertext Retrieval

Natural Language Processing

Machine Learning

Lecture 8 -- Powerpoint slides

 

10 October 30, 2001 Other IR Problems:

Cross-lingual Information Retrieval

Document Representation

Text Summarization

Question-Answering

Text Categorization

Data Mining

 

11 November 6, 2001 Information Retrieval and the WWW

 

12 November 13, 2001 Multimedia Information Retrieval

Images

Video

Sound

 

13 November 20, 2001 Users and Information Retrieval

User Modelling

User Interfaces

Information Visualization

Short Papers Due

 

November 27, 2001 Thanksgiving - No Class

 

14 December 4, 2001 Social Issues in IR

Course Review

 

15 December 11, 2001 Presentation of Course Projects

 


Assessment

Student work for this course involves several components.

1. A midterm exam on October 16 on the work covered in class up to October 9 (the fundamentals of information storage and retrieval systems).

 

2. A course project which will involve creating or installing an information storage and retrieval system, loading a set of documents (to be provided) and testing it against a set of queries (also provided). Candidate systems will be identified. The project can be done individually or in groups of 2 or 3. In the final class students will report the results from their system, analyse its strengths and weaknesses, and compare the results across systems.

3. Two short papers: one on a topic chosen from a list to be provided, drawn from the material covered in the second half of the term, and the other a user evaluation of an IR simulation.

4. Participation in the class (attendance and contribution to discussions).


 

Reserve List

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM.

Chowdhury, G.G. (1999). Introduction to Modern Information Retrieval. London: Library Association.

Frakes, W.B. and Baeza-Yates, R. (eds.) (1992). Information Retrieval: Data Structures & Algorithms. Englewood Cliffs, NJ: Prentice-Hall.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley.

Lancaster, F.W. and Warner, A. (1993). Information Retrieval Today. Arlington, VA: Information Resources Press.

Meadow, C.T., Boyce, B.R., and Kraft, D.H. (2001). Text Information Retrieval Systems. San Diego, CA: Academic Press.

Salton, G. (1989). Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer. Reading, MA: Addison-Wesley.

Witten, I.H., Moffat, A., and Bell, T.C. (1999). Managing Gigabytes: Compressing and Indexing Documents and Images. 2nd ed. San Francisco, CA: Morgan Kaufmann.

 


Weekly Reading List

Week 1 and 2:

Information Retrieval Systems and their Design

Documents and Queries

Representing Document Content

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 1, “Overview”, pp. 1-16.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 2, “Document and query forms”, pp. 17-49.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 5, “Text analysis”, pp. 105-143.

Lancaster, F.W. and Warner, A. (1993). Information Retrieval Today. Arlington, VA: Information Resources Press. Ch. 1, “Some basics of information retrieval”, pp. 1-20.

 

Week 3:

Information Retrieval Models I

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 3, “Query structures”, pp. 51-78.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 4, “The matching process”, pp. 79-104.

 

Week 4:

Information Retrieval Models II

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM. Ch. 2, “Modeling”, pp. 19-71.

Lavrenko, V. and Croft, B. (2001). Relevance-based language models. SIGIR01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM. pp. 120-127. Available at http://ciir.cs.umass.edu/~lavrenko/pub/RelevanceModels.pdf

Rasmussen, E. (1992). “Clustering Algorithms”. In Information Retrieval: Data Structures & Algorithms (W.B. Frakes and R. Baeza-Yates, eds.). Englewood Cliffs, NJ: Prentice-Hall. Pp. 419-442.

Salton, G. (1989). Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer. Reading, MA: Addison-Wesley. Ch. 10, “Advanced information-retrieval models”, pp. 313-373.

 

Week 5:

Implementing IR Systems

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM. Ch. 8, “Indexing and searching”, pp. 191-228.

Croft, W.B. (2001). An Overview of InQuery as used for the TIPSTER Project. Available at: http://ciir.cs.umass.edu/demonstrations/InQueryRetrievalEngine.html.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Appendix B, “File Structures”, pp. 305-312.

Witten, I.H. et al. (2000). Greenstone: a comprehensive open-source digital library software system. In: Proceedings of the Fifth ACM Conference on Digital Libraries, San Antonio, TX, June 2-7, 2000. (New York, NY: ACM). Pp. 113-121. (Software download at http://www.nzdl.org/)

Witten, I.H., Moffat, A., and Bell, T.C. (1999). Chapter 3, “Indexing”; Chapter 4, “Querying”, in Managing Gigabytes, 2nd ed. Morgan Kaufmann.

 

Week 6:

Measuring Effectiveness of IR Systems

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM. Ch. 3, “Retrieval Evaluation”, pp. 73-97.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 8, “Retrieval effectiveness measures”, pp. 191-218.

Mizzaro, S. (1997). Relevance: the whole history. Journal of the American Society for Information Science 48(9): 810-832.

Tague-Sutcliffe, J. (1992). The pragmatics of information retrieval experimentation, revisited. Information Processing & Management 28(4): 467-490.

Voorhees, E. & Harman, D. (2001). Overview of the Tenth Text REtrieval Conference (TREC-10) In: NIST Special Publication 500-250: The Tenth Text REtrieval Conference (TREC 10) (National Institute of Standards and Technology). http://trec.nist.gov/pubs/trec10/papers/overview_10.pdf

 

Week 7:

Improving Effectiveness of IR Systems

Efthimiadis, E. (1996). Query expansion. Annual Review of Information Science and Technology 31: 121-187.

Harman, D. (1992). “Relevance feedback and other query modification techniques”. In: Frakes, W.B. and Baeza-Yates, R. (eds.), Information Retrieval: Data Structures & Algorithms. Englewood Cliffs, NJ: Prentice-Hall. Pp. 241-263.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 9, “Effectiveness improvement techniques”, pp. 219-236.

Salton, G. & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science 41: 288-297.

 

Week 9:

Alternative Retrieval Techniques

Chen, H. (1995). Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms. Journal of the American Society for Information Science 46(3): 194-216.

Dunlop, M.D. and Van Rijsbergen, C.J. (1993). Hypermedia and free text retrieval. Information Processing & Management 29(3): 287-298.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 10, “Alternative retrieval techniques”, pp. 235-256.

Strzalkowski, T. (1995). Natural language information retrieval. Information Processing & Management 31: 397-417.

 

Week 10:

Other IR Problems:

Hasnah, A. and Evans, M. (2001). Arabic/English cross language information retrieval using a bilingual dictionary. ACL/EACL Workshop 2001: Arabic Language Processing: Status and Prospects. Available at: http://www.elsnet.org/arabic2001/hasnah.pdf

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 11, “Output presentation”, pp. 257-270.

Lam, W., Ruiz, M. & Srinivasan, P. (1999). Automatic text categorization and its application to text retrieval. IEEE Transactions on Knowledge and Data Engineering 11(6): 865-879.

Mani, I. et al. (1998). The TIPSTER SUMMAC Text Summarization Evaluation. Final Report. October, 1998. McLean, VA: MITRE. (Mitre Technical Report MTR 98W0000138). Available at: http://www-nlpir.nist.gov/related_projects/tipster_summac/final_rpt.html

Voorhees, E. (2001). The TREC-10 Question Answering Track. In: NIST Special Publication 500-250: The Tenth Text REtrieval Conference (TREC 10). Available from: http://trec.nist.gov/pubs/trec10/papers/qa10.pdf

Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval 1: 69-90.

 

Week 11:

Information Retrieval and the WWW

Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A., & Raghavan, S. (2000). Searching the Web. Stanford University Technical Report 2000-37. [Online] Available at http://dbpubs.stanford.edu/pub/2000-37

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM. Ch. 13, “Searching the Web”, pp. 367-395.

Schwartz, C. (1998). Web search engines. Journal of the American Society for Information Science 49(11): 973-982.

 

Week 12:

Multimedia Information Retrieval

Del Bimbo, A. (1999). Visual Information Retrieval. San Francisco: Morgan Kaufmann. Ch. 1, “Introduction”, pp. 1-28 only.

Gupta, A. & Jain, R. (1997). Visual information retrieval. Communications of the ACM 40(5): 70-79.

McNab, R.J. et al. (1996). Towards the digital music library: tune retrieval from acoustic input. In: Proceedings of the 1st ACM International Conference on Digital Libraries, Bethesda, MD, March 20-23, 1996. (New York, NY: ACM). Pp. 11-18.

Wold, E., Blum, T., Keislar, D. and Wheaton, J. (1999). Classification, search, and retrieval of audio. CRC Handbook of Multimedia Computing 1999. Available at: http://www.musclefish.com/crc/crcwin.html

Yeo, B. & Yeung, M. (1997). Retrieving and visualizing video. Communications of the ACM 40 (12): 43-52.

 

Week 13:

Users and Information Retrieval

Hearst, M.A. (1999). Chapter 10, “User Interfaces and Visualization”. In Modern Information Retrieval (Baeza-Yates, R. & Ribeiro-Neto, B., eds.). New York: ACM. pp. 257-323.

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 7, “Multiple reference point systems”, pp. 163-189.

Olsen, K.A. et al. (1993). Visualization of a document collection: the VIBE system. Information Processing & Management 29(1): 69-81.

Shneiderman, B. (1998). Designing the User Interface. 3rd ed. Reading, MA: Addison-Wesley. Ch. 15, “Information search and visualization”, pp. 509-549.

 

Week 14:

Social Issues in IR

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: John Wiley. Ch. 13, “The Ectosystem and policy issues”, pp. 281-289.

 
