
Department of Information Science and Telecommunications

 

INFSCI 2140 - Information Storage and Retrieval

(Spring 2001, CRN 43826)




INFSCI 2140 Course Materials

Lecture Objectives Concepts Readings Handouts
Lecture 1
Information System Design
Objectives: Introduction to the course. Logistics. Abstractions. Information Systems. Evolution and Evaluation.
Concepts: information retrieval, information system, information, data, information need, ectosystem, endosystem, user, funder, server, medium, device, algorithm, hypertext, effectiveness, efficiency, economy, abstraction, knowledge, wisdom, information theory, knowledge base, expert system, information retrieval system
Readings: Korfhage: Introduction, Chapter 1; R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Introduction
Handouts: Slides
Lecture 2
Search, Document, Query
Objectives: Documents and queries. Documents and document surrogates. What is in surrogates? What is in documents?
Concepts: SGML, HTML, RTF, GIF, JPEG, JBIG, MIDI, MPEG, EBCDIC, ASCII, MP3, SMILE, QuickTime, query, document, formatted document, unformatted document, document surrogate, matching, mapping, keyword, key phrase, extract, abstract, document identifier, review, uncontrolled vocabulary, vocabulary, controlled vocabulary, byte, atomic data, bit, ANSI, data compression, stemming, Huffman code, Ziv-Lempel code, data model, adaptive model, static model, semi-static model, synchronization point, level of compression, prefix property, markup language, segmentation, integrated media document, integrated media system, multimedia document, multimedia system, geographic information system, run length encoding, fine structure, metadata, document encoding, rich text, bit map
Readings: Korfhage: Chapter 2
Handouts: Slides
Lecture 3:
Models of Information Retrieval
Objectives: Queries and matching. Document space, measure, similarity. Classic Boolean model. Classic vector model.
Concepts: exact match, range match, approximate match, proximity, elementary query, Boolean query, conjunctive normal form, conjunct, conjunctive query, disjunctive normal form, disjunct, disjunctive query, proximity operator, normalization, truth table, full disjunctive normal form, term, DeMorgan's Laws, Law of Double Negation, vector, vector model, vector of terms, 0-1 vector, weight vector, contingency table, dimensional compatibility, judging dilemma, document space, characteristic function, measure, lexical similarity, Boolean query matching, Boolean query system, AND, OR, NOT, NOF, distance measure, dissimilarity measure, distance to similarity transformation, cosine measure, inner product, intrinsic measure, extrinsic measure
Readings: Korfhage: Chapter 3: Sections 3.1 - 3.3; Chapter 4: Sections 4.1 - 4.4
Handouts: Slides
Lecture 4:
Models II
Objectives: Queries and matching for advanced models. Extended Boolean model. Fuzzy model. Other models and aspects of matching.
Readings: Korfhage: Chapter 3: Sections 3.4 - 3.8; Chapter 4: Sections 4.5 - 4.12
Handouts: Slides
Lecture 5:
Retrieval Effectiveness
Objectives: Measures of relevance. Precision, recall, fallout, generality. Coverage ratio, novelty ratio, relative recall, recall effort. Average precision and recall. Expected search length. Normalized precision and recall. Sliding ratio. Satisfaction and frustration.
Readings: Korfhage, Chapter 8 (skip 8.5)
AND
(R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 3: pp. 73-96
OR
Mizzaro, S.: 1997. "Relevance: the whole history". Journal of the American Society for Information Science 48(9): 810-832.)
Handouts: Slides
Lecture 6:
Text Analysis
Objectives: From text to index. Types of indexing. Zipf's law. The problem of choosing significant terms. TF*IDF. Stop lists and stemming. Thesauri. Document similarity. Multi-language retrieval.
Readings: Korfhage: Chapter 5
Handouts: Slides
Lecture 7:
Data Structures and Algorithms
Objectives: Document processing, storage, search. Document files. Search problem. Simple search solution. Algorithms for searching and sorting. Complexity. Advanced searchable data structures: index files, inverted files, B-trees.
Readings: Korfhage, Appendix B; R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 8: Indexing and Searching
Handouts: Slides; B-Trees Slides
Lecture 8:
Output presentation and visualization
Objectives: What to present, ranking, clustering, output exploration, visual interfaces for output exploration: GUIDO, VIBE, BIRD, TileBars, LyberWorld.
Readings: Korfhage: Chapter 11, Chapter 7; R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 10: User Interfaces and Visualization
Handouts: Slides
Lecture 9:
Improving Effectiveness
Objectives: Relevance Feedback, Query Expansion, Genetic Algorithms.
Readings: Korfhage: Chapter 9; R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 5: Query Operations
Handouts: Lecture Slides; User Modeling in Information Retrieval Slides (Nick Belkin)
Lecture 10:
Adapting to the user
Objectives: User profiles, user models, adaptive information retrieval.
Readings: Korfhage: Chapter 6
Handouts: Lecture Slides; User Modeling in Information Retrieval Slides (Nick Belkin)
Lecture 11:
Alternative retrieval techniques
Objectives: Natural language processing, citation processing, hypertext browsing, dynamic queries and implicit queries.
Readings: Korfhage: Chapter 10, Sections 10.1-10.4
Handouts: Slides
Lecture 12:
Multimedia IR and digital libraries
Objectives: Image, sound, video IR. Modern Digital Libraries.
Readings: Korfhage: Chapter 10, Sections 10.5 and 10.6; R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapters 11 and 12
Lecture 13:
Web IR and other modern problems of information retrieval
Objectives: Characterising the Web. Search engines. Meta-search. Search services. Agents and bots. Clustering and information exploration. Use of hyperlinks. Web recommenders and other adaptive Web-based information systems.
Readings: R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 13
Handouts: Slides

Copyright © 2001 Peter Brusilovsky