
Department of Information Science and Telecommunications

 

INFSCI 2140 - Information Storage and Retrieval

(Fall 2005, CRN 16812)




INFSCI 2140 Course Materials

Attention! This is the Summer 2004 schedule of this course. Because the course is being updated, and because the Summer course had fewer lectures than the regular IS2140 course, the following should be considered a draft. The goal of this draft is to help you decide whether you need this course and to prepare for it. This page will be updated as the course progresses. Lecture slides will be provided at each lecture and posted on this page in .pdf format.
Lecture | Objectives | Concepts | Readings | Handouts
Lecture 1
Information System Design
Introduction to the course. Logistics. Information Systems: Designer's view. Evolution and Evaluation. Abstractions. Documents and queries. Documents and document surrogates. What is in surrogates? What is in documents?
information retrieval, information system, information, data, information need, ectosystem, endosystem, user, funder, server, medium, device, algorithm, hypertext, effectiveness, efficiency, economy, abstraction, knowledge, wisdom, information theory, knowledge base, expert system, information retrieval system, SGML, HTML, RTF, GIF, JPEG, JBIG, MIDI, MPEG, EBCDIC, ASCII, MP3, SMILE, QuickTime, query, document, formatted document, unformatted document, document surrogate, matching, mapping, keyword, key phrase, extract, abstract, document identifier, review, uncontrolled vocabulary, vocabulary, controlled vocabulary, byte, atomic data, bit, ANSI, data compression, stemming, Huffman code, Ziv-Lempel code, data model, adaptive model, static model, semi-static model, synchronization point, level of compression, prefix property, markup language, segmentation, integrated media document, integrated media system, multimedia document, multimedia system, geographic information system, run length encoding, fine structure, metadata, document encoding, rich text, bit map
Korfhage:
  • Introduction
  • Chapter 1
  • Chapter 2 (2.1-2.5, 2.7-2.8)
  • Chapter 13
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval",
Slides
Lecture 2:
Models of Information Retrieval: Boolean Model
Search and Browsing. Queries and matching. Classic Boolean model.
exact match, range match, approximate match, proximity, elementary query, Boolean query, conjunctive normal form, conjunct, conjunctive query, disjunctive normal form, disjunct, disjunctive query, proximity operator, normalization, truth table, full disjunctive normal form, term, DeMorgan's Laws, Law of Double Negation, characteristic function, measure, Boolean query matching, Boolean query system, AND, OR, NOT, NOF
Korfhage:
  • Chapter 3 (3.1, 3.2)
  • Chapter 4 (4.1, 4.2)
Slides
Lecture 3:
Models of IR - II
Document space, measure, similarity. Classic vector model. Queries and matching for advanced models. Problems with classic Boolean model. Extended Boolean model. Fuzzy model. Other models and aspects of matching.
proximity, vector, vector model, vector of terms, 0-1 vector, weight vector, contingency table, dimensional compatibility, judging dilemma, document space, lexical similarity, distance measure, dissimilarity measure, distance to similarity transformation, cosine measure, inner product, intrinsic measure, extrinsic measure
Korfhage:
  • Chapter 3 (3.3 - 3.8)
  • Chapter 4 (4.3 - 4.12)
Slides
Lecture 4:
Text Analysis
From text to index. Types of indexing. Zipf's law. The problem of choosing significant terms. TF*IDF. Stop lists and stemming. Thesauri. Document similarity. Multi-language retrieval.

Korfhage: Chapter 5

C. J. van Rijsbergen "Information Retrieval", Chapter 2

Slides
Lecture 5:
Retrieval Effectiveness
Measures for relevance. Precision, recall, fallout, generality. Coverage ratio, novelty ratio, relative recall, recall effort. Average precision and recall. Expected search length. Normalized precision and recall. Sliding ratio. Satisfaction and frustration.  

Korfhage, Chapter 8 (skip 8.5)
AND
(C. J. van Rijsbergen "Information Retrieval", Chapter 7
OR
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 3: pp. 73-96)

Slides
Lecture 6:
Alternative retrieval techniques
Citation processing, hypertext browsing, Information Visualization and its use for information access. Adaptive information visualization and adaptive navigation support.
Korfhage: Chapter 10
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 10 (10.1-10.4)
Slides
Lecture 7:
Output presentation and visualization
What to present, ranking, clustering, output exploration, visual interfaces for output exploration: GUIDO, VIBE, BIRD, TileBars, LyberWorld.
Korfhage: Chapter 11, Chapter 7
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 10: User Interfaces and Visualization (10.5-10.9)
Slides
Lecture 8:
Improving Search Effectiveness
Interactive search. Graphical Search Interfaces. Relevance Feedback and Query Expansion.
Korfhage: Chapter 6, Chapter 9
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 5: Query Operations
Slides
Lecture 9:
Taking the user into account
Information Filtering. User profiles, user models, adaptive information retrieval, adaptive presentation. Recommender systems.
Korfhage: Chapter 6, Chapter 9
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 5: Query Operations
Slides
Lecture 10:
Data Structures and Algorithms
Document processing, storage, search. Document files. Search problem. Simple search solution. Algorithms for searching and sorting. Complexity. Advanced searchable data structures: index files, inverted files, B-trees.  

C. J. van Rijsbergen "Information Retrieval", Chapter 4

Korfhage, Appendix B

R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 8: Indexing and Searching

B-tree tutorial

Slides
Lecture 11:
Web IR and other modern problems of information retrieval
Characterising the Web. Search engines. Meta-search. Search services. Agents and bots. Clustering and information exploration. Use of hyperlinks. Web recommenders and other adaptive Web-based information systems.
R. Baeza-Yates, B. Ribeiro-Neto: 1999. "Modern Information Retrieval", Chapter 13
Slides

Copyright © 2005 Peter Brusilovsky