Information Retrieval (IR) Syllabus


This page contains the syllabus of Information Retrieval for CSIT.

Title: Information Retrieval
Short Name: IR
Course code: CSC405
Nature of course: Theory + Lab
Semester: Seventh
Full marks: 60 + 20 + 20
Pass marks: 24 + 8 + 8
Credit Hrs: 3
Elective/Compulsory: Elective

Course Description

Course Synopsis: Advanced aspects of information retrieval and search engines


Goal:

To study advanced aspects of information retrieval and the working principles of search engines, encompassing the principles, research results, and commercial applications of current technologies.


Units and Unit Content

1. Introduction
teaching hours: 2 hrs

Introduction, History of Information Retrieval, The retrieval process, Block diagram and architecture of IR System, Web search and IR, Areas and role of AI for IR


2. Basic IR Models:
teaching hours: 4 hrs

Introduction, Taxonomy of information retrieval models, Document retrieval and ranking, A formal characterization of IR models, Boolean retrieval model, Vector-space retrieval model, Probabilistic model, Text-similarity metrics: TF-IDF (term frequency/inverse document frequency) weighting and cosine similarity.
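
To illustrate the TF-IDF weighting and cosine similarity listed in this unit, here is a minimal Python sketch. It assumes documents are already tokenized and uses the plain weighting tf * log(N / df); other variants (smoothed or log-scaled) are equally common. The function names `tf_idf_vectors` and `cosine` are illustrative, not taken from any library or from the course material.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy collection (assumed data, for illustration only).
docs = [
    "information retrieval and search".split(),
    "search engine information".split(),
    "music store recommender".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # documents sharing terms: similarity above zero
print(cosine(vecs[0], vecs[2]))  # no shared terms: similarity is zero
```

Storing each vector as a dictionary keeps only the non-zero weights, which is the usual way to handle the sparse vectors that arise in vector-space retrieval.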


3. Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
teaching hours: 4 hrs

Simple tokenizing, Word tokenization, Text Normalization, Stop-word removal, Word Stemming (Porter Algorithm), Case folding, Lemmatization, Inverted indices (Indexing architecture), Efficient processing with sparse vectors, Sentence segmentation and Decision Trees
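
As an illustration of the inverted index named in this unit, the following Python sketch builds one from a small collection and answers a conjunctive (AND) Boolean query. It assumes whitespace tokenization with case folding only; stop-word removal and stemming are omitted. The names `build_inverted_index` and `boolean_and` are hypothetical helpers chosen for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each case-folded token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():  # simple tokenizing + case folding
            index[token].add(doc_id)
    return index

def boolean_and(index, terms):
    """Return the sorted ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Toy collection (assumed data, for illustration only).
docs = [
    "Web search and IR",
    "Boolean retrieval model",
    "Web retrieval and ranking",
]
index = build_inverted_index(docs)
print(boolean_and(index, ["web", "retrieval"]))  # → [2]
```

Because the query intersects posting sets, its cost depends on the length of those postings, not on the size of the whole collection.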


4. Experimental Evaluation of IR
teaching hours: 4 hrs

Relevance and Retrieval, Performance metrics, Basic Measures of text retrieval (Recall, Precision and F-measure)
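
The three basic measures listed above can be computed from a set of retrieved documents and a set of relevant documents. The sketch below assumes set-based (unranked) evaluation and the balanced F-measure (F1); the function name `precision_recall_f1` and the example document ids are illustrative.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and balanced F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Four documents retrieved, three relevant, two in common (assumed ids).
p, r, f = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p, r, f)  # precision 2/4, recall 2/3, F1 4/7
```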


5. Query Operations and Languages
teaching hours: 3 hrs

Relevance feedback and pseudo-relevance feedback, Query expansion/reformulation (with a thesaurus or WordNet, techniques such as spelling correction), Query languages (Single-Word Queries, Context Queries, Boolean Queries, Natural Language)


6. Text Representation:
teaching hours: 3 hrs

Word statistics (Zipf's law), Morphological analysis, Index term selection, Using thesauri, Metadata, Text representation using markup languages (SGML, HTML, XML)


7. Search Engine:
teaching hours: 6 hrs

Search engines (working principle), Spidering (Structure of a spider, Simple spidering algorithm, Multithreaded spidering, Bot), Directed spidering (Topic directed, Link directed), Crawlers (Basic crawler architecture), Link analysis (e.g. hubs and authorities, Page ranking, Google PageRank), Shopping agents
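
As a rough illustration of the link-analysis topic in this unit, the sketch below computes PageRank by power iteration on a tiny link graph. It assumes a damping factor of 0.85, a fixed number of iterations, and that every link target also appears as a key of the graph. The graph itself and the function name `pagerank` are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a graph given as {page: [outgoing links]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: C is linked to by A, B, and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → C
```

The scores always sum to 1, so they can be read as the probability that a random surfer is on each page.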


8. Text Categorization and Clustering
teaching hours: 6 hrs

Categorization algorithms (Rocchio, naive Bayes, decision trees, and nearest neighbor), Clustering algorithms (agglomerative clustering, k-means, expectation maximization (EM)), Applications to information filtering and organization


9. Recommender Systems:
teaching hours: 3 hrs

Personalization, Collaborative filtering recommendation, Content-based recommendation


10. Information Extraction and Integration:
teaching hours: 3 hrs

Information extraction and applications, Extracting data from text, Evaluating IE Accuracy, XML and Information Extraction, Semantic web (purpose, Relation to hypertext page), Collecting and integrating specialized information on the web.


11. Advanced IR Models with indexing and searching text
teaching hours: 4 hrs

Probabilistic models, Generalized Vector Space Model, Latent Semantic Indexing (LSI), Efficient string searching, Pattern matching


12. Multimedia IR
teaching hours: 3 hrs

Introduction, Multimedia data support in commercial DBMSs, Query languages, Trends and research issues


Lab and Practical works

Laboratory Works:

The laboratory work should cover all the topics mentioned in the course.

Samples

1. Program to demonstrate the Boolean Retrieval Model and Vector Space Model

2. Program to find the similarity between documents

3. Tokenize the words of large documents and count them by type and token.

4. Segment the documents into sentences

5. Implement Porter stemmer

6. Try to build a stemmer for the Nepali language

7. Build a spider that tracks only the links of Nepali documents

8. Group online news into different categories such as sports, entertainment, and politics

9. Build a recommender system for an online music store