Week 1: Introduction, Text Processing, Document Representation, Tokenization, Term Filtering, Term-Document Incidence Matrix, Boolean Retrieval, Inverted Index, Query Processing, Optimization, Skip Pointers
Week 2: Inverted Index, Storing the index, BSBI, SPIMI, Zipf's and Heaps' Law, Dictionary compression, Postings Compression
Week 3: Getting started with PyLucene - Indexing, Lucene Document and Field Options
Week 4: Indexing Wiki Movies, Non-English Text Analysis, Luke - Index viewer, Different options in Indexing, PyLucene Practice Programming
Week 5: Ranked Retrieval, Jaccard Similarity, Term Frequency, Scaling TF, TF-IDF weighting, Inner product, Euclidean Distance and their problem, Cosine Similarity, VSM Algorithm, SMART Notation
Week 6: VSM Problem Solving, Probabilistic Model - Introduction, Probability Ranking Principle, BIM for ranked retrieval, BM1, BM11, BM15, BM25, Dissecting BM25, BM25 vs VSM, BM25 for long queries, BM25F, BM25+, Why is BM25 still relevant?
Week 7: Language Model for Information Retrieval, Unigram Language Model, Estimating Document Language Model, Zero Frequency Problem and Introduction to Smoothing, Jelinek-Mercer and Dirichlet Smoothed Language Model, Comparing Smoothing with IDF and Summary of LM-based Retrieval
Week 8: Using KLD, JSD in Information Retrieval, PyLucene - Retrieval, PyLucene - Various Query Classes - TermQuery, PhraseQuery, TermRangeQuery, Numerical Range Query, PyLucene - Various Query Classes - PrefixQuery, BooleanQuery, WildcardQuery, FuzzyQuery, MatchAllDocsQuery
Week 9: Evaluation - Set-based evaluation metrics, Precision, Recall, F measure, Precision at K, R-Prec, Incorporating Ranking in Precision and Recall, AP, MAP, GMAP, MRR, Graded relevance, nDCG, Hypothesis Testing, Role of Evaluation Forums, Kappa measure
Week 10: Indexing and Retrieval of Benchmark Datasets, Indexing and Retrieval in TREC-like Benchmark Datasets, Evaluation using `trec_eval`, Hypothesis testing in IR, Relevance Feedback - Rocchio, RLM
Week 11: Web Search and Crawler, Shingling, PageRank - Random Surfer's Algorithm, HITS, SEO
Week 12: Learning to Rank, Latent Semantic Indexing, An Introduction to Embeddings - Word, Sentence and Document embeddings, Applications of BERT and LLMs in IR