Algorithms in Computational Biology and Sequence Analysis

By Prof. Chirag Jain   |   IISc Bangalore
Learners enrolled: 1908   |  Exam registration: 83
ABOUT THE COURSE:
This course is intended to provide a broad overview of fundamental algorithms and data structures for analysing large biological datasets. Several major questions in modern biology, such as (i) how to find mutations in a genome sequence, or (ii) how to trace evolutionary relationships among species, can only be answered using efficient algorithms. This course is particularly relevant for computer science or applied mathematics students who wish to pursue a career in designing algorithmic solutions for scientific applications. The course includes hands-on programming exercises that convey the complexity of real-world data such as the SARS-CoV-2 genome database.

INTENDED AUDIENCE: Students interested in developing algorithms and fast software for emerging biology and genomics applications

PREREQUISITES: Elementary knowledge of discrete mathematics, basic algorithms and data structures is required. Programming proficiency in C, C++, Java or Python is required. Knowledge of basic algorithms for sorting, searching, hashing and graph traversal will be required. The following NPTEL courses cover this background:
Programming, Data Structures and Algorithms using Python : https://nptel.ac.in/courses/106106145
Design and Analysis of Algorithms : https://nptel.ac.in/courses/106106131
Programming and Data structures (PDS) : https://nptel.ac.in/courses/106106130
Programming, Data Structures and Algorithms : https://nptel.ac.in/courses/106106127
Data Structure and algorithms using Java : https://nptel.ac.in/courses/106105225

INDUSTRY SUPPORT: Companies developing software for molecular biology and omics applications (e.g., Google Health, Strand Life Sciences)
Summary
Course Status : Ongoing
Course Type : Elective
Language for course content : English
Duration : 12 weeks
Category :
  • Computer Science and Engineering
  • Multidisciplinary
  • Computational Biology
  • Data Science
  • Artificial Intelligence
Credit Points : 3
Level : Undergraduate/Postgraduate
Start Date : 20 Jan 2025
End Date : 11 Apr 2025
Enrollment Ends : 03 Feb 2025
Exam Registration Ends : 28 Feb 2025
Exam Date : 27 Apr 2025 IST

Note: This exam date is subject to change based on seat availability. You can check final exam date on your hall ticket.


Course layout

Week 1: Introduction
  • Brief review of the fundamentals of molecular biology and genetics. 
  • Examples of widely used software, algorithms and databases.
Week 2: Strings and exact matching
  • Z-algorithm, suffix arrays, suffix array construction
Week 3: Strings and exact matching
  • Suffix trees, suffix tree construction, applications of suffix trees
Week 4: Strings and exact matching, Pairwise Sequence Alignment
  • Burrows-Wheeler transform (BWT), BWT index, generalised rank operations, succinct suffix arrays
  • Classic dynamic programming ideas for pairwise sequence alignment. 
  • Edit distance, global alignment
Week 5: Pairwise Sequence Alignment
  • Local alignment, incorporating gaps in alignments
  • Statistical measures of alignment significance.
Week 6: Heuristic-based Sequence Alignment
  • Mathematical ideas underlying heuristic sequence aligners.
  • Maximal unique matches, co-linear chaining
Week 7: Heuristic-based Sequence Alignment, Genome reconstruction using graph algorithms
  • Incorporating gaps into the chaining algorithm
  • Applications of sequence alignment for mutation finding and disease diagnosis.
  • Shortest common superstring formulation for genome reconstruction
Week 8: Genome reconstruction using graph algorithms
  • Greedy approach to genome reconstruction
  • de Bruijn Graphs, Overlap graphs
Week 9: Evolutionary tree construction 
  • Multiple sequence alignment – formulations, optimal and approximation algorithms
  • Classical and contemporary algorithms for inferring evolutionary trees.
Week 10: Probabilistic/machine learning-based sequence models
  • Gene finding, hidden Markov models
  • Large language models for biological sequences
Week 11: Pangenome graphs
  • Overview of pangenome representations
  • Aligning sequences to a pangenome
Week 12: Research papers
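As a small taste of the Week 4 material on edit distance, here is a minimal Python sketch of the textbook dynamic program. It assumes unit costs for substitutions, insertions and deletions (plain Levenshtein distance); the function name `edit_distance` is illustrative, and the lectures may use a different formulation or scoring scheme. Only two rows of the table are kept in memory, so space is O(|b|) while time stays O(|a|·|b|).

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance via the classic O(|a|*|b|) DP."""
    m, n = len(a), len(b)
    # Before processing row i, prev[j] is the distance between a[:i-1] and b[:j].
    # Row 0: turning the empty prefix of a into b[:j] needs j insertions.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        # Column 0: turning a[:i] into the empty string needs i deletions.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
    print(edit_distance("ACGT", "AGT"))        # 1 (delete C)
```

Keeping only two rows is enough to report the distance; recovering the alignment itself would require the full table (or a divide-and-conquer scheme) for traceback.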

Books and references

There is no textbook required for the course.

For further reading after lectures, the following books can be used: 

  • Genome-Scale Algorithm Design (Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, Alexandru I. Tomescu)
  • Algorithms on Strings, Trees, and Sequences (Dan Gusfield)
  • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison)
  • Handbook of computational molecular biology (Srinivas Aluru)

Instructor bio

Prof. Chirag Jain

IISc Bangalore
Prof. Chirag Jain is an Assistant Professor and India Alliance Intermediate Fellow in the Department of Computational and Data Sciences at the Indian Institute of Science Bangalore. His research group (https://at-cg.github.io) develops scalable algorithms and software for genomics applications. Before joining IISc, he was a postdoctoral fellow at the National Institutes of Health, USA. He completed his PhD at Georgia Tech in 2019, and his dissertation received the College of Computing Dissertation Award.

Course certificate

The course is free to enroll and learn from. But if you want a certificate, you have to register and write the proctored exam conducted by us in person at any of the designated exam centres.
The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
Date and Time of Exams: April 27, 2025. Morning session: 9 am to 12 noon; Afternoon session: 2 pm to 5 pm.
Registration url: Announcements will be made when the registration form is open for registrations.
The online registration form has to be filled and the certification exam fee needs to be paid. More details will be made available when the exam registration form is published. If there are any changes, it will be mentioned then.
Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill the form etc.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 8 out of the 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

Please note that assignments encompass all types (including quizzes, programming tasks, and essay submissions) available in the specific week.

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate, even if the Final score >= 40/100.
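The scoring rule above can be checked with a short calculation. The Python sketch below is a hypothetical helper, not an NPTEL tool: the name `certificate_result`, and the assumptions that each assignment is scored out of 100 and that a missed assignment counts as 0, are ours. It applies the 25%/75% weighting, the best-8-of-12 rule and both eligibility cut-offs.

```python
from typing import NamedTuple, Sequence


class CertificateResult(NamedTuple):
    assignment_component: float  # out of 25
    exam_component: float        # out of 75
    final_score: float           # out of 100
    eligible: bool


def certificate_result(assignment_scores: Sequence[float],
                       exam_score: float) -> CertificateResult:
    """Hypothetical helper. assignment_scores: 12 scores, each assumed to be
    out of 100 (0 for a missed assignment); exam_score: exam marks out of 100."""
    if len(assignment_scores) != 12:
        raise ValueError("expected 12 assignment scores (use 0 for missed ones)")
    # Best 8 of the 12 assignments, averaged, then weighted to 25%.
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    assignment_component = 0.25 * sum(best_eight) / 8
    # Proctored exam score out of 100, weighted to 75%.
    exam_component = 0.75 * exam_score
    final = assignment_component + exam_component
    # Both cut-offs must hold, independently of the final score.
    eligible = assignment_component >= 10 and exam_component >= 30
    return CertificateResult(assignment_component, exam_component, final, eligible)


if __name__ == "__main__":
    print(certificate_result([80] * 8 + [0] * 4, 50))
    print(certificate_result([100] * 12, 36))
```

For instance, a perfect assignment record with an exam score of 36/100 gives a final score of 25 + 27 = 52, yet no certificate, because 27 is below the 30/75 exam cut-off.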

Certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IISc Bangalore. It will be e-verifiable at nptel.ac.in/noc.

Only the e-certificate will be made available. Hard copies will not be dispatched.

Once again, thanks for your interest in our online courses and certification. Happy learning.

- NPTEL team

