Statistical Foundation for Big Data Analysis

By Prof. Arindam Banerjee   |   IIT Kharagpur
Learners enrolled: 732
ABOUT THE COURSE:

This course covers the statistical foundations of big data analysis. Big data is data whose volume and complexity are too great for classical statistical tools to be useful, yet it arises in important real-life situations where statistical analysis and prediction rules are needed. In classical terms, such data is often simply very high-dimensional data. The course introduces the framework of statistical analysis with an eye towards analysing such data. It covers tools of classical statistics, such as statistical inference, multivariate regression and the multivariate normal distribution, as well as topics fundamental to analysing big data, such as the bias-variance trade-off, issues of dimensionality, principal component analysis and network data analysis. The course ends with the important topic of social network analysis.

INTENDED AUDIENCE: Senior UG and PG students from Mathematics, AI, Data Science, CSE and ECE.

PREREQUISITES: Some exposure to linear algebra and probability.

INDUSTRY SUPPORT: 
  1. Data Science
  2. Machine Learning
  3. Artificial Intelligence
Summary
Course Status : Upcoming
Course Type : Elective
Language for course content : English
Duration : 12 weeks
Category :
  • Computer Science and Engineering
Credit Points : 3
Level : Undergraduate/Postgraduate
Start Date : 19 Jan 2026
End Date : 10 Apr 2026
Enrollment Ends : 26 Jan 2026
Exam Registration Ends : 13 Feb 2026
Exam Date : 19 Apr 2026 IST
NCrF Level   : 4.5 — 8.0

Note: This exam date is subject to change based on seat availability. You can check final exam date on your hall ticket.


Course layout

Week 1:  Introduction:
  • What is big data? Examples/Case Studies of big data and challenges of handling big data.
  • Big Data Analysis vs Big Data Processing.
  • Role of high dimensional statistics as foundation of Big Data Analysis.
  • General overview of statistical framework.
  • Classical statistical analysis is inference from data using linear algebra and calculus.
  • Big data analysis as inference from data using linear algebra, calculus and high performance computing

Week 2: Review of basic statistical inference 1:

Point Estimation:
  • Unbiased Estimation
  • Maximum Likelihood Estimation
  • Method of Moments Estimation
  • Related Asymptotics and Case Studies

Week 3: Review of basic statistical inference 2:
  • Concept of Interval Estimation with examples
  • Basics of Hypothesis Testing with examples

Week 4: Statistical Learning Theory 1:
  • Bias, Variance, Mean Squared Errors
  • Real life Examples / Case Studies.

Week 5: Statistical Learning Theory 2:
  • Bias-Variance Trade Off and Real life Examples / Case Studies.
  • Comparison between bias and variance of various competing estimation models with case studies
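As an illustration of the Week 4 and Week 5 topics, the sketch below compares two competing estimators of a variance by simulation. The choice of estimators (dividing by n versus n - 1), the standard normal data, the sample size and the function names are assumptions made for this example and are not taken from the course material. It checks numerically that mean squared error equals squared bias plus variance.

```python
import random


def variance_estimate(sample, ddof):
    """Sum of squared deviations divided by (n - ddof)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - ddof)


def simulate(ddof, n=10, trials=20000, seed=0):
    """Return (bias, variance, mse) of the estimator over repeated samples.

    Samples are drawn from N(0, 1), so the true variance is 1.0.
    """
    true_variance = 1.0
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(variance_estimate(sample, ddof))
    mean_estimate = sum(estimates) / trials
    bias = mean_estimate - true_variance
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / trials
    mse = sum((e - true_variance) ** 2 for e in estimates) / trials
    return bias, variance, mse


if __name__ == "__main__":
    for ddof, label in [(0, "divide by n"), (1, "divide by n - 1")]:
        bias, variance, mse = simulate(ddof)
        print(f"{label}: bias={bias:.4f} variance={variance:.4f} "
              f"mse={mse:.4f} bias^2+variance={bias ** 2 + variance:.4f}")
```

In this setting the divide-by-n estimator is biased but has the smaller variance, and its mean squared error comes out lower, which is one simple instance of the trade-off.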

Week 6: Basics of Multivariate data structure and classical Multivariable Linear Models:
  • Regression and Multivariate Regression
  • Gauss-Markov Theorem and its application

Week 7: Multivariate Probability distributions with a stress on Multivariate Normal Distribution:
  • Introduction to the distribution
  • CDF and PDF
  • Normality and explicit expressions for conditionals; linear and affine transformations of multivariate normal distributions
  • Some other common multivariate distributions

Week 8: Multivariate Analysis And Dimensionality:
  • Concept of Unsupervised Learning
  • Hierarchical and non-hierarchical clustering and their applications (real-life examples / case studies focusing on what to use and when)
  • Detailed discussion on K-means clustering
  • Curse of dimensionality
  • Asymptotic behavior of volume and its impact on K-means clustering
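To give a feel for the last two Week 8 topics, here is a small sketch of two standard high-dimensional effects. It is not taken from the course material, and the function names, the uniform data on the unit cube and the number of points are assumptions chosen for illustration. The first function computes the fraction of the cube [-1, 1]^d filled by the inscribed unit ball; the second measures how much the distances from one point to the others differ, which matters for distance-based methods such as K-means.

```python
import math
import random


def ball_fraction(d):
    """Fraction of the cube [-1, 1]^d covered by the inscribed unit ball."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball_volume / 2 ** d


def relative_contrast(d, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one query point
    to n_points points drawn uniformly from the unit cube [0, 1]^d."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    distances = [
        math.dist(query, [rng.random() for _ in range(d)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    for d in (2, 10, 100, 500):
        print(f"d={d}: ball fraction={ball_fraction(d):.3e}, "
              f"relative contrast={relative_contrast(d):.3f}")
```

As d grows, the ball fraction tends to zero and the relative contrast shrinks, so nearest and farthest points become hard to tell apart by distance alone.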

Week 9: Population Principal Component Analysis:
  • Basic concept and applications with case studies

Week 10: Sample Principal Component Analysis:
  • Basic concept and applications with case studies

Week 11: Network Data Analysis:
  • Network Data
  • Random Graphs
  • Laws of Large Numbers for Random Graphs
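As a minimal illustration of a law of large numbers for random graphs, the sketch below samples a graph in which each pair of vertices is joined independently with probability p and reports the observed edge density. The specific random graph model, the function name and the parameter values are assumptions for this example; the course page does not say which models are covered.

```python
import random
from itertools import combinations


def edge_density(n, p, seed=0):
    """Sample a graph on n vertices where each of the n(n-1)/2 pairs is an
    edge independently with probability p; return edges / possible pairs."""
    rng = random.Random(seed)
    possible_pairs = n * (n - 1) // 2
    edges = sum(1 for _ in combinations(range(n), 2) if rng.random() < p)
    return edges / possible_pairs


if __name__ == "__main__":
    p = 0.3
    for n in (10, 50, 200, 800):
        print(f"n={n}: edge density={edge_density(n, p):.4f} (p={p})")
```

The edge density is an average of independent indicator variables, so it concentrates around p as the number of vertices grows.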

Week 12: Application of random graphs to social network analysis:
  • Some case studies

Books and references

1. The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman
2. Applied Multivariate Statistical Analysis by Richard A. Johnson and Dean W. Wichern
3. Linear Algebra and Learning from Data by Gilbert Strang

Instructor bio

Prof. Arindam Banerjee

IIT Kharagpur
Prof. Arindam Banerjee is currently an Assistant Professor in the Department of Mathematics, Indian Institute of Technology Kharagpur, India. At IIT Kharagpur he has taught courses in Big Data Analysis, Statistics, Computer Networks, Engineering Mathematics, Abstract Algebra and Formal Languages. His research interests are (1) combinatorial and homological methods in commutative algebra and algebraic geometry and (2) applications of algebra, combinatorics and statistical machine learning in medical bioinformatics. Previously he was an Assistant Professor of Mathematics at Ramakrishna Mission Vivekananda Educational and Research Institute, India, from 2017 to 2021. Before that he was a Golomb Visiting Assistant Professor in the Department of Mathematics, Purdue University, USA, from 2015 to 2017. He received a Ph.D. from the University of Virginia in 2015 and earned his M.Math and B.Stat degrees at the Indian Statistical Institute between 2003 and 2008.

Course certificate

The course is free to enroll and learn from. But if you want a certificate, you have to register and write the proctored exam conducted by us in person at any of the designated exam centres.
The exam is optional for a fee of Rs 1000/- (Rupees one thousand only).
Date and Time of Exams: April 19, 2026. Morning session: 9am to 12 noon. Afternoon session: 2pm to 5pm.
Registration URL: Announcements will be made when the registration form is open for registrations.
The online registration form has to be filled in and the certification exam fee paid. More details will be made available when the exam registration form is published. Any changes will be mentioned then.
Please check the form for more details on the cities where the exams will be held and the conditions you agree to when you fill in the form.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of average of best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

Please note that assignments encompass all types (including quizzes, programming tasks, and essay submissions) available in the specific week.

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
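
The scoring rules above can be sketched as a short calculation. This is an illustrative reading of the stated criteria, not an official NPTEL tool: the function name is hypothetical, each assignment is assumed to be scored out of 100, and unsubmitted assignments are assumed to count as 0.

```python
def final_score(assignment_scores, exam_score):
    """Apply the certificate criteria stated above.

    assignment_scores: the 12 weekly scores, each assumed to be out of 100.
    exam_score: proctored exam score out of 100.
    Returns (assignment_component, exam_component, final, eligible).
    """
    # Average of the best 8 assignments, scaled to 25 marks.
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    assignment_component = 0.25 * sum(best_eight) / 8
    # Exam score scaled to 75 marks.
    exam_component = 0.75 * exam_score
    final = assignment_component + exam_component
    # Both thresholds must be met; the final score alone is not enough.
    eligible = assignment_component >= 10 and exam_component >= 30
    return assignment_component, exam_component, final, eligible


if __name__ == "__main__":
    # Final score is 52.5, but the assignment component (7.5) is below 10.
    print(final_score([30] * 12, 60))
    # Both thresholds are met exactly or better.
    print(final_score([80] * 8 + [0] * 4, 40))
```

The first call shows a learner whose final score exceeds 40 but who is still not eligible, because the assignment component falls short of 10/25.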

The certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Kharagpur. It will be e-verifiable at nptel.ac.in/noc.

Only the e-certificate will be made available. Hard copies will not be dispatched.

Once again, thanks for your interest in our online courses and certification. Happy learning.

- NPTEL team