Essentials Of Data Science With R Software - 2: Sampling Theory And Linear Regression Analysis

By Prof. Shalabh   |   IIT Kanpur
Learners enrolled: 1707
Any data analysis is incomplete without statistics. Once the data are collected, statistical tools aim to extract the information hidden inside them. Sampling theory and regression analysis are two important tools, among others, that play a fundamental role in extracting such information. The role of these classical topics of statistics needs to be understood in the context of data science: they have fundamental applicability there and should be understood from a computational point of view through software. This course covers the introductory tools of sampling theory and regression analysis. Its objective is to teach how to use them with the popular free statistical software R and how to interpret the outcomes.

INTENDED AUDIENCE : UG students of Science and Engineering. Students of humanities with a basic mathematical and statistical background can also take this course, as can working professionals in analytics.
PREREQUISITES : “Introduction to R Course” and “Essentials of Data Science With R Software – 1 - Probability and Statistical Inference” are preferred. Mathematics background up to class 12 is needed. Some basic background in statistics is desirable.
INDUSTRIES SUPPORT : All industries with an R&D setup can use this course.
Course Status : Completed
Course Type : Elective
Duration : 12 weeks
Category :
  • Mathematics
Credit Points : 3
Level : Undergraduate/Postgraduate
Start Date : 24 Jan 2022
End Date : 15 Apr 2022
Enrollment Ends : 07 Feb 2022
Exam Date : 24 Apr 2022 IST

Note: This exam date is subject to change based on seat availability. You can check the final exam date on your hall ticket.


Course layout

Week 1: Introduction to Data Science and Calculations with R Software
Week 2: Basic Fundamentals of Sampling
Week 3: Simple Random Sampling
Week 4: Simple Random Sampling with R
Week 5: Stratified Random Sampling
Week 6: Stratified Random Sampling with R
Week 7: Bootstrap Methodology with R
Week 8: Introduction to Linear Models and Regression, and Simple Linear Regression Analysis
Week 9: Simple Linear Regression Analysis with R
Week 10: Multiple Linear Regression Analysis
Week 11: Multiple Linear Regression Analysis with R
Week 12: Variable Selection using LASSO Regression

Books and references

1. Sampling Techniques by W.G. Cochran (Wiley). Low-price edition available.
2. Sampling Methodologies and Applications by P.S.R.S. Rao (Chapman and Hall/CRC).
3. An Introduction to the Bootstrap by Bradley Efron and R.J. Tibshirani (Chapman and Hall/CRC, 1994).
4. Introduction to Linear Regression Analysis by Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining (Wiley). Low-price Indian edition available.
5. Applied Regression Analysis by Norman R. Draper and Harry Smith (Wiley). Low-price Indian edition available.
6. Linear Models and Generalizations: Least Squares and Alternatives by C.R. Rao, H. Toutenburg, Shalabh and C. Heumann (Springer, 2008).
7. Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R by Christian Heumann, Michael Schomaker and Shalabh (Springer, 2016).
8. The R Software: Fundamentals of Programming and Statistical Analysis by Pierre Lafaye de Micheaux, Rémy Drouilhet and Benoit Liquet (Springer, 2013).
9. A Beginner's Guide to R (Use R!) by Alain F. Zuur, Elena N. Ieno and Erik H.W.G. Meesters (Springer, 2009).

Instructor bio

Prof. Shalabh

IIT Kanpur
Dr. Shalabh is a Professor of Statistics at IIT Kanpur. His research interests are linear models, regression analysis and econometrics. He has more than 23 years of experience in teaching and research. He has developed several web-based and MOOC courses in NPTEL, including one on regression analysis, and has conducted several workshops on statistics for teachers, researchers and practitioners. He has received several national and international awards and fellowships. He has authored more than 75 research papers in national and international journals. He has written four books, one of which, on linear models, is co-authored with Prof. C.R. Rao.

Course certificate

The course is free to enroll in and learn from. If you want a certificate, however, you have to register and write the proctored exam conducted by us in person at any of the designated exam centres.
The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
Date and Time of Exams: 24 April 2022; morning session 9 am to 12 noon, afternoon session 2 pm to 5 pm.
Registration url: Announcements will be made when the registration form is open for registrations.
The online registration form has to be filled in and the certification exam fee paid. More details will be made available when the exam registration form is published; any changes will be announced at that time.
Please check the form for more details on the cities where the exams will be held and the conditions you agree to when you fill in the form.


Average assignment score = 25% of the average of the best 8 of the 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100.

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate, even if the Final score >= 40/100.
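As an illustration of the scoring rule above, here is a minimal sketch in Python. It assumes that each assignment score and the exam score are on a 0 to 100 scale (the page states this only for the exam); the function name `final_score` is hypothetical and not part of any NPTEL tool.

```python
def final_score(assignments, exam):
    """Hypothetical helper: apply the certificate scoring rule.

    assignments: list of 12 assignment scores, each assumed to be out of 100.
    exam: proctored exam score out of 100.
    Returns (average assignment score /25, exam score /75, final score /100, eligible).
    """
    if len(assignments) != 12:
        raise ValueError("expected 12 assignment scores")
    # Keep only the best 8 of the 12 assignments.
    best8 = sorted(assignments, reverse=True)[:8]
    average = sum(best8) / 8
    avg_assignment = 0.25 * average   # contributes at most 25 marks
    exam_score = 0.75 * exam          # contributes at most 75 marks
    final = avg_assignment + exam_score
    # Both thresholds must hold; a final score >= 40 alone is not enough.
    eligible = avg_assignment >= 10 and exam_score >= 30
    return avg_assignment, exam_score, final, eligible


# Full marks in every assignment but 38/100 in the exam:
# final score 53.5, yet no certificate, because 28.5 < 30.
print(final_score([100] * 12, 38))
```

Because the two thresholds add up to 40, any candidate who meets both automatically has a final score of at least 40/100, whereas the converse does not hold.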

The certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Kanpur. It will be e-verifiable at nptel.ac.in/noc.

Only the e-certificate will be made available. Hard copies will not be dispatched.

Once again, thanks for your interest in our online courses and certification. Happy learning.

- NPTEL team
