2. Introduce the mathematical foundations required for data science
3. Introduce first-level data science algorithms
4. Introduce a data analytics problem solving framework
5. Introduce a practical capstone case study
1. Describe a flow process for data science problems (Remembering)
2. Classify data science problems into a standard typology (Comprehension)
3. Develop R code for data science solutions (Application)
4. Correlate results to the solution approach followed (Analysis)
5. Assess the solution approach (Evaluation)
6. Construct use cases to validate approach and identify modifications required (Creating)
INTENDED AUDIENCE: 4th YEAR ENGG. UNDERGRADUATES
PREREQUISITES: 10 HRS OF PRE-COURSE MATERIAL ON R WILL BE PROVIDED. PARTICIPANTS NEED TO PRACTICE THIS.
INDUSTRY SUPPORT: HONEYWELL, ABB, FORD, GYAN DATA PVT. LTD.
ABOUT THE INSTRUCTOR:
Prior to joining IIT Madras as a professor, Prof. Rengaswamy was a professor of Chemical Engineering and Co-Director of the Process Control and Optimization Consortium at Texas Tech University, Lubbock, USA. He was also a professor and associate professor at Clarkson University, USA, and an assistant professor at IIT Bombay. His major research interests are fault detection and diagnosis and the development of data science algorithms for manufacturing industries.
Prof. Shankar Narasimhan is currently a professor in the Department of Chemical Engineering at IIT Madras. His major research interests are data mining, process design and optimization, fault detection and diagnosis, and fault-tolerant control. He has co-authored several important papers and a book titled Data Reconciliation and Gross Error Detection: An Intelligent Use of Process Data, which has received critical appreciation in India and abroad.
TEACHING ASSISTANTS: Dr. Tanneru Hemanth Kumar, Anita Mary
COURSE LAYOUT:
Week 1: Course philosophy and introduction to R
Week 2: Linear algebra for data science 1. Algebraic view (vectors, matrices, product of matrix and vector, rank, null space, solution of over-determined set of equations and pseudo-inverse) 2. Geometric view (vectors, distance, projections, eigenvalue decomposition)
Week 3: Statistics (descriptive statistics, notion of probability, distributions, mean, variance, covariance, covariance matrix, understanding univariate and multivariate normal distributions, introduction to hypothesis testing, confidence interval for estimates)
Week 4: Optimization
Week 5: 1. Optimization 2. Typology of data science problems and a solution framework
Week 6: 1. Simple linear regression and verifying assumptions used in linear regression 2. Multivariate linear regression, model assessment, assessing importance of different variables, subset selection
Week 7: Classification using logistic regression
Week 8: Classification using k-nearest neighbors (kNN) and clustering using k-means
SUGGESTED READING MATERIALS:
INTRODUCTION TO LINEAR ALGEBRA - BY GILBERT STRANG
APPLIED STATISTICS AND PROBABILITY FOR ENGINEERS – BY DOUGLAS MONTGOMERY
CERTIFICATION EXAM:
The exam is optional and requires payment of a fee.
Date of Exam: March 31st 2019 (Sunday).
Time of Exam: Morning session: 9am to 12 noon; Afternoon session: 2pm to 5pm
Registration URL: To be announced when the registration form opens.
The online registration form has to be filled and the certification exam fee needs to be paid. More details will be made available when the exam registration form is published.
Final score will be calculated as: 25% assignment score + 75% final exam score
The 25% assignment score is calculated as 25% of the average of the best 6 out of 8 assignments
An e-certificate will be given to those who register, write the exam, and obtain a final score of at least 40%. The certificate will have your name, photograph, and the score in the final exam with the breakup. It will carry the logos of NPTEL and IIT Madras and will be e-verifiable at nptel.ac.in/noc.
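The scoring rule above can be illustrated with a short Python sketch. It is not an official calculator: the function names are hypothetical, all scores are assumed to be percentages on a 0 to 100 scale, and an unsubmitted assignment is assumed to count as 0.

```python
def assignment_component(assignment_scores, best_n=6, weight=0.25):
    """Return weight * average of the best_n assignment scores.

    Assumption: scores are percentages (0-100) and every assignment
    has an entry, with unsubmitted assignments recorded as 0.
    """
    if len(assignment_scores) < best_n:
        raise ValueError("need at least best_n assignment scores")
    best = sorted(assignment_scores, reverse=True)[:best_n]
    return weight * sum(best) / best_n


def final_score(assignment_scores, exam_score, exam_weight=0.75):
    """Combine the assignment component with the weighted exam score."""
    return assignment_component(assignment_scores) + exam_weight * exam_score


def earns_certificate(assignment_scores, exam_score, threshold=40.0):
    """Check the final score against the 40% certificate threshold."""
    return final_score(assignment_scores, exam_score) >= threshold


# Hypothetical participant: eight assignment scores and an exam score of 60.
scores = [80, 90, 70, 60, 0, 100, 80, 50]
print(final_score(scores, 60))        # best six average 80 -> 20 + 45 = 65.0
print(earns_certificate(scores, 60))  # True
```

In this example the two lowest assignment scores (0 and 50) are dropped, so a missed assignment does not lower the final score as long as six others were attempted.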