Business Analytics & Text Mining Modeling Using Python

By Prof. Gaurav Dixit   |   IIT Roorkee
Learners enrolled: 6498   |  Exam registration: 1328
The objective of this course is to impart knowledge of text mining techniques for deriving business intelligence to achieve organizational goals. It uses a Python-based software platform to build, assess, and compare models based on real datasets and cases, with an easy-to-follow learning curve.

INTENDED AUDIENCE : UG & PG engineering students (all branches); MBA students; professionals working in or aspiring for Business Analyst, Data Analyst, Data Scientist, and Data Engineer roles

PREREQUISITES : Relevant sessions from the courses Business Analytics & Data Mining Modelling Using R Parts I and II

INDUSTRY SUPPORT : Big Data companies, Analytics & Consultancy companies, Companies with Analytics Division
Course Status : Completed
Course Type : Elective
Duration : 8 weeks
Category :
  • Management Studies
Credit Points : 2
Level : Undergraduate
Start Date : 24 Jul 2023
End Date : 15 Sep 2023
Enrollment Ends : 07 Aug 2023
Exam Registration Ends : 21 Aug 2023
Exam Date : 24 Sep 2023 IST

Note: This exam date is subject to change based on seat availability. You can check the final exam date on your hall ticket.

Course layout

Week 1: Introductory overview of Text Mining
- Introductory Thoughts 
- Data Mining vs. Text Mining
- Text Mining and Text Characteristics
- Predictive Text Analytics
- Text Mining Problems
- Prediction & Evaluation
- Python as a Data Science Platform
Python for Analytics
- Introduction to Python Installation
- Jupyter Notebook Introduction
Week 2: Python Basics
- Python Programming Features
- Commands for common tasks and control
- Essential Python programming concepts & language mechanics
Built in Capabilities of Python
- Data structures: tuples, lists, dicts, and sets
Week 3: Built in Capabilities of Python
- Functions, Namespaces, Scope, Local functions, Writing more reusable generic functions       
Week 4: Built in Capabilities of Python
- Generators
- Errors & Exception Handling
- Working with files
Numerical Python
- N-dimensional array objects
Week 5: Numerical Python
- Vectorized array operations
- File management using arrays
- Linear algebra operations
- Pseudo-random number generation
- Random walks
Python pandas
- Data structures: Series and DataFrame
Week 6: Python pandas
- Applying functions and methods
- Descriptive Statistics
- Correlation and Covariance
Working with Data in Python
- Working with CSV, EXCEL files
- Working with Web APIs   
Week 7: Working with Data in Python
- Filtering out missing data, Filling in the missing data, removing duplicates
- Perform transformations based on mappings
- Binning continuous variables
- Random sampling and random reordering of rows
- Dummy variables
- String and text processing
- Regular expressions
- Categorical type
Data Visualization using Python
- Matplotlib Library
- Plots & Subplots
Week 8: Text mining modeling using NLTK
- Text Corpus
- Sentence Tokenization
- Word Tokenization
- Removing special Characters
- Expanding contractions
- Removing Stopwords
- Correcting words: repeated characters
- Stemming & lemmatization
- Part of Speech Tagging
- Feature Extraction
- Bag of words model
- TF-IDF model
- Text classification problem
- Building a classifier using support vector machine

Books and references

  1. Fundamentals of Predictive Text Mining by Sholom M. Weiss, Nitin Indurkhya, & Tong Zhang (2010/2015)
  2. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney (2017)
  3. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data by Dipanjan Sarkar (2016)

Instructor bio

Prof. Gaurav Dixit

IIT Roorkee
Dr. Gaurav Dixit is an Assistant Professor in the Department of Management Studies at the Indian Institute of Technology Roorkee. He earned his doctoral degree from the Indian Institute of Management Indore and an engineering degree from the Indian Institute of Technology (BHU) Varanasi. Previously, he worked at Hewlett-Packard (HP) as a software engineer, and at Sharda Group of Institutions as a project manager on deputation. Gaurav’s research focuses on information technology (IT) strategy, electronic commerce, electronic waste, data mining, and big data analytics, and provides insights on the business and social value of IT. His research has appeared in quality journals & conferences, including Resources, Conservation and Recycling, Journal of Global Information Technology Management, Sustainable Production and Consumption, Journal of Information Technology Management, DIGITS conference, India Finance Conference, Indian Academy of Management conference, and Academy of Management conference.

Course certificate

The course is free to enroll and learn from. But if you want a certificate, you have to register and write the proctored exam conducted by us in person at any of the designated exam centres.
The exam is optional for a fee of Rs 1000/- (Rupees one thousand only).
Date and Time of Exams: 
24 September 2023 Morning session 9am to 12 noon; Afternoon Session 2pm to 5pm.
Registration url: Announcements will be made when the registration form is open for registrations.
The online registration form has to be filled in and the certification exam fee needs to be paid. More details will be made available when the exam registration form is published. If there are any changes, they will be mentioned then.
Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill in the form, etc.


Average assignment score = 25% of average of best 6 assignments out of the total 8 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
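
The scoring rules above can be illustrated with a short Python sketch. This is only an illustration, not an official NPTEL calculator: the function name `certificate_result` is hypothetical, and it assumes that every assignment and the exam are marked out of 100 and that assignments not submitted count as 0.

```python
def certificate_result(assignment_marks, exam_mark):
    """Return (assignment_score, exam_score, final_score, eligible).

    Assumptions (not stated on the course page): each assignment and
    the exam are marked out of 100, and missing assignments count as 0.
    """
    # Keep the best 6 of the (up to 8) assignment marks.
    best_six = sorted(assignment_marks, reverse=True)[:6]
    # 25% of the average of the best 6 assignments (score out of 25).
    assignment_score = 0.25 * sum(best_six) / 6
    # 75% of the proctored exam mark (score out of 75).
    exam_score = 0.75 * exam_mark
    final_score = assignment_score + exam_score
    # Both thresholds must be met; the final score alone is not enough.
    eligible = assignment_score >= 10 and exam_score >= 30
    return assignment_score, exam_score, final_score, eligible


# A learner who passes both criteria.
print(certificate_result([80, 60, 0, 90, 70, 50, 100, 40], 50))
# → (18.75, 37.5, 56.25, True)

# Final score reaches 40/100, but the exam score is below 30/75.
print(certificate_result([100] * 8, 20))
# → (25.0, 15.0, 40.0, False)
```

The second call shows the case described above: a final score of 40/100 still does not earn a certificate when one of the two criteria is missed.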

Certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Roorkee. It will be e-verifiable at nptel.ac.in/noc.

Only the e-certificate will be made available. Hard copies will not be dispatched.

Once again, thanks for your interest in our online courses and certification. Happy learning.

- NPTEL team
