Business Analytics & Text Mining Modeling Using Python

By Dr. Gaurav Dixit   |   IIT Roorkee
Learners enrolled: 4133
The objective of this course is to impart knowledge of text mining techniques for deriving business intelligence to achieve organizational goals. It uses a Python-based software platform to build, assess, and compare models on real datasets and cases, with an easy-to-follow learning curve.

INTENDED AUDIENCE: UG & PG engineering students (all branches); MBA students; professionals working in or aspiring to Business Analyst, Data Analyst, Data Scientist, and Data Engineer roles
PREREQUISITES: Relevant sessions from the courses Business Analytics & Data Mining Modelling Using R Parts I and II
INDUSTRY SUPPORT: Big Data companies, Analytics & Consultancy companies, Companies with Analytics Division
Course Status : Completed
Course Type : Elective
Duration : 8 weeks
Start Date : 29 Jul 2019
End Date : 20 Sep 2019
Exam Date : 29 Sep 2019 IST
Category :
  • Management Studies
Credit Points : 2
Level : Undergraduate

Course layout

Week 1: Introductory overview of Text Mining
- Introductory Thoughts 
- Data Mining vs. Text Mining
- Text Mining and Text Characteristics
- Predictive Text Analytics
- Text Mining Problems
- Prediction & Evaluation
- Python as a Data Science Platform
Python for Analytics
- Introduction to Python Installation
- Jupyter Notebook Introduction
Week 2: Python Basics
- Python Programming Features
- Commands for common tasks and control
- Essential Python programming concepts & language mechanics
Built in Capabilities of Python
- Data structures: tuples, lists, dicts, and sets
Week 3: Built in Capabilities of Python
- Functions, Namespaces, Scope, Local functions, Writing more reusable generic functions       
Week 4: Built in Capabilities of Python
- Generators
- Errors & Exception Handling
- Working with files
Numerical Python
- N-dimensional array objects
Week 5: Numerical Python
- Vectorized array operations
- File management using arrays
- Linear algebra operations
- Pseudo-random number generation
- Random walks
Python pandas
- Data structures: Series and DataFrame
Week 6: Python pandas
- Applying functions and methods
- Descriptive Statistics
- Correlation and Covariance
Working with Data in Python
- Working with CSV, EXCEL files
- Working with Web APIs   
Week 7: Working with Data in Python
- Filtering out missing data, Filling in the missing data, removing duplicates
- Perform transformations based on mappings
- Binning continuous variables
- Random sampling and random reordering of rows
- Dummy variables
- String and text processing
- Regular expressions
- Categorical type
Data Visualization using Python
- Matplotlib Library
- Plots & Subplots
Week 8: Text mining modeling using NLTK
- Text Corpus
- Sentence Tokenization
- Word Tokenization
- Removing special Characters
- Expanding contractions
- Removing Stopwords
- Correcting words: repeated characters
- Stemming & lemmatization
- Part of Speech Tagging
- Feature Extraction
- Bag of words model
- TF-IDF model
- Text classification problem
- Building a classifier using support vector machine
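
As a rough illustration of the Week 8 feature-extraction topics (word tokenization, stopword removal, bag of words, TF-IDF), here is a minimal sketch. The course itself works with NLTK; this sketch uses only the Python standard library so that it runs without extra installs. The stopword list, the function names, and the sample sentences are assumptions made for illustration, and the weighting shown (term frequency times log(N / document frequency)) is only one of several TF-IDF variants.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK ships a much fuller one).
STOPWORDS = {"the", "is", "a", "and", "of", "to"}


def tokenize(text):
    """Lowercase the text and keep runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count each non-stopword token in a single document."""
    return Counter(t for t in tokenize(text) if t not in STOPWORDS)


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count / tokens in document) * log(N / documents containing term)
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts a term once
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in bag.items()}
        )
    return weights


docs = [
    "The service is great",
    "The delivery is late",
    "Great service and great price",
]
weights = tf_idf(docs)
```

Terms that occur in every document receive a weight of zero under this variant, which is why rarer words such as "delivery" end up dominating a document's feature vector. Vectors like these are the kind of input a classifier such as a support vector machine would be trained on.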

Books and references

Fundamentals of Predictive Text Mining by Sholom M. Weiss, Nitin Indurkhya, & Tong Zhang (2010/2015)
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney (2017)
Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from Your Data by Dipanjan Sarkar (2016)

Instructor bio

Dr. Gaurav Dixit is an Assistant Professor in the Department of Management Studies at the Indian Institute of Technology Roorkee. He earned his doctoral degree from the Indian Institute of Management Indore and an engineering degree from the Indian Institute of Technology (BHU) Varanasi. Previously, he worked at Hewlett-Packard (HP) as a software engineer and at Sharda Group of Institutions as a project manager on deputation.
Gaurav’s research focuses on information technology (IT) strategy, electronic commerce, electronic waste, data mining, text mining, and big data analytics, and provides insights on the business and social value of IT. His research has appeared in quality journals & conferences, including Resources, Conservation and Recycling, Journal of Global Information Technology Management, Sustainable Production and Consumption, Journal of Information Technology Management, ICIS conference, DIGITS conference, India Finance Conference, Indian Academy of Management conference, and Academy of Management conference.

Course certificate

  • The course is free to enroll and learn from. But if you want a certificate, you have to register and write the proctored exam conducted by us in person at any of the designated exam centres.
  • The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
  • Date and Time of Exams: 29 September 2019, Morning session 9am to 12 noon; Afternoon session 2pm to 5pm.
  • Registration url: Announcements will be made when the registration form is open for registrations.
  • The online registration form has to be filled in and the certification exam fee has to be paid. More details will be made available when the exam registration form is published. Any changes will be mentioned then.
  • Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill the form etc.

  • Average assignment score = 25% of average of best 6 assignments out of the total 8 assignments given in the course. 
  • Exam score = 75% of the proctored certification exam score out of 100
  • Final score = Average assignment score + Exam score
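
The scoring rule above can be worked through with a short sketch. The function name and the sample marks are hypothetical, and the sketch assumes each assignment is marked out of 100, which the listing does not state.

```python
def final_score(assignment_scores, exam_score):
    """Combine assignment and exam marks as described in the scoring rule.

    assignment_scores: marks for the 8 assignments (assumed to be out of 100).
    exam_score: proctored exam mark out of 100.
    """
    # Keep only the best 6 assignments.
    best_six = sorted(assignment_scores, reverse=True)[:6]
    # 25% of the average of the best 6 assignments.
    assignment_component = 0.25 * sum(best_six) / len(best_six)
    # 75% of the exam score.
    exam_component = 0.75 * exam_score
    return assignment_component + exam_component


# Hypothetical learner: 80 on every assignment, 60 in the exam.
example = final_score([80] * 8, 60)  # 20 + 45 = 65.0
```

Because only the best 6 of 8 assignments count, two weak or missed assignments do not lower the assignment component.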

  • If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
  • The certificate will have your name, photograph, and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Roorkee. It will be e-verifiable at nptel.ac.in/noc.
  • Only the e-certificate will be made available. Hard copies are being discontinued from the July 2019 semester and will not be dispatched.
