Bandit Algorithm (Online Machine Learning)

By Prof. Manjesh Hanawal | IIT Bombay
Learners enrolled: 3595
In many scenarios one faces an uncertain environment in which the best action to play is not known a priori. How can one obtain the best possible reward or utility in such scenarios? One natural way is to first explore the environment to identify the 'best' actions and then exploit them. However, this gives rise to an exploration vs. exploitation dilemma: on the one hand, we need to explore enough to identify the best action and be confident of its optimality; on the other hand, the best actions need to be exploited as often as possible to obtain higher reward. In this course we will study many bandit algorithms that balance exploration and exploitation well in various random environments to accumulate good rewards over the duration of play. Bandit algorithms find applications in online advertising, recommendation systems, auctions, routing, e-commerce, and in any online setting where information can be gathered incrementally.
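The exploration vs. exploitation trade-off described above can be illustrated with a minimal sketch. It uses epsilon-greedy, a simple textbook strategy that is not named in the course description and is chosen here only for brevity. The function name `epsilon_greedy`, the Bernoulli reward model, and all parameter values are assumptions made for illustration.

```python
import random


def epsilon_greedy(true_means, horizon, epsilon, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds.

    true_means: success probability of each arm (unknown to the learner).
    epsilon:    probability of exploring a uniformly random arm in a round.
    Returns (total_reward, pull_counts).
    Illustrative sketch only; names and reward model are assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was played
    estimates = [0.0] * n_arms   # running average reward of each arm
    total_reward = 0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean
            # (ties are broken in favour of the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the running average for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts


if __name__ == "__main__":
    reward, counts = epsilon_greedy([0.2, 0.5, 0.8], horizon=5000, epsilon=0.1)
    print("total reward:", reward)
    print("pulls per arm:", counts)
```

Setting `epsilon=0.0` on arms with means `[0.0, 1.0]` shows why pure exploitation can fail: the learner never leaves the first arm and collects no reward, because it never gathers evidence about the second arm.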

Computer Science, Electrical Engineering, Operations Research, Mathematics and Statistics
PREREQUISITES : Basics of Probability Theory and Optimization
INDUSTRY SUPPORT : All companies related to Internet technologies (e.g., Google, Microsoft, Flipkart, Ola, Amazon, etc.)
Course Status : Completed
Course Type : Elective
Duration : 12 weeks
Start Date : 14 Sep 2020
End Date : 04 Dec 2020
Exam Date : 19 Dec 2020 IST
Enrollment Ends : 25 Sep 2020
Category :
  • Computer Science and Engineering
Credit Points : 3
Level : Postgraduate

Course layout

Week 1: Introduction to Bandit Algorithms. From Batch to Online Setting
Week 2: Adversarial Setting with Full Information (Halving, WM Algorithm)
Week 3: Adversarial Setting with Bandit Information
Week 4: Regret lower bounds for the adversarial setting
Week 5: Introduction to the Stochastic Setting and various regret notions
Week 6: A primer on concentration inequalities
Week 7: Stochastic Bandit Algorithms: UCB, KL-UCB
Week 8: Lower bounds for stochastic bandits
Week 9: Introduction to contextual bandits
Week 10: Overview of contextual bandit algorithms
Week 11: Introduction to pure exploration setups (fixed confidence vs. fixed budget)
Week 12: Algorithms for pure exploration (LUCB, KL-LUCB, lil'UCB)

Books and references

1) Bandit Algorithms by Tor Lattimore and Csaba Szepesvári
2) Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems by Sébastien Bubeck and Nicolò Cesa-Bianchi

Instructor bio

Prof. Manjesh Hanawal

IIT Bombay
Manjesh K. Hanawal received the M.S. degree in ECE from the Indian Institute of Science, Bangalore, India, in 2009, and the Ph.D. degree from INRIA, Sophia Antipolis, France, and the University of Avignon, Avignon, France, in 2013. After two years as a postdoc at Boston University, he is now an Assistant Professor in Industrial Engineering and Operations Research at the Indian Institute of Technology Bombay, Mumbai, India. His research interests include performance evaluation, machine learning, and network economics. He is a recipient of the Inspire Faculty Award from DST and the Early Career Research Award from SERB.

Course certificate

• The course is free to enroll in and learn from. If you want a certificate, however, you have to register for and write the proctored exam, which we conduct in person at any of the designated exam centres.
• The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
• Date and time of exams: 19 December 2020. Morning session: 9am to 12 noon; afternoon session: 2pm to 5pm.
• Registration URL: An announcement will be made when the registration form opens.
• The online registration form has to be filled in and the certification exam fee paid. More details will be made available when the exam registration form is published. Any changes will be mentioned then.
• Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill in the form, etc.

• Average assignment score = 25% of the average of the best 8 assignments out of the 12 assignments given in the course.
• Exam score = 75% of the proctored certification exam score out of 100.
• Final score = Average assignment score + Exam score
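
As a sketch of the scoring arithmetic above, the following computes a final score from assignment and exam marks. The function name `final_score` and the assumption that each assignment is marked out of 100 are illustrative; the course page does not specify them.

```python
def final_score(assignment_scores, exam_score):
    """Combine assignment and exam marks into a final score out of 100.

    assignment_scores: marks for the assignments, each assumed to be out of 100.
    exam_score:        proctored exam mark out of 100.
    Hypothetical helper written for illustration only.
    """
    # Keep only the best 8 assignment marks.
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    # The assignment component is 25% of the average of those marks.
    assignment_component = 0.25 * sum(best_eight) / len(best_eight)
    # The exam component is 75% of the exam mark.
    exam_component = 0.75 * exam_score
    return assignment_component + exam_component


if __name__ == "__main__":
    # Eight assignments at 80/100, four not attempted, exam mark 60/100.
    print(final_score([80] * 8 + [0] * 4, 60))  # 20 + 45 = 65.0
```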

• If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
• The certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Bombay. It will be e-verifiable at nptel.ac.in/noc
• Only the e-certificate will be made available. Hard copies will not be dispatched.
