Deep Learning for Computer Vision

By Prof. Vineeth N Balasubramanian   |   IIT Hyderabad
Learners enrolled: 8559
The automatic analysis and understanding of images and videos, a field called Computer Vision, plays an important role in applications including security, healthcare, entertainment, and mobility. The recent success of deep learning methods has revolutionized the field, bringing new developments ever closer to deployments that benefit end users. This course introduces students to traditional computer vision topics before presenting deep learning methods for computer vision. It covers both fundamentals and recent advances, so that students learn the basics and become proficient in applying these methods to real-world applications. The course assumes that the student has already completed a full course in machine learning, and preferably has had some introduction to deep learning, and builds on these topics with a focus on computer vision.

INTENDED AUDIENCE :  Senior undergraduate students + post-graduate students
PREREQUISITES :
  • Completion of a basic course in Machine Learning
  • (Recommended, but not mandatory) Completion of a course in Deep Learning, or exposure to topics in neural networks
  • Knowledge of basics in probability, linear algebra, and calculus
  • Experience of programming, preferably in Python
If you are unsure whether you meet the background requirements for the course, please look at Assignment 0 (both theory and programming). If you are comfortable solving/following these assignments, you are ready for the course.

INDUSTRY SUPPORT : All companies that use computer vision in their products/services (Microsoft, Google, Facebook, Apple, TCS, Cognizant, L&T, etc.)
Course Status : Completed
Course Type : Elective
Duration : 12 weeks
Start Date : 14 Sep 2020
End Date : 04 Dec 2020
Exam Date : 20 Dec 2020
Enrollment Ends : 25 Sep 2020
Category :
  • Computer Science and Engineering
  • Artificial Intelligence
  • Data Science
Level : Undergraduate/Postgraduate

Course layout

Week 1: Introduction and Overview:
Course Overview and Motivation; Introduction to Image Formation, Capture and Representation; Linear Filtering, Correlation, Convolution
Week 2: Visual Features and Representations:
Edge, Blob, Corner Detection; Scale Space and Scale Selection; SIFT, SURF; HoG, LBP, etc.
Week 3: Visual Matching:
Bag-of-words, VLAD; RANSAC, Hough transform; Pyramid Matching; Optical Flow
Week 4: Deep Learning Review:
Review of Deep Learning, Multi-layer Perceptrons, Backpropagation
Week 5: Convolutional Neural Networks (CNNs):
Introduction to CNNs; Evolution of CNN Architectures: AlexNet, ZFNet, VGG, InceptionNets, ResNets, DenseNets
Week 6: Visualization and Understanding CNNs:
Visualization of Kernels; Backprop-to-image/Deconvolution Methods; Deep Dream, Hallucination, Neural Style Transfer; CAM, Grad-CAM, Grad-CAM++; Recent Methods (IG, Segment-IG, SmoothGrad)
Week 7: CNNs for Recognition, Verification, Detection, Segmentation:
CNNs for Recognition and Verification (Siamese Networks, Triplet Loss, Contrastive Loss, Ranking Loss); CNNs for Detection: Background of Object Detection, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD, RetinaNet; CNNs for Segmentation: FCN, SegNet, U-Net, Mask R-CNN
Week 8: Recurrent Neural Networks (RNNs):
Review of RNNs; CNN + RNN Models for Video Understanding: Spatio-temporal Models, Action/Activity Recognition
Week 9: Attention Models:
Introduction to Attention Models in Vision; Vision and Language: Image Captioning, Visual QA, Visual Dialog; Spatial Transformers; Transformer Networks
Week 10: Deep Generative Models:
Review of (Popular) Deep Generative Models: GANs, VAEs; Other Generative Models: PixelRNNs, NADE, Normalizing Flows, etc.
Week 11: Variants and Applications of Generative Models in Vision:
Applications: Image Editing, Inpainting, Super-resolution, 3D Object Generation, Security; Variants: CycleGANs, Progressive GANs, StackGANs, Pix2Pix, etc.
Week 12: Recent Trends:
Zero-shot, One-shot, Few-shot Learning; Self-supervised Learning; Reinforcement Learning in Vision; Other Recent Topics and Applications

Books and references

Deep learning is a rapidly evolving field, so we will draw on multiple sources, including books, blogs and articles; the relevant ones will be pointed out at the end of each topic.

References for deep learning:

Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, 2016
Michael Nielsen, Neural Networks and Deep Learning, 2016
Yoshua Bengio, Learning Deep Architectures for AI, 2009

References for computer vision:

Richard Szeliski, Computer Vision: Algorithms and Applications, 2010.
David Forsyth, Jean Ponce, Computer Vision: A Modern Approach, 2002.


We will use PyTorch for our assignments.

Other useful references:
Bishop, Christopher. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press, 1995. ISBN: 9780198538646.
Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ISBN: 9780387310732.
Duda, Richard, Peter Hart, and David Stork. Pattern Classification. 2nd ed. New York, NY: Wiley-Interscience, 2000. ISBN: 9780471056690.
Mitchell, Tom. Machine Learning. New York, NY: McGraw-Hill, 1997. ISBN: 9780070428072.
Richard Hartley, Andrew Zisserman, Multiple View Geometry in Computer Vision, 2004.
David Marr, Vision, 1982.

Instructor bio

Prof. Vineeth N Balasubramanian

IIT Hyderabad
Vineeth N Balasubramanian is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad (IIT-H), and also serves as the Head of the Department of Artificial Intelligence at IIT-H. His research interests include deep learning, machine learning, and computer vision. His research has been published at premier peer-reviewed venues including ICML, CVPR, ICCV, KDD, ICDM, IEEE TPAMI and ACM MM. His PhD dissertation at Arizona State University on the Conformal Predictions framework was nominated for the Outstanding PhD Dissertation at the Department of Computer Science. He is an active reviewer/contributor at many conferences such as NeurIPS, CVPR, ICCV, AAAI, IJCAI and ACM MM, with a recent award as an Outstanding Reviewer at CVPR 2019, as well as at journals including IEEE TPAMI, IEEE TNNLS, JMLR and Machine Learning. He currently serves as the Secretary of the AAAI India Chapter. For more details, please see https://iith.ac.in/~vineethnb/.

Course certificate

• The course is free to enroll and learn from. However, if you want a certificate, you have to register for and write the proctored exam, which we conduct in person at designated exam centres.
• The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
• Date and Time of Exams: 20 December 2020; Morning session 9am to 12 noon; Afternoon session 2pm to 5pm.
• Registration URL: An announcement will be made when the registration form is open.
• The online registration form has to be filled in and the certification exam fee paid. More details will be made available when the exam registration form is published. If there are any changes, they will be mentioned then.
• Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill in the form, etc.

• Average assignment score = 25% in total: 10% weightage for weekly quizzes (best 9 of 12) and 15% weightage for programming assignments (best 4 of 6)
• Exam score = 75% of the proctored certification exam score out of 100
• Final score = Average assignment score + Exam score
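
The weightages above can be worked through in a few lines of Python. This is only an illustrative sketch: it assumes that every quiz, programming assignment and the exam is marked out of 100, and that "best k of n" means averaging the k highest marks. The function names and the sample marks are hypothetical and are not taken from the course page.

```python
def best_k_average(scores, k):
    """Average of the k highest scores (assumed to be marked out of 100)."""
    if len(scores) < k:
        raise ValueError("need at least k scores")
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def final_score(quiz_scores, programming_scores, exam_score):
    """Combine the components using the weightages listed above.

    Assumption: all inputs are marks in the range 0-100.
    """
    quiz_part = 0.10 * best_k_average(quiz_scores, 9)                # best 9 of 12
    programming_part = 0.15 * best_k_average(programming_scores, 4)  # best 4 of 6
    assignment_part = quiz_part + programming_part                   # out of 25
    exam_part = 0.75 * exam_score                                    # out of 75
    return assignment_part + exam_part                               # out of 100


# Hypothetical marks, for illustration only.
quizzes = [90, 70, 80, 100, 60, 80, 80, 90, 70, 0, 40, 50]
programs = [100, 80, 0, 60, 90, 70]
total = final_score(quizzes, programs, exam_score=64)
print(round(total, 2))
```

Under these assumptions, missed or low-scoring quizzes and assignments are simply dropped by the best-k rule, and the exam contributes three quarters of the final score.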

• If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
• Certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Madras. It will be e-verifiable at nptel.ac.in/noc.
• Only the e-certificate will be made available. Hard copies will not be dispatched.
