Deep Learning for Computer Vision

By Prof. Vineeth N Balasubramanian | IIT Hyderabad

Learners enrolled: 10989 | Exam registration: 1710

ABOUT THE COURSE :

The automatic analysis and understanding of images and videos, a field called Computer Vision, plays an important role in applications including security, healthcare, entertainment, and mobility. The recent success of deep learning methods has revolutionized the field, bringing new developments ever closer to deployments that benefit end users. This course briefly introduces traditional computer vision topics before presenting deep learning methods for computer vision. It covers the fundamental concepts of neural networks, explores convolutional architectures, and surveys recent advances in computer vision tasks such as image classification, object detection, segmentation, and generative modeling. Students will work on hands-on programming assignments and learn to implement and optimize deep learning models. By the end of the course, participants will be equipped with the skills and knowledge to contribute to the rapidly evolving field of computer vision, pushing the boundaries of what machines can perceive and understand.

The course assumes a basic background in machine learning, and may be most useful to students who have also completed introductory material on deep learning.

INTENDED AUDIENCE : Senior undergraduate students, Post-graduate students, Industry professionals seeking to understand computer vision

PREREQUISITES :

Completion of a basic course in Machine Learning
(Recommended, not mandatory) Completion of a course in Deep Learning, or exposure to topics in neural networks
Knowledge of basics in probability, linear algebra, and calculus
Experience programming in Python

If you are unsure whether you meet the background requirements for the course, please look at Assignment 0 (both theory and programming). If you are comfortable solving/following these assignments, you are ready for the course.

INDUSTRIES SUPPORT : Many organizations (national and international, industry and government) use computer vision in their products and services. These include multinationals such as Google, Microsoft, Apple, Meta, Amazon, Netflix, and Honeywell; Indian companies such as Reliance Jio, Flipkart, TCS, Cognizant, and L&T; government organizations such as DRDO and Traffic Police Departments; and start-ups such as Vehant, Netradyne, and SigTuple.

Summary

Course Status :	Completed
Course Type :	Elective
Language for course content :	English
Duration :	12 weeks
Category :	Computer Science and Engineering, Artificial Intelligence, Data Science
Credit Points :	3
Level :	Undergraduate/Postgraduate
Start Date :	22 Jul 2024
End Date :	11 Oct 2024
Enrollment Ends :	05 Aug 2024
Exam Registration Ends :	16 Aug 2024
Exam Date :	26 Oct 2024 IST

Note: This exam date is subject to change based on seat availability. You can check the final exam date on your hall ticket.

Course layout

Week 1: Introduction and Overview

Course Introduction and Overview
History (Optional)
Image Formation (Optional)
Image Representation
Linear Filtering, Correlation, Convolution
Code Walkthroughs

Week 2: Visual Features and Representations

Edge Detection
From Edges to Blobs and Corners
Scale Space, Image Pyramids and Filter Banks
SIFT and Variants
Human Visual System (Optional)
Code Walkthroughs

Week 3: Deep Learning Basics

Neural Networks: A Review
Feedforward Neural Networks and Backpropagation
Gradient Descent and Variants
Regularization in Neural Networks
Improving Training of Neural Networks
Code Walkthroughs

Week 4: Convolutional Neural Networks for Image Classification

Convolutional Neural Networks: An Introduction
Backpropagation in CNNs
CNN Architecture for Image Classification
Code Walkthroughs

Week 5: Beyond Basic CNNs: Architectures, Finetuning and Visualization

Evolution of CNN Architectures: VGG, Inception, ResNets
ResNet Variants, MobileNet, EfficientNet
Finetuning CNNs
Visualizing CNNs
Code Walkthroughs

Week 6: CNNs for Object Detection and Segmentation

CNNs for Object Detection: Two-stage Models
CNNs for Object Detection: Single-stage Models
CNNs for Segmentation
Code Walkthroughs

Week 7: Recurrent Neural Networks and their use in Vision

Recurrent Neural Networks: Introduction
Backpropagation in RNNs
LSTMs and GRUs
Video Understanding using CNNs and RNNs
Code Walkthroughs

Week 8: Attention Models and Transformers

Attention in Vision Models: An Introduction
Soft and Hard Attention: Image Captioning
Self-Attention and Transformers
Code Walkthroughs

Week 9: Vision Transformers and Applications

From Transformers to Vision Transformers
Transformers for Detection
Transformers for Segmentation
Code Walkthroughs

Week 10: Deep Generative Models: GANs and VAEs

Deep Generative Models: An Introduction
Generative Adversarial Networks
GAN Hacks and Improvements
Variational Autoencoders and Disentanglement
Code Walkthroughs

Week 11: Deep Generative Models: Diffusion Models

Introduction to Diffusion Models: DDPMs
Classifier and Classifier-Free Diffusion Guidance
Text-conditioned Diffusion Models
Under the Hood: Sampling, Prediction Space, Noise Schedules, Architectures
Code Walkthroughs

Week 12: Vision-Language Models and Recent Developments

Self-Supervised Learning: SimCLR
Contrastive Learning
Vision-Language Models
CLIP, BLIP, BLIP-2
Code Walkthroughs
Course Conclusion

Additional Material: Miscellaneous Advanced Topics (Optional)

Applications and Case Studies
Few-shot and Zero-shot Learning
Adversarial Robustness
Pruning and Model Compression
Neural Architecture Search
Recent Developments

From VLMs to MM-LLMs: LLaVA, Video-ChatGPT, GPT-4V, Gemini 1.5

DALL-E 1, 2, 3 and Imagen

Books and references

Deep learning is a rapidly evolving field, so we will use multiple reference sources, including books, blogs and articles. The relevant references will be pointed out at the end of each topic.

References for deep learning:

Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, 2016

Michael Nielsen, Neural Networks and Deep Learning, 2016

Yoshua Bengio, Learning Deep Architectures for AI, 2009

References for computer vision:

Richard Szeliski, Computer Vision: Algorithms and Applications, 2010.

Simon Prince, Computer Vision: Models, Learning, and Inference, 2012.

David Forsyth, Jean Ponce, Computer Vision: A Modern Approach, 2002.

Tools: We will use PyTorch for our assignments.

Other useful references:

Bishop, Christopher. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press, 1995. ISBN: 9780198538646.
Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ISBN 978-0-387-31073-2
Duda, Richard, Peter Hart, and David Stork. Pattern Classification. 2nd ed. New York, NY: Wiley-Interscience, 2000. ISBN: 9780471056690.
Mitchell, Tom. Machine Learning. New York, NY: McGraw-Hill, 1997. ISBN: 9780070428072.
Richard Hartley, Andrew Zisserman, Multiple View Geometry in Computer Vision, 2004.
David Marr, Vision, 1982.

Instructor bio

Prof. Vineeth N Balasubramanian

IIT Hyderabad

Vineeth N Balasubramanian is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad (IIT-H). He was also the Founding Head of the Department of Artificial Intelligence at IIT-H from 2019-22, and a Fulbright-Nehru Visiting Faculty at Carnegie Mellon University in 2022-23. His research interests include deep learning, machine learning, and computer vision. His research has resulted in over 160 peer-reviewed publications at various international venues, including top-tier venues such as ICML, CVPR, NeurIPS, ICCV, KDD, AAAI, and IEEE TPAMI, with Best Paper Awards at recent venues such as CODS-COMAD 2022, CVPR 2021 Workshop on Causality in Vision, etc. He served as a General Chair for ACML 2022, and serves as a Senior PC/Area Chair regularly for conferences such as CVPR, ICCV, AAAI, IJCAI and ECCV. He is a recipient of the Google Research Scholar Award (2021), NASSCOM AI Gamechanger Award (2022, both Winner and Runner-up), Teaching Excellence Award at IIT-H (2017 and 2021), Research Excellence Award at IIT-H (2022), among others. For more details, please see https://people.iith.ac.in/vineethnb/.

Course certificate

The course is free to enroll in and learn from. If you want a certificate, however, you have to register for and write the proctored exam, which we conduct in person at designated exam centres.
The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
Date and Time of Exams: 26 October 2024 Morning session 9am to 12 noon; Afternoon Session 2pm to 5pm.
Registration url: Announcements will be made when the registration form is open for registrations.
The online registration form has to be filled in and the certification exam fee paid. More details will be made available when the exam registration form is published. Any changes will be mentioned then.
Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill in the form, etc.

CRITERIA TO GET A CERTIFICATE

● Average assignment score = 25%, with 17% from MCQ assignments and 8% from programming assignments
○ MCQ assignment score = 17% of the average of the best 4 assignments out of the total 6 assignments given in the course.
○ Coding assignment score = 8% of the average of the best 4 assignments out of the total 6 assignments given in the course.
● Exam score = 75% of the proctored certification exam score out of 100.
● Final score = Average assignment score + Exam score.

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >=10/25 AND EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
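
As an illustration only, the scoring rules above can be sketched in Python. This is not an official NPTEL calculator. The function names (`best_k_average`, `certificate_result`) are hypothetical, and the sketch assumes that every assignment and the exam are marked out of 100, that MCQ and programming assignments are tracked as separate lists, and that an unsubmitted assignment counts as 0.

```python
def best_k_average(scores, k=4):
    """Average of the k highest scores (each assumed to be out of 100).

    Assumption: if fewer than k scores are given, the missing ones count as 0.
    """
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


def certificate_result(mcq_scores, coding_scores, exam_score):
    """Apply the stated weights: 17% MCQ + 8% coding + 75% proctored exam."""
    mcq_part = 0.17 * best_k_average(mcq_scores)        # out of 17
    coding_part = 0.08 * best_k_average(coding_scores)  # out of 8
    assignment_score = mcq_part + coding_part           # out of 25
    exam_part = 0.75 * exam_score                       # out of 75

    # Both thresholds must be met independently of the final score.
    eligible = assignment_score >= 10 and exam_part >= 30

    return {
        "assignment_score": assignment_score,
        "exam_score": exam_part,
        "final_score": assignment_score + exam_part,
        "eligible": eligible,
    }


if __name__ == "__main__":
    # Made-up marks, purely for illustration.
    result = certificate_result(
        mcq_scores=[80, 60, 90, 70, 0, 100],
        coding_scores=[50, 100, 0, 75, 75, 0],
        exam_score=39,
    )
    print(result)
```

With these made-up marks the final score is above 40, yet the learner is not eligible, because 75% of an exam mark of 39 falls below the 30/75 threshold.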

The certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Hyderabad. It will be e-verifiable at nptel.ac.in/noc.

Only the e-certificate will be made available. Hard copies will not be dispatched.

Once again, thanks for your interest in our online courses and certification. Happy learning.

- NPTEL team
