
Deep Learning for Computer Vision

By Prof. Vineeth N Balasubramanian   |   IIT Hyderabad
Learners enrolled: 7233   |  Exam registration: 183
ABOUT THE COURSE :
The automatic analysis and understanding of images and videos, a field called Computer Vision, is of significant importance in applications including security, healthcare, entertainment, and mobility. The recent success of deep learning methods has revolutionized the field, bringing new developments ever closer to deployment that benefits end users. This course briefly introduces students to traditional computer vision topics before presenting deep learning methods for computer vision. It delves into the fundamental concepts of neural networks, explores convolutional architectures, and covers the latest advancements in computer vision tasks such as image classification, object detection, segmentation, and generative modeling. Students will engage in hands-on programming assignments and learn to implement and optimize deep learning models. By the end of the course, participants will be equipped with the skills and knowledge to contribute to the rapidly evolving field of computer vision, pushing the boundaries of what machines can perceive and understand.

The course assumes a basic background in machine learning, and may be most useful to students who have also completed introductory material on deep learning.


INTENDED AUDIENCE : Senior undergraduate students, postgraduate students, and industry professionals seeking to understand computer vision


PREREQUISITES :
  • Completion of a basic course in Machine Learning
  • (Recommended, not mandatory) Completion of a course in Deep Learning, or exposure to topics in neural networks
  • Knowledge of basics in probability, linear algebra, and calculus
  • Experience programming in Python
If you are unsure whether you meet the background requirements for the course, please look at Assignment 0 (both theory and programming). If you are comfortable solving/following these assignments, you are ready for the course.

INDUSTRY SUPPORT : Many organizations (national and international, industry and government) use computer vision in their products and services. These include multinationals such as Google, Microsoft, Apple, Meta, Amazon, Netflix, Honeywell, etc.; Indian companies such as Reliance Jio, Flipkart, TCS, Cognizant, L&T, etc.; government organizations such as DRDO, Traffic Police Departments, etc.; as well as start-ups such as Vehant, Netradyne, SigTuple, etc.

Summary
Course Status : Upcoming
Course Type : Elective
Duration : 12 weeks
Category :
  • Computer Science and Engineering
  • Artificial Intelligence
  • Data Science
Credit Points : 3
Level : Undergraduate/Postgraduate
Start Date : 22 Jul 2024
End Date : 11 Oct 2024
Enrollment Ends : 29 Jul 2024
Exam Registration Ends : 16 Aug 2024
Exam Date : 26 Oct 2024 IST

Note: This exam date is subject to change based on seat availability. You can check the final exam date on your hall ticket.





Course layout

Week 1: Introduction and Overview
  • Course Introduction and Overview
  • History (Optional)
  • Image Formation (Optional)
  • Image Representation
  • Linear Filtering, Correlation, Convolution
  • Code Walkthroughs

Week 2: Visual Features and Representations
  • Edge Detection
  • From Edges to Blobs and Corners
  • Scale Space, Image Pyramids and Filter Banks
  • SIFT and Variants
  • Human Visual System (Optional)
  • Code Walkthroughs

Week 3: Deep Learning Basics
  • Neural Networks: A Review
  • Feedforward Neural Networks and Backpropagation
  • Gradient Descent and Variants
  • Regularization in Neural Networks
  • Improving Training of Neural Networks
  • Code Walkthroughs

Week 4: Convolutional Neural Networks for Image Classification
  • Convolutional Neural Networks: An Introduction
  • Backpropagation in CNNs
  • CNN Architecture for Image Classification
  • Code Walkthroughs

Week 5: Beyond Basic CNNs: Architectures, Finetuning and Visualization
  • Evolution of CNN Architectures: VGG, Inception, ResNets
  • ResNet Variants, MobileNet, EfficientNet
  • Finetuning CNNs
  • Visualizing CNNs
  • Code Walkthroughs

Week 6: CNNs for Object Detection and Segmentation
  • CNNs for Object Detection: Two-stage Models
  • CNNs for Object Detection: Single-stage Models
  • CNNs for Segmentation
  • Code Walkthroughs

Week 7: Recurrent Neural Networks and their use in Vision
  • Recurrent Neural Networks: Introduction
  • Backpropagation in RNNs
  • LSTMs and GRUs
  • Video Understanding using CNNs and RNNs
  • Code Walkthroughs

Week 8: Attention Models and Transformers
  • Attention in Vision Models: An Introduction
  • Soft and Hard Attention: Image Captioning
  • Self-Attention and Transformers
  • Code Walkthroughs

Week 9: Vision Transformers and Applications
  • From Transformers to Vision Transformers
  • Transformers for Detection
  • Transformers for Segmentation
  • Code Walkthroughs

Week 10: Deep Generative Models: GANs and VAEs
  • Deep Generative Models: An Introduction
  • Generative Adversarial Networks
  • GAN Hacks and Improvements
  • Variational Autoencoders and Disentanglement
  • Code Walkthroughs

Week 11: Deep Generative Models: Diffusion Models
  • Introduction to Diffusion Models: DDPMs
  • Classifier and Classifier-Free Diffusion Guidance
  • Text-conditioned Diffusion Models
  • Under the Hood: Sampling, Prediction Space, Noise Schedules, Architectures
  • Code Walkthroughs

Week 12: Vision-Language Models and Recent Developments
  • Self-Supervised Learning: SimCLR
  • Contrastive Learning
  • Vision-Language Models
  • CLIP, BLIP, BLIP-2
  • Code Walkthroughs
  • Course Conclusion

Additional Material: Miscellaneous Advanced Topics (Optional)
  • Applications and Case Studies
  • Few-shot and Zero-shot Learning
  • Adversarial Robustness
  • Pruning and Model Compression
  • Neural Architecture Search
  • Recent Developments
    • From VLMs to MM-LLMs: LLaVA, Video-ChatGPT, GPT-4V, Gemini 1.5
    • DALL-E 1, 2, 3 and Imagen

Books and references



Deep learning is a rapidly evolving field, so we will draw on multiple references, including books, blogs and articles; the relevant ones will be pointed out at the end of each topic.


References for deep learning:
Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, 2016
Michael Nielsen, Neural Networks and Deep Learning, 2016
Yoshua Bengio, Learning Deep Architectures for AI, 2009


References for computer vision:
Richard Szeliski, Computer Vision: Algorithms and Applications, 2010
Simon Prince, Computer Vision: Models, Learning, and Inference, 2012
David Forsyth, Jean Ponce, Computer Vision: A Modern Approach, 2002


Tools: We will use PyTorch for our assignments.


Other useful references:
  • Bishop, Christopher. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press, 1995. ISBN: 9780198538646.
  • Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. ISBN: 9780387310732.
  • Duda, Richard, Peter Hart, and David Stork. Pattern Classification. 2nd ed. New York, NY: Wiley-Interscience, 2000. ISBN: 9780471056690.
  • Mitchell, Tom. Machine Learning. New York, NY: McGraw-Hill, 1997. ISBN: 9780070428072.
  • Richard Hartley, Andrew Zisserman, Multiple View Geometry in Computer Vision, 2004.
  • David Marr, Vision, 1982.


Instructor bio

Prof. Vineeth N Balasubramanian

IIT Hyderabad
Vineeth N Balasubramanian is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad (IIT-H). He was also the Founding Head of the Department of Artificial Intelligence at IIT-H from 2019-22, and a Fulbright-Nehru Visiting Faculty at Carnegie Mellon University in 2022-23. His research interests include deep learning, machine learning, and computer vision. His research has resulted in over 160 peer-reviewed publications at various international venues, including top-tier venues such as ICML, CVPR, NeurIPS, ICCV, KDD, AAAI, and IEEE TPAMI, with Best Paper Awards at recent venues such as CODS-COMAD 2022, CVPR 2021 Workshop on Causality in Vision, etc. He served as a General Chair for ACML 2022, and serves as a Senior PC/Area Chair regularly for conferences such as CVPR, ICCV, AAAI, IJCAI and ECCV. He is a recipient of the Google Research Scholar Award (2021), NASSCOM AI Gamechanger Award (2022, both Winner and Runner-up), Teaching Excellence Award at IIT-H (2017 and 2021), Research Excellence Award at IIT-H (2022), among others. For more details, please see https://people.iith.ac.in/vineethnb/.

Course certificate

The course is free to enroll and learn from. But if you want a certificate, you have to register and write the proctored exam conducted by us in person at any of the designated exam centres.
The exam is optional and carries a fee of Rs 1000/- (Rupees one thousand only).
Date and Time of Exams: 26 October 2024. Morning session: 9am to 12 noon; Afternoon session: 2pm to 5pm.
Registration URL: An announcement will be made when the registration form opens.
The online registration form has to be filled in and the certification exam fee paid. More details will be made available when the exam registration form is published. If there are any changes, they will be mentioned then.
Please check the form for more details on the cities where the exams will be held, the conditions you agree to when you fill in the form, etc.

CRITERIA TO GET A CERTIFICATE

● Average assignment score = 25%, with 17% from MCQ assignments and 8% from programming assignments
       ○ MCQ assignment score = 17% of the average of the best 4 assignments out of the total 6 assignments given in the course.
       ○ Coding assignment score = 8% of the average of the best 4 assignments out of the total 6 assignments given in the course.
● Exam score = 75% of the proctored certification exam score out of 100.
● Final score = Average assignment score + Exam score.

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either of the two criteria is not met, you will not get the certificate even if the Final score >= 40/100.
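
The scoring rule above can be sketched in a few lines of Python. This is only an illustrative calculation, not an official NPTEL tool: the function names are hypothetical, and it assumes each assignment and the exam are marked on a 0-100 scale.

```python
def best_k_average(scores, k=4):
    """Average of the k highest scores (assumed to be on a 0-100 scale)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


def certificate_result(mcq_scores, coding_scores, exam_score):
    """Return (final_score, eligible) under the criteria listed above.

    Hypothetical helper: the weights and thresholds come from the course
    page, while the 0-100 input scale is an assumption.
    """
    mcq_part = 0.17 * best_k_average(mcq_scores)        # out of 17
    coding_part = 0.08 * best_k_average(coding_scores)  # out of 8
    assignment_part = mcq_part + coding_part            # out of 25
    exam_part = 0.75 * exam_score                       # out of 75
    final_score = assignment_part + exam_part           # out of 100
    # Both thresholds must be met; a high final score alone is not enough.
    eligible = assignment_part >= 10 and exam_part >= 30
    return final_score, eligible


if __name__ == "__main__":
    mcq = [80, 70, 90, 60, 0, 50]       # made-up example scores
    coding = [100, 50, 0, 0, 80, 70]    # made-up example scores
    for exam in (40, 39):
        final, ok = certificate_result(mcq, coding, exam)
        print(f"exam={exam}: final={final:.2f}, eligible={ok}")
```

With these made-up scores, an exam mark of 39 gives a final score above 40 but an exam component below 30/75, so the learner would not be eligible for the certificate.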

The certificate will have your name, photograph and the score in the final exam with the breakup. It will have the logos of NPTEL and IIT Hyderabad. It will be e-verifiable at nptel.ac.in/noc.

Only the e-certificate will be made available. Hard copies will not be dispatched.

Once again, thanks for your interest in our online courses and certification. Happy learning.

- NPTEL team

