You can find me on Twitter @bhutanisanyam1
You can find the Markdown File Here
You can find the Lecture 1 Notes here
Lecture 2 Notes can be found here
Lecture 3 Notes can be found here
Lecture 4 Notes can be found here
These are the Lecture 5 notes for the MIT 6.S094: Deep Learning for Self-Driving Cars course (2018), taught by Lex Fridman.
All Images are from the Lecture Slides.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-wuqptwknl8tc3vjqK18zIg.png)
Applying Deep Learning to sensing and understanding human beings.
Focus on Computer Vision.
How can we use CV to extract useful information from videos (in the context of cars)?
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-mmJ5jGVqmvAKrm4ho5GYfQ.png)
Deep Learning for Human Sensing:
Using CV, DL to create systems that operate in the real world.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-qqfzMycxwrQIDt_9Cztpew.png)
Requirements (ordered by importance):
- Data: Enormous amounts of real-world data are required. Data collection is the hardest and most important part.
- Semi-Supervised Learning: The raw data needs to be reduced to meaningful, representative cases, and the raw data needs to be annotated. We collect data and use semi-supervised techniques to find the pieces of data that can be used to train our networks.
- Efficient Annotation: Good annotation enables good performance. Annotation techniques for different scenarios are completely different, e.g. annotation tools for glance classification vs. body pose estimation vs. pixel-level labelling for SegFuse.
- Hardware: Large amounts of data need large-scale distributed compute and storage.
- Algorithms: We want algorithms that can self-calibrate, allowing generalisation.
- Temporal Dynamics: Current algorithms are mostly image-based rather than temporal/sequence-based.
Takeaway: Data collection and cleaning are more important than the algorithms.
Human Imperfections
- Distracted Driving: 3,179 people were killed and 431,000+ injured in crashes involving distracted driving during 2014.
- Eyes off the road: 5 seconds is the average time your eyes are off the road while texting.
- Drunk Driving: Accounted for 31% of the traffic fatalities of 2014.
- Drugged Driving: 23% of night drivers are drugged drivers (2014).
- Drowsy Driving: 3% of all traffic fatalities involved a drowsy driver.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-l1jNHtotMQjcLWSLHQU7nQ.png)
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-1468ZyNGXnuj72Tk86LaFg.png)
Given these flaws, and the Two Paths to an Autonomous Future (Human-Centred vs. Full Autonomy) discussed in Lecture 1:
Is the human-centred approach a bad idea?
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-PhRYQd6MXemR2lPjFRUbMQ.png)
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-aTfNUKS9raV0fPspgnlw5w.png)
- Humans might tend to ‘over-trust’ the system.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-lvMvMsVBV2NP9vHTIslwZg.png)
MIT-AVT Naturalistic Driving Dataset
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-uahhtQao9dqrR4E0Gw-Ppg.png)
Data Collection:
- Two + one cameras.
- Camera 1: captures HD video of the face for glance recognition and estimating cognitive load.
- Camera 2 (fisheye): estimates body pose (hands on wheel) and recognises activity.
- Camera 3: records the scene outside for full scene segmentation.
The data collected provides insight into:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-zfu1MrkIMWz__hCE8R6fYQ.png)
- Human Behaviour.
- Deploying Autonomy.
- Design of Algorithms for Training the Deep Neural Nets for Perception Tasks.
Safety Vs Preference for Autopilot?
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-6iuTzNuesz58wXRjHCe6uA.png)
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-1Z9Zhz4WU6sBIF_UzOXfvw.png)
- The dataset shows that physical engagement remains the same with or without Autopilot engaged.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-bUrPpYgcKiL8cB32Bvrwmw.png)
- So, Autopilot allows the driver to remain physically engaged, and the system isn’t over-trusted by the driver.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-3O21-rKMtBxSjyD1QtZFrg.png)
Pedestrian Detection
Challenges:
- Different appearances: inter-class variations.
- Different articulations.
- Occlusion by accessories.
- Pedestrians occluding each other.
Solutions:
The need is to extract features from raw pixels.
Sliding-window approaches:
- Haar Cascade.
- HOG.
- CNN.
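The detectors above share the same outer loop: slide a fixed-size window across the image and score each crop with a classifier (Haar cascade, HOG + SVM, or a CNN). A minimal sketch of that loop, where `score_window` is a hypothetical stand-in for any of those classifiers:

```python
def sliding_windows(img_w, img_h, win=64, stride=16):
    """Yield top-left corners of every window position over the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield x, y

def detect(image, score_window, win=64, stride=16, threshold=0.5):
    """Score every window; keep boxes whose confidence exceeds threshold.

    `score_window(image, x, y, win)` is a placeholder for a real classifier
    (Haar cascade, HOG + SVM, or CNN) returning a confidence in [0, 1].
    """
    h, w = len(image), len(image[0])
    detections = []
    for x, y in sliding_windows(w, h, win, stride):
        score = score_window(image, x, y, win)
        if score >= threshold:
            detections.append((x, y, x + win, y + win, score))
    return detections
```

In practice the loop also runs over an image pyramid so pedestrians at different distances fit the window, which is part of why these exhaustive approaches are slow.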
More intelligent networks:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-J3ZcUmeOSk9qmjCGce7-UQ.png)
- Fast R-CNN.
- Mask RCNN.
- VoxelNet.
These networks generate candidate regions to consider instead of using a sliding-window approach, providing a smaller subset to evaluate.
- Using a CNN classifier to detect if there is an object of interest present.
- Use Non-Maximum Suppression to remove overlapping bounding boxes.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-g64wW2dfrXrmdgezTryxHQ.png)
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-Q3FUbeUhj3g6ZROGYSt4IQ.png)
Data (from different intersections):
- 10 hours recorded every day.
- Approx. 12,000 pedestrian crossings.
- 21M+ feature-vector samples.
- R-CNN performs bounding-box detection of pedestrians.
Body Pose Estimation
Includes:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-R4fZvhBFbHO2iXdj7Nv01w.png)
- Finding Joints in the image.
- The landmark points in the image.
Why is it important?
- To determine the alignment of the driver.
- Note: airbags are generally deployed assuming the driver is facing the front.
- With increased automation, this assumption might fail.
Sequential Detection Approach
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-P9VWaSB3KkDWbstCjjaonA.png)
- Detect the hands, then detect in steps the shoulders, and so on.
- This is the traditional method.
DeepPose Holistic View:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-TCcKOVKmrs-eI8prtj98eA.png)
- Powerful; successful for multi-person, multi-pose detection.
- Performs regression to detect parts from the complete image individually, rather than via sequential detection.
- Later, it stitches the detected joints together.
- Allows detection of varying poses, and of joints that aren’t visible.
Cascade of Pose Regressors:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-p0bqx7CgH7ThDU35-bfk4w.png)
- A CNN takes in a raw image and produces an (x, y) position estimate for each joint.
- Each stage zooms in and produces repeated, finer detection and estimation of the joints.
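The cascade idea above amounts to iterative refinement: each stage regresses a correction from a crop centred at the previous estimate. A toy sketch, where each `stage` function is a hypothetical stand-in for a trained CNN regressor:

```python
def cascade_pose(image, initial_xy, stages, crop_fn):
    """Refine a joint estimate stage by stage.

    Each stage maps a crop to a (dx, dy) correction, standing in for a
    trained CNN regressor; `crop_fn(image, x, y)` extracts the zoomed-in
    patch around the current estimate.
    """
    x, y = initial_xy
    for stage in stages:
        dx, dy = stage(crop_fn(image, x, y))
        x, y = x + dx, y + dy
    return x, y
```

Because each stage operates on a tighter crop at higher effective resolution, later stages can correct small residual errors that the first coarse regression misses.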
Part Detection:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-o_rfladg8rL1bpP7i4xTtA.png)
- We can use this approach to detect parts in a picture with multiple people.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-SU87DrD4JR71LvioB7GBLA.png)
- First, the body parts are detected without doing individual person detection first.
- Next connect them together.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-dVpy0PTUa_FPITfGmtjUtQ.png)
- Through bipartite matching, stitch the parts belonging to different people together.
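The bipartite matching step can be illustrated with a minimum-cost assignment between, say, detected necks and detected shoulders. Real systems score links with learned part affinities and use efficient solvers; this exhaustive stdlib version is only illustrative and is feasible for a handful of people:

```python
from itertools import permutations

def best_matching(cost):
    """Exhaustive minimum-cost bipartite matching.

    `cost[i][j]` is the cost of linking part i of one type to part j of
    another (e.g. neck i to shoulder j). Returns (assignment, total_cost)
    where assignment[i] = j.
    """
    n = len(cost)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best, best_cost = perm, c
    return best, best_cost
```

With costs derived from image evidence (distance, limb orientation), the minimum-cost assignment groups parts into consistent skeletons even when people overlap.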
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1--QWhsP8wpoUDeeJpLrWYNg.png)
- This is the approach used at MIT to detect upper-body parts.
- Position of Driver Vs Standard Front facing position.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-wyqqwRx4zbjHOALjhvc2AA.png)
- Plot of Time Vs Position of Neck.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-u3I8yQMb7exZJivUIdzeQA.png)
- Body Pose Estimation for Pedestrians.
- This allows detecting the dynamics of ‘non-verbal’ communication that happen when a pedestrian crosses the road and looks at the vehicle.
- Interesting Discovery: Most people look away from the approaching vehicle before crossing the road.
Glance Classification:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-ZxTaVdPT8kGkss1vtQbQFg.png)
- Determining where the Driver is looking.
- Note: This isn’t the same as gaze detection, where we try to find (x,y,z) pose.
We classify two regions: On-Road/Off-Road.
Or Six Regions:
- On Road
- Off Road
- Left
- Right
- Instrument Panel
- Rear-View Mirror
Framing it as classification allows it to be addressed as an ML problem.
- The same can be extended to pedestrians, to determine whether they are looking at or away from the approaching vehicle.
- Note: The ground data is provided by manual annotation.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-O8KuBcq34V3jNHsFyNpYLA.png)
Face Alignment:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-0Hy-T01Ld4Ta9z2tzlIVHA.png)
- Designing algorithms that detect the individual landmarks of the face and estimate the pose of the head.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-85TDbZo-bD8nBFxf4B9XTg.png)
Gaze Classification Pipeline:
- Source Video
- Calibration: determining where the sensor is, since the classification is region-based.
- Video Stabilisation.
- Face Detection.
- Face Alignment.
- Eye/Pupil Detection.
- Head (and eye) pose estimation.
- Classification.
- Decision Pruning.
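The pipeline stages above compose naturally as a chain of functions over a shared state; a skeletal sketch in which every stage is a hypothetical placeholder:

```python
def gaze_pipeline(frame, stages):
    """Pass a frame through the gaze-classification stages in order.

    Each stage takes the running state dict and returns it updated
    (e.g. adding 'face_box', 'landmarks', 'pupils', 'gaze_region').
    A stage may return None to signal failure, which aborts the pipeline.
    """
    state = {"frame": frame}
    for stage in stages:
        state = stage(state)
        if state is None:
            return None  # e.g. no face detected in this frame
    return state
```

Structuring the pipeline this way makes the later stages (classification, decision pruning) easy to swap out while the upstream detection stages stay fixed.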
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-0xNgpeLsei22v3K7xKKi2w.png)
Annotation Tooling:
Semi-automated: Data that the network is not confident about are manually annotated.
Fundamental Tradeoff:
What accuracy are we willing to accept?
To increase accuracy, a human manually iterates over and annotates the data.
False Positives:
Can be dealt with by more training data.
Some degree of human annotation fixes some of the problems.
Driver State Detection:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-6qNc98NdTyjgPFq0Bh0Dzg.png)
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-YjSrqH4uECfLQlk5YPT8Fw.png)
Emotion Detection of the driver.
- Many ways to taxonomize emotion.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-OK4R1ekp9qoObnt98sM_bQ.png)
- General Case of detecting emotion.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-OGYJdBmndOlidE5qBF-rVA.png)
- Ex: Affectiva SDK.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-0B8B3Fc8xChDTg1lDkvn5g.png)
- These algorithms map our expressions to Emotion.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-Cah_YxEwl7ZDAxc-RdkNjg.png)
- Application-Specific Emotion Recognition:
- e.g. using voice-based GPS interaction.
- Self-annotated.
- Generic emotion detectors fail here because, while driving, a smile can mean frustration.
- Thus annotation matters: the data must be labelled to reflect these situations.
Cognitive Load:
Degree to which a person is mentally busy.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-uxca9A5A_JJpGb0Zd0MnUQ.png)
- Pupils expand and contract based on cognitive load; eye movements reflect deep thought too.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-EsvuSoGY15u1PdFi34nBVw.png)
- Cognitive load can be detected with blink dynamics, eye movement and pupil dilation.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-AtqcxsGtuxzA_Ao6rY27hA.png)
- However, real-world lighting variation rules out pupil dilation as a signal.
- Blink dynamics and eye movement are utilised instead.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-y7LVhCmDl2Epp6s2P6eKLA.png)
- 3D Convolutional NN:
- A sequence of images is input; we use 3D convolutions.
- Convolve across multiple images/channels.
- Allows learning dynamics through time.
- Real-World Data: N-back tasks are used to estimate cognitive load.
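A 3D convolution extends the usual 2D case with a time axis: the kernel slides over (time, height, width), so a filter can pick up motion such as blink dynamics rather than just spatial texture. A minimal pure-Python sketch on nested lists (single channel, no padding or stride):

```python
def conv3d(volume, kernel):
    """Valid 3D convolution (cross-correlation, as in deep learning
    frameworks) of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = sum(volume[t + a][i + b][j + c] * kernel[a][b][c]
                        for a in range(kt) for b in range(kh) for c in range(kw))
                row.append(s)
            plane.append(plane_val := s) if False else row
            plane.append(row)
        out.append(plane)
    return out
```

A kernel such as `[[[1]], [[-1]]]` is a temporal-difference filter: it responds to change between consecutive frames, which is exactly the kind of feature useful for blink detection.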
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-0TLEq-CpuATdmrsqRJte-w.png)
- We detect the face, extract the eyes, and feed these into a CNN.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-LQ5oQ-dDFZY09iTUYVHjGg.png)
- Plot of Eye Movement Vs Cognitive Load.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-4R_inI6CrH1cyC2TeHHjnw.png)
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-DjNHjYNZifx6BgVW_PZDmg.png)
- Standard 3D CNN Architecture.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-5fOu2J_oV_baZcWnuIm09g.png)
- Accuracy on real world data.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-uhffV4NKSZIMbetVj4im2g.png)
Human Centred Vision for Autonomous Vehicles:
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-pNeqBFESlsDj9OwFFv9jqA.png)
- Even though we are making progress on perception, and utilising sensors for localisation and path planning, we are still far from solving full autonomy (argument: 20+ years away).
So, the human has to be involved.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-4ZSzm1jhJhFiNez0jDs4Bw.png)
- Thus, the ‘Robot’ needs to understand the ‘Human’s activity’ and the Human-Robot interaction needs to be refined.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-eMF-YoiUGNUYd2CaT0EuBA.png)
- Path to Mass-Scale Automation (no more steering wheels).
Human-Centred Autonomy:
- An SDC is a personal robot rather than a perception-control system.
- The ‘transfer of control’ involves a ‘personal’ connection with the machine.
- SDCs will be wide-reaching.
![](https://sanyambhutani.com/content/images/downloaded_images/MIT-6-S094--Deep-Learning-for-Self-Driving-Cars-2018-Lecture-5-Notes--Deep-Learning-for-Human-/1-cm0dnyL1v9nrfNi2rejRtg.png)
- Teaser: The MIT SDC will debut on public streets (public testing) in March 2018.
- What Next?
- DeepTraffic
- DeepCrash
- SegFuse
Subscribe to my Newsletter for updates on my new posts and interviews with My Machine Learning heroes and Chai Time Data Science