Overview

Recent developments in neural networks (also known as "deep learning") have drastically advanced the performance of machine perception systems in a variety of areas, including drones, self-driving cars and intelligent UIs. This course is a deep dive into the details of deep learning algorithms and architectures for a variety of perceptual tasks.


Announcements

27.07.2018
Released mock exam (see below).
07.03.2018
Schedule update: Note that there is no class on March 15th. Also, in contrast to the previous schedule, a lecture will be held on April 26th. Please check the updated schedule below.
21.02.2018
Project description and relevant papers online
15.02.2018
Course website online

Learning Objectives

Students will learn about fundamental aspects of modern deep learning approaches for perception. They will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in learning-based computer vision, robotics and HCI. The final project will involve training a complex neural network architecture and applying it to a real-world dataset.

The core competency acquired through this course is a solid foundation in deep learning algorithms for processing and interpreting human input to computing systems. In particular, students should be able to develop systems that recognize people in images, detect and describe body parts, infer their spatial configuration, and perform action/gesture recognition from still images or image sequences, including from multi-modal data.


Schedule

Wk. Date Content Slides Extra Material
1 22.02.
Introduction & Basic Concepts

Introduction to class contents & admin

slides lecture notes
2 01.03.
Image Classification & Input Recognition

Support Vector Machines

slides
slides (annotated)
lecture notes
3 08.03.
Deep Learning Introduction

Feedforward Networks, Representation Learning

slides
slides (annotated)
Additional reading material (link to Stanford's course notes): CNNs for Visual Recognition
4 15.03.
No Class

5 22.03.
Layer Types, Training & Classification

Backpropagation

slides
slides (annotated)
6 29.03.
Convolutional Neural Networks

slides
slides (annotated)
7 05.04.
No Class (Easter)

8 12.04.
Recurrent Neural Networks

LSTM, GRU, Backpropagation through time

slides
slides (annotated)
Additional reading material: Deep Learning Book, chapter on RNNs
9 19.04.
Program Committee

10 26.04.
Fully Convolutional CNNs

Segmentation

slides
11 03.05.
Generative Models Pt 1

Variational Autoencoders

slides
slides (annotated)
12 10.05.
No class (Ascension Day)
13 17.05.
Generative Models Pt 2

Generative Adversarial Networks

slides
slides (annotated)
14 24.05.
Hands & Eyes

slides
15 31.05.
Reinforcement Learning

slides

Exercises

There will be three exercises (two pen-and-paper exercises and one programming exercise), which are not graded. The exercises will help you understand the course content in more depth and will also prepare you for the graded multi-week project (see below). You will have one week to complete each exercise; after that we release the solutions and discuss them in the exercise session. Some of the exercise sessions will include additional tutorials with demos and Q&A that will be helpful for completing the exercises and the multi-week project. Exercise sessions are only scheduled until week 8; after that the TAs will be present to answer questions. Please find the schedule of the exercise sessions below.

Exercise sheets and solutions will only be accessible from within the ETH network.

Wk. Date Content Material
1 22.02.
Introduction to Azure and TensorFlow

Introduction on how to use Microsoft's Azure GPU cluster and an example of how to implement a simple linear regression model in TensorFlow (see the sketch after this table).

Tensorflow Tutorial (Azure Notebook)
2 01.03.
Exercise 1

Release of exercise 1 (Support Vector Machines, pen-and-paper)

exercise sheet slides solutions
3 08.03.
Exercise 1 solutions and TensorFlow Tutorial

Discussion of exercise 1 and practical tips on how to train your neural network.

slides animations
4 15.03.
No class

5 22.03.
Exercise 2 and Regularization

Release of exercise 2 (Backpropagation, pen-and-paper) and practical tips on how to improve the generalization performance of your neural network.

exercise sheet
solutions
slides
6 29.03.
Exercise 2 solutions

Discussion of exercise 2

7 05.04.
No class

8 12.04.
Exercise 3 and TensorFlow Tutorial 3

Practical tutorial on how to train CNN and RNN models in TensorFlow, along with some additional (optional) programming exercises.

CNN Azure Notebook
RNN Azure Notebook
9 19.04.
Exercise 3 Q&A and Project Report Guidelines

Q&A session for exercise 3 and some help and guidelines for the report to be handed in at the end of the project.

Report Writing
... ... ...
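As a rough illustration of what the week-1 tutorial covers, the sketch below fits a simple linear regression model in TensorFlow (1.x graph style). The synthetic data, learning rate and number of steps are placeholder assumptions, not the contents of the actual Azure notebook.

    # Minimal linear regression sketch in TensorFlow 1.x (illustrative only;
    # the tutorial notebook may differ). Fits y = w*x + b by gradient descent.
    import numpy as np
    import tensorflow as tf

    # Synthetic data: y = 3x + 2 plus noise (placeholder for the tutorial data).
    x_train = np.random.rand(100, 1).astype(np.float32)
    y_train = 3.0 * x_train + 2.0 + 0.1 * np.random.randn(100, 1).astype(np.float32)

    # Model parameters: a single weight and a bias.
    w = tf.Variable(tf.zeros([1, 1]))
    b = tf.Variable(tf.zeros([1]))

    x = tf.placeholder(tf.float32, [None, 1])
    y = tf.placeholder(tf.float32, [None, 1])

    y_pred = tf.matmul(x, w) + b                   # linear model
    loss = tf.reduce_mean(tf.square(y_pred - y))   # mean squared error
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(200):
            sess.run(train_op, feed_dict={x: x_train, y: y_train})
        print(sess.run([w, b]))  # should approach [3.0] and [2.0]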

Project

There will be a multi-week project that gives you the opportunity to gain hands-on experience with training a neural network for a concrete application. The project is to be completed in groups of two and will be graded. The project grade counts for 50 % of your final grade.

There are four projects available, for which you need to register at the beginning of the course (details to be announced). As the exercises will prepare you for this project, we do not expect you to work on it before week 8. There are no further activities in the exercise slots after week 8, so you can use them to work on your project and/or ask questions about it.

The projects will be hosted on Kaggle, i.e., the grade will depend on the performance of your model. Some baselines are available; outperforming them guarantees a certain grade. You are also asked to hand in a short report (max. 3 pages) describing your best-performing model; the report will be considered for your final grade. The final deadline is two weeks after the end of the semester (Friday, June 15th). More details will be announced in the lecture and here.


1) Gesture Recognition from Videos

This project involves the classification of dynamic hand gestures in videos from multi-modal data, including RGB, depth, segmentation masks and skeletal information. A minimal baseline sketch follows the paper list below.

Relevant papers:
  1. Pavlo Molchanov et al. (2016) Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks
  2. Lionel Pigou et al. (2018) Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
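As a starting point, one common baseline for video classification is a per-frame CNN whose features are aggregated over time by a recurrent layer. The sketch below shows that shape in TensorFlow/Keras using RGB frames only; the clip length, frame size, number of gesture classes and layer sizes are all assumptions, not project specifications, and the referenced papers use more elaborate architectures.

    # Illustrative gesture-classification baseline: a shared CNN encodes each
    # frame, an LSTM aggregates the sequence, a softmax predicts the gesture.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 20                  # hypothetical number of gesture classes
    FRAMES, H, W, C = 30, 64, 64, 3   # hypothetical clip length and frame size

    frame_encoder = models.Sequential([
        layers.Conv2D(32, 3, activation='relu', input_shape=(H, W, C)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
    ])

    model = models.Sequential([
        # Apply the same CNN to every frame of the clip.
        layers.TimeDistributed(frame_encoder, input_shape=(FRAMES, H, W, C)),
        # Aggregate the per-frame features over time.
        layers.LSTM(128),
        layers.Dense(NUM_CLASSES, activation='softmax'),
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

Depth, segmentation masks or skeletal data could be handled, for example, by additional input channels or separate encoders, but that choice is left to the project.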

2) Hand Joint Recognition

This project tackles the problem of 2D keypoint estimation for human hands. Given an image of a human hand, we would like to infer the 2D locations of the joints. A minimal sketch follows the paper list below.

Relevant papers:
  1. Shih-En Wei et al. (2016) Convolutional Pose Machines
  2. Alejandro Newell et al. (2016) Stacked Hourglass Networks for Human Pose Estimation
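Both referenced papers predict one heatmap per joint with a fully convolutional network, and the joint location is read off as the heatmap maximum. The sketch below shows that idea in TensorFlow/Keras in a heavily simplified form; the input resolution, joint count and layer widths are assumptions, and the actual papers use multi-stage/hourglass architectures.

    # Minimal fully convolutional sketch that regresses one heatmap per hand
    # joint (in the spirit of, but much simpler than, the papers above).
    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_JOINTS = 21      # hypothetical number of hand joints
    H, W = 128, 128      # hypothetical input resolution

    inputs = layers.Input(shape=(H, W, 3))
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D()(x)                      # H/2 x W/2
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D()(x)                      # H/4 x W/4
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    # One output channel per joint; the argmax of each low-resolution
    # heatmap gives the predicted 2D joint location.
    heatmaps = layers.Conv2D(NUM_JOINTS, 1)(x)

    model = models.Model(inputs, heatmaps)
    # Train against ground-truth heatmaps, e.g. Gaussians centred on the joints.
    model.compile(optimizer='adam', loss='mse')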

3) Eye Gaze Estimation

This project concerns the estimation of eye gaze direction. In other words, we try to determine the 3D direction in which a person's eye is looking, as seen from the perspective of a webcam. We work with single eye images as input and regress two angles representing eyeball yaw and pitch. A minimal sketch follows the paper list below.

Relevant papers:
  1. Xucong Zhang et al. (2017) MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
  2. Kyle Krafka et al. (2016) Eye Tracking for Everyone
  3. Ashish Shrivastava et al. (2017) Learning from Simulated and Unsupervised Images through Adversarial Training
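Since the task is described above as regressing two angles from a single eye image, a minimal setup is a small CNN with a two-dimensional linear output. The sketch below assumes a grayscale eye patch of a hypothetical size and illustrative layer widths; it is not the architecture of any of the referenced papers.

    # Rough sketch: a small CNN maps a single eye image to (yaw, pitch).
    import tensorflow as tf
    from tensorflow.keras import layers, models

    H, W = 36, 60        # hypothetical eye-patch resolution

    model = models.Sequential([
        layers.Conv2D(32, 3, activation='relu', input_shape=(H, W, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(2),  # (yaw, pitch), e.g. in radians
    ])

    # Mean squared error on the two angles; angular error between the
    # predicted and true gaze vectors is a common evaluation metric.
    model.compile(optimizer='adam', loss='mse')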

4) Human Motion Prediction

In this project you are given a sequence of full-body human poses, represented as 3D skeletal data, and the task is to predict how the motion continues for several frames into the future. A minimal sketch follows the paper list below.

Relevant papers:
  1. Katerina Fragkiadaki et al. (2015) Recurrent Network Models for Human Dynamics
  2. Julieta Martinez et al. (2017) On Human Motion Prediction Using Recurrent Neural Networks
  3. Partha Ghosh et al. (2017) Learning Human Motion Models for Long-term Predictions
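A simple way to frame this task, loosely following the sequence-modelling ideas in the papers above, is an RNN that encodes the observed poses and is rolled out autoregressively at test time. The pose dimensionality, sequence lengths and layer size below are assumptions for illustration only.

    # Illustrative motion-prediction sketch: an LSTM encodes the seed poses
    # and a dense layer predicts the next pose; at test time the prediction
    # is fed back in to generate several future frames.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    POSE_DIM = 72        # hypothetical flattened 3D skeleton per frame
    SEED_LEN = 50        # hypothetical number of observed frames

    model = models.Sequential([
        layers.LSTM(256, input_shape=(SEED_LEN, POSE_DIM)),
        layers.Dense(POSE_DIM),
    ])
    model.compile(optimizer='adam', loss='mse')

    # Autoregressive rollout over 25 hypothetical future frames.
    window = np.zeros((1, SEED_LEN, POSE_DIM), dtype=np.float32)  # placeholder seed
    predictions = []
    for _ in range(25):
        next_pose = model.predict(window)             # shape (1, POSE_DIM)
        predictions.append(next_pose)
        # Slide the input window forward by one frame.
        window = np.concatenate([window[:, 1:], next_pose[:, None, :]], axis=1)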

Case Study

We will have an in-class case study, in which we simulate a program committee meeting. This part of the course is optional and will not be graded. However, you will have the chance to discuss a paper that may be relevant to the project. To participate in the case study, you will have to register through a form that will be published here in due time.


Exam

The performance assessment is a written exam (2 hours) conducted during the examination session (Jul-Aug). It will constitute 50 % of the final grade.

To give you a rough idea of what to expect in the exam, we have released a mock exam, which you can download here: