Machine Perception - SS '18

Recent developments in neural networks (also known as “deep learning”) have drastically advanced the performance of machine perception systems in a variety of areas including drones, self-driving cars and intelligent UIs. This course is a deep dive into the details of deep learning algorithms and architectures for a variety of perceptual tasks.


eDoz Course Nr.
O. Hilliges
J. Song, E. Aksan, S. Park, A. Spurr, and M. Kaufmann
Thu 10:00 - 12:00, CAB G 61
Thu 13:00 - 15:00, NO C 6
Please post all questions (regarding content, organisation etc.) on Piazza. We monitor the Piazza forum closely. For organisational reasons, we will not be able to respond to direct e-mails. At the beginning of the course you will receive an e-mail with the registration link for Piazza.


Released mock exam (see below).
Schedule update: Note that there is no class on March 15th. Also, in contrast to the previous schedule, a lecture will be held on April 26th. Please check the updated schedule below.
Project description and relevant papers online
Course website online

Learning Objectives

Students will learn about fundamental aspects of modern deep learning approaches for perception. Students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in learning-based computer vision, robotics and HCI. The final project assignment will involve training a complex neural network architecture and applying it to a real-world dataset.

The core competency acquired through this course is a solid foundation in deep learning algorithms to process and interpret human input into computing systems. In particular, students should be able to develop systems that, among other things, recognize people in images, detect and describe body parts, infer their spatial configuration, and perform action/gesture recognition from still images or image sequences, including from multi-modal data.


Wk. | Date | Content | Slides | Extra Material
1 | 22.02. | Introduction & Basic Concepts: Introduction to class contents & admin | slides | lecture notes
2 | 01.03. | Image Classification & Input Recognition: Support Vector Machines | slides (annotated) | lecture notes
3 | 08.03. | Deep Learning Introduction: Feedforward Networks, Representation Learning | slides (annotated) | Additional reading material (link to Stanford's course notes): CNNs for Visual Recognition
4 | 15.03. | No Class | |
5 | 22.03. | Layer Types, Training & Classification | slides (annotated) |
6 | 29.03. | Convolutional Neural Networks | slides (annotated) |
7 | 05.04. | No Class (Easter) | |
8 | 12.04. | Recurrent Neural Networks: LSTM, GRU, Backpropagation through time | slides (annotated) | Additional reading material: Deep Learning Book, chapter on RNNs
9 | 19.04. | Program Committee | |
10 | 26.04. | Fully Convolutional CNNs | |
11 | 03.05. | Generative Models Pt 1: Variational Autoencoders | slides (annotated) |
12 | 10.05. | No Class (Ascension Day) | |
13 | 17.05. | Generative Models Pt 2: Generative Adversarial Networks | slides (annotated) |
14 | 24.05. | Hands & Eyes | |
15 | 31.05. | Reinforcement Learning | |



There will be 3 exercises (2 pen-and-paper exercises and 1 programming exercise), which are not graded. The exercises will help you understand the course content in more depth and will also prepare you for the graded multi-week project (see below). You will have one week to complete each exercise; after that, we will release the solutions and discuss them in the exercise session. Some exercise sessions will include additional tutorials with demos and Q&A that will be helpful for completing the exercises and the multi-week project. Exercise sessions are only scheduled until week 8; after that, the TAs will be present to answer questions. Please find a schedule of the exercise sessions below.

Exercise sheets and solutions will only be accessible from within the ETH network.

Wk. | Date | Content | Material
1 | 22.02. | Introduction to Azure and TensorFlow: Introduction to using Microsoft's Azure GPU cluster and an example of how to implement a simple linear regression model in TensorFlow. | TensorFlow Tutorial (Azure Notebook)
2 | 01.03. | Exercise 1: Release of exercise 1 (Support Vector Machines, pen-and-paper) | exercise sheet, slides, solutions
3 | 08.03. | Exercise 1 solutions and TensorFlow-Tutorial: Discussion of exercise 1 and practical tips on how to train your neural network. | slides, animations
4 | 15.03. | No class |
5 | 22.03. | Exercise 2 and Regularization: Release of exercise 2 (Backpropagation, pen-and-paper) and practical tips on how to improve the generalization performance of your neural network. | exercise sheet
6 | 29.03. | Exercise 2 solutions: Discussion of exercise 2 |
7 | 05.04. | No class |
8 | 12.04. | Exercise 3 and TensorFlow-Tutorial 3: Practical tutorial on how to train a CNN and RNN model in TensorFlow and some additional (optional) programming exercises. | CNN Azure Notebook, RNN Azure Notebook
9 | 19.04. | Exercise 3 Q&A and Project Report Guidelines: Q&A session for exercise 3 and some help and guidelines for the report to be handed in at the end of the project. | Report Writing
... | ... | ... |
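
The week 1 exercise session mentions a simple linear regression model. The sketch below is not the course's Azure notebook and does not use TensorFlow; it is a framework-free illustration of the same idea, fitting y = w*x + b by gradient descent on the mean squared error. The function name, learning rate, step count and toy data are all assumptions made for this example.

```python
def fit_linear_regression(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by plain gradient descent on the mean squared error.

    Hypothetical helper for illustration; lr and steps are hand-picked
    for the toy data below, not general-purpose defaults.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y        # prediction error for one sample
            grad_w += 2.0 * err * x / n  # d(MSE)/dw
            grad_b += 2.0 * err / n      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (noise-free, so the fit should recover it).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear_regression(xs, ys)
```

A deep learning framework automates the gradient computation that is written out by hand here, which is what makes the same training loop practical for larger models.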


There will be a multi-week project that gives you the opportunity to gain hands-on experience with training a neural network for a concrete application. The project is to be completed in groups of two and will be graded. The grade counts 50 % towards your final grade.

There are 4 projects available, for which you need to register at the beginning of the course (details are to be announced). As the various exercises will prepare you for this project, we do not expect you to work on it before week 8. There are no more activities in the exercise slots after week 8, so you can use them to work on your project and/or ask questions about it.

The projects will be hosted on Kaggle, i.e., the grade will depend on the performance of your model. Some baselines are available; outperforming them guarantees a certain grade. You are also asked to hand in a short report (max. 3 pages) describing your best-performing model. The report will be considered for your final grade. The final deadline is 2 weeks after the end of the semester (Friday, June 15th). More details will be announced in the lecture and here.

1) Gesture Recognition from Videos

This project involves classification of dynamic hand gestures from multi-modal data including RGB, depth, segmentation mask and skeletal information for videos.

Relevant papers:
  1. Pavlo Molchanov et al. (2016) Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks
  2. Lionel Pigou et al. (2018) Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video

2) Hand Joint Recognition

This project tackles the problem of 2D keypoint estimation of human hands. Given an image of a human hand, we would like to infer the 2D location of the joints.

Relevant papers:
  1. Shih-En Wei et al. (2016) Convolutional Pose Machines
  2. Alejandro Newell et al. (2016) Stacked Hourglass Networks for Human Pose Estimation

3) Eye Gaze Estimation

This project concerns the estimation of eye gaze direction. In other words, we try to find the 3D direction in which a person's eye is looking, as seen from the perspective of a webcam. We work with single eye images as input and regress two angles that represent eyeball yaw and pitch.

Relevant papers:
  1. Xucong Zhang et al. (2017) MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
  2. Kyle Krafka et al. (2016) Eye Tracking for Everyone
  3. Ashish Shrivastava et al. (2017) Learning from Simulated and Unsupervised Images through Adversarial Training
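
To illustrate how two angles can describe a 3D gaze direction in the eye gaze project, here is a minimal sketch that converts (pitch, yaw) to a unit vector and measures the angle between two gaze directions. The axis convention (a gaze of (0, 0) pointing along -z, towards the camera) and both function names are assumptions for this example; the project's dataset may define its angles differently.

```python
import math


def gaze_angles_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D gaze vector.

    Assumed convention (not taken from the course material): the camera
    looks along +z, so a gaze of (0, 0) points back at it along -z.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def angular_error_deg(angles_a, angles_b):
    """Angle in degrees between two gaze directions given as (pitch, yaw)."""
    va = gaze_angles_to_vector(*angles_a)
    vb = gaze_angles_to_vector(*angles_b)
    dot = sum(p * q for p, q in zip(va, vb))
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
    return math.degrees(math.acos(dot))


# Two gaze directions that differ only by 10 degrees of yaw.
error = angular_error_deg((0.0, 0.0), (0.0, math.radians(10.0)))
```

Comparing directions through the angle between unit vectors avoids treating pitch and yaw differences as if they were independent Euclidean coordinates.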

4) Human Motion Prediction

In this project you are given a sequence of full-body human poses, represented as 3D skeletal data, and the task is to predict how the motion continues for several frames in the future.

Relevant papers:
  1. Katerina Fragkiadaki et al. (2015) Recurrent Network Models for Human Dynamics
  2. Julieta Martinez et al. (2017) On Human Motion Prediction Using Recurrent Neural Networks
  3. Partha Ghosh et al. (2017) Learning Human Motion Models for Long-term Predictions

Case Study

We will have an in-class case study, where we simulate a program committee meeting. This part of the course is optional and will not be graded. However, you will have the chance to discuss a paper that can be relevant for the project. In order to participate in the case study, you will have to register through a form that will be published here in due time.


The performance assessment is a written exam (2 hours) conducted during the examination session (Jul-Aug). It will constitute 50 % of the final grade.

To give you a rough idea of what to expect in the exam, we have released a mock exam, which you can download here: