Machine Perception - SS '19

Recent developments in neural networks (a.k.a. “deep learning”) have drastically advanced the performance of machine perception systems in a variety of areas, including drones, self-driving cars and intelligent UIs. This course is a deep dive into the details of deep learning algorithms and architectures for a variety of perceptual tasks.


eDoz Course Nr.
O. Hilliges
E. Aksan, X. Chen, M. Kaufmann, S. Park, J. Song, A. Spurr, X. Zhang
Thu 10:00 - 12:00, CAB G 61
Thu 13:00 - 15:00, NO C 6
Fri 13:00 - 15:00, NO C 6
The lectures are recorded. Credentials are available in the first week's slides.
Please address all questions regarding content, organisation etc. on Piazza. Please sign up for the forum using this link. We monitor the forum closely. For organisational reasons, we will not be able to respond to direct e-mails.


Schedule updated (see below).
Pen & Paper Backpropagation: Instructions and solutions updated following Piazza posts 26 and 27
Project descriptions online.
Added link to lecture recordings (first recording will be available within 1-2 days).
Please sign up to Piazza.
Course website online, more information to follow.

Learning Objectives

Students will learn about fundamental aspects of modern deep learning approaches for perception. Students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in learning-based computer vision, robotics and HCI. The final project assignment will involve training a complex neural network architecture and applying it to a real-world dataset.

The core competency acquired through this course is a solid foundation in deep learning algorithms for processing and interpreting human input to computing systems. In particular, students should be able to develop systems that recognize people in images, detect and describe body parts, infer their spatial configuration, and perform action/gesture recognition from still images or image sequences, possibly using multi-modal data, among other tasks.


Subject to change. Materials are only available from within the ETH network.

Wk. Date Content Material Exercise Session
1 21.02.

Introduction to class contents & admin

2 28.02.
Deep Learning Introduction

Feedforward Networks, Representation Learning

slides (annotated)
lecture notes
perceptron colab notebook

Tutorial Implement your own MLP

XOR Colab Notebook
Eye-Gaze Colab Notebook
Eye-Gaze Solutions Notebook
3 07.03.
Training & Classification


slides (updated)

Class Tips for Training Part 1

4 14.03.
-- No Class --

Tutorial Linear Regression in TensorFlow

Colab Notebook

Pen & Paper Backpropagation

instructions (updated)
solutions (updated)
5 21.03.
Convolutional Neural Networks
slides (annotated)

Tutorial CNNs in TensorFlow

Colab Notebook

Pen & Paper CNNs

solutions (updated)
6 28.03.
Fully Convolutional Neural Networks


CNNs Pt 2 (updated)
CNNs Pt 2 (annotated)
Fully CNNs

Additional Reading Material:


Class Tips for Training Part 2

slides (pdf)
slides (pptx for animations)
7 04.04.
Recurrent Neural Networks

LSTM, GRU, Backpropagation through time

slides (annotated)

Tutorial RNNs in TensorFlow

Colab Notebook

Pen & Paper RNNs

8 11.04.
Generative Models: Latent Variable Models

Sequence Modelling and Autoencoders.

slides (annotated)
9 18.04.
Generative Models: Latent Variable Models

Variational Autoencoders.

slides (annotated)
10 25.04.
-- No Class (Easter) --
11 02.05.
Generative Models: Implicit Models

Generative Adversarial Networks & Co

slides (annotated)
12 09.05.
Generative Models: Autoregressive Models

PixelCNN, PixelRNN, WaveNet, Stochastic RNNs

slides (annotated)

Additional Reading Material:

PixelRNN, PixelCNN
WaveNet, VRNN
DeepWriting, STCN
Deep Learning Ch. 20.10.7ff.

Pen & Paper Generative Models

13 16.05.
Applications: Eyes
Reinforcement Learning Pt. I
Eyes: slides
RL: slides
RL: slides (extended)
14 23.05.
Reinforcement Learning Pt. II

slides (annotated)
15 30.05.
-- No Class (Ascension Day) --

Exercise Sessions

Please refer to the above schedule for an overview of the planned exercise slots. We will have three different types of activities in the exercise sessions:

  1. Tutorial: Interactive programming tutorial in Python taught by a TA. Code will be made available.
  2. Class: Lecture-style class taught by a TA to give you some tips on how to train your neural network in practice.
  3. Pen & Paper: There will be 4 pen & paper exercises. They are not graded but are helpful preparation for the written exam. Solutions will be published on the website a week after release and discussed in the exercise session on request.
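As a warm-up for the first tutorial (implementing your own MLP), here is a framework-free NumPy sketch; the tutorials themselves use TensorFlow and Colab notebooks. It shows the classic 2-2-1 ReLU network with hand-chosen (not learned) weights that computes XOR exactly, the construction from Goodfellow et al., Deep Learning, Ch. 6. Training these weights yourself is what the XOR notebook is about.

```python
import numpy as np

# Hand-chosen weights for a 2-2-1 network that computes XOR exactly.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # input -> hidden
b1 = np.array([0.0, -1.0])    # hidden biases
w2 = np.array([1.0, -2.0])    # hidden -> output
b2 = 0.0                      # output bias

def mlp(x):
    """Forward pass: one hidden ReLU layer, linear output."""
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU non-linearity
    return h @ w2 + b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print([int(mlp(x)) for x in X])  # -> [0, 1, 1, 0], i.e. XOR
```

Note that without the hidden layer (or without the non-linearity) no choice of weights can represent XOR, which is exactly why the tutorial starts from this example.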

The exercises are meant to help you understand the course content in more depth and to prepare you for the graded multi-week project. Exercise sessions are primarily scheduled until week 8 so that you have time to work on the project afterwards.



There will be a multi-week project that gives you the opportunity to gain hands-on experience with training a neural network for a concrete application. The project is to be completed in groups of two and will be graded. The project grade counts 40% towards your final grade.

The project grade will be determined by two factors: 1) a competitive part based on how well your model fares compared to your fellow students' models and 2) the idea/novelty/innovativeness of your approach, based on a written report to be handed in after the project deadline. For each project, baselines will be available that guarantee a certain grade for the competitive part if you surpass them. The competition will be hosted on an online platform - more details will be announced here.

At the beginning of the course we will provide 4 projects from which your group can choose one. You will have to register for a project around week 5. As the various exercises will prepare you for the project, we do not expect you to work on it before week 8. There are no further activities in the exercise slots after week 8, so you can use them to work on your project. The final project deadline is 2 weeks after the end of the semester (Friday, June 14th).


  1. Dynamic Gesture Recognition
  2. 3D Human Pose Estimation
  3. Eye Gaze Estimation
  4. Human Motion Prediction


For the final submission, please write a report (using this LaTeX template) and upload it along with your code. Details about the submission will be announced later. We will re-train and re-evaluate your model; hence, your submission should provide easy-to-follow instructions that let us reproduce your final score on the leaderboard. Please include a readme with instructions on how to train and evaluate your model. For submissions whose results cannot be reproduced, grades will be penalized accordingly.


This project constitutes 40% of the final course grade. Project grades will be determined by taking the average of public and private scores. We will provide two baselines; beating the easy baseline guarantees a grade of 4. As we want to encourage novel solutions and ideas, we also ask you to write a short report (3 pages, excluding references) detailing the model you used for the final submission and your contributions. Depending on this, we will weight the grade determined by your performance w.r.t. the baselines. In other words, the grade computed as described above can go up or down, depending on the contributions of your project. If you passed the easy baseline, your grade cannot drop below 4 after the weighting.
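To make the arithmetic concrete, here is a purely illustrative Python sketch of the grading scheme described above. The function names, the `report_weight` factor, and the assumption that the public and private leaderboard scores have already been mapped to Swiss grades (1 to 6) are ours; the actual score-to-grade mapping and weighting will be set by the course staff.

```python
def project_grade(public, private, report_weight=1.0, passed_easy_baseline=True):
    """Illustrative only: average the grades implied by the public and
    private scores, scale by a report-based weight, and apply the
    guaranteed floor of 4 if the easy baseline was beaten."""
    grade = report_weight * (public + private) / 2.0
    if passed_easy_baseline:
        grade = max(grade, 4.0)   # easy baseline guarantees a 4
    return min(grade, 6.0)        # Swiss grading scale caps at 6

def final_grade(project, exam):
    """Project counts 40%, written exam 60% (as stated on this page)."""
    return 0.4 * project + 0.6 * exam

# Example: leaderboard grades 5.0 / 5.5, exam grade 5.0.
print(round(final_grade(project_grade(5.0, 5.5), 5.0), 2))  # prints 5.1
```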


The performance assessment is a written exam (2 hours) conducted during the examination session (Jul-Aug). It constitutes 60% of the final grade.

To give you a rough idea of what to expect in the exam, we have released a mock exam which you can download here: