Data & Code

Dataset access might require filling out a form and is subject to approval. Data is for research purposes only.

Data
- Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures
- WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
- GraspXL: Generating Grasping Motions for Diverse Objects at Scale
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
- 4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
- EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild
- Learning Locally Editable Virtual Humans
- X-Avatar: Expressive Human Avatars
- Hi4D: 4D Instance Segmentation of Close Human Interaction
- Human-Aware Object Placement for Visual Environment Reconstruction
- IM Avatar: Implicit Morphable Head Avatars from Videos
- SPEC: Seeing People in the Wild with an Estimated Camera
- EM-POSE: 3D Human Pose Estimation from Sparse Electromagnetic Trackers
- Towards End-to-end Video-based Eye-Tracking
- Learning to Assemble: Estimating 6D Poses for Robotic Object-Object Manipulation
- Hierarchical Reinforcement Learning Explains Task Interleaving Behavior
- Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time
- Cross-modal Deep Variational Hand Pose Estimation
- Duo-VIO: Fast, Light-weight, Stereo Inertial Odometry
- In-air Gestures Around Unmodified Mobile Devices

Code
- HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars
- ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
- PromptHMR: Promptable Human Mesh Recovery
- SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel-based Head Avatars
- Gaussian Garments: Reconstructing Simulation-ready Clothing with Photorealistic Appearance from Multi-view Video
- Grasping Diverse Objects with Simulated Humanoids
- VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence
- EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System
- DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions
- ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
- HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
- GraspXL: Generating Grasping Motions for Diverse Objects at Scale
- Human Hair Reconstruction with Strand-Aligned 3D Gaussians
- AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
- ROMEO: Revisiting Optimization Methods for Reconstructing 3D Human-Object Interaction Models From Images
- I-Design: Personalized LLM Interior Designer
- ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations
- WANDR: Intention-guided Human Motion Generation
- VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment
- Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
- HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
- SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
- 4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
- Dream-in-4D: A Unified Approach for Text- and Image-guided 4D Scene Generation
- HUGS: Human Gaussian Splats
- ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation
- Physically Plausible Full-Body Hand-Object Interaction Synthesis
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video
- FLARE: Fast Learning of Animatable and Relightable Mesh Avatars
- AG3D: Learning to Generate 3D Avatars from 2D Image Collections
- EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild
- Fast-SNARF: A Fast Deformer for Articulated Neural Fields
- Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
- PointAvatar: Deformable Point-based Head Avatars from Videos
- HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
- HARP: Personalized Hand Reconstruction from a Monocular RGB Video
- Learning Human-to-Robot Handovers from Point Clouds
- GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields
- Learning Locally Editable Virtual Humans
- ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
- X-Avatar: Expressive Human Avatars
- InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
- Hi4D: 4D Instance Segmentation of Close Human Interaction
- TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning
- SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning
- gDNA: Towards Generative Detailed Neural Avatars
- D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
- Human-Aware Object Placement for Visual Environment Reconstruction
- IM Avatar: Implicit Morphable Head Avatars from Videos
- A Spatio-temporal Transformer for 3D Human Motion Prediction
- Human Performance Capture from Monocular Video in the Wild
- Render In-between: Motion Guided Video Synthesis for Action Interpolation
- VariTex: Variational Neural Face Textures
- SPEC: Seeing People in the Wild with an Estimated Camera
- SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes
- PeCLR: Self-Supervised 3D Hand Pose Estimation from Monocular RGB via Equivariant Contrastive Learning
- Shape-aware Multi-Person Pose Estimation from Multi-view Images
- EM-POSE: 3D Human Pose Estimation from Sparse Electromagnetic Trackers
- PARE: Part Attention Regressor for 3D Human Body Estimation
- Learning To Regress Bodies From Images Using Differentiable Semantic Rendering
- Improved Learning of Robot Manipulation Tasks via Tactile Intrinsic Motivation
- Learning Functionally Decomposed Hierarchies for Continuous Control Tasks With Path Planning
- CoSE: Compositional Stroke Embeddings
- Self-Learning Transformations for Improving Gaze and Head Redirection
- Convolutional Autoencoders for Human Motion Infilling
- Spatial Attention Improves Iterative 6D Object Pose Estimation
- ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis
- Human Body Model Fitting by Learned Gradient Descent
- Towards End-to-end Video-based Eye-Tracking
- Detecting Relevance during Decision-Making from Eye Movements for UI Adaptation
- VIBE: Video Inference for Human Body Pose and Shape Estimation
- Learning to Assemble: Estimating 6D Poses for Robotic Object-Object Manipulation
- Hierarchical Reinforcement Learning Explains Task Interleaving Behavior
- Content-Consistent Generation of Realistic Eyes with Style
- Structured Prediction Helps 3D Human Motion Modelling
- Photo-Realistic Monocular Gaze Redirection Using Generative Adversarial Networks
- Few-Shot Adaptive Gaze Estimation
- Monocular Neural Image Based Rendering with Continuous View Control
- Context-Aware Online Adaptation of Mixed Reality Interfaces
- Unpaired Pose Guided Human Image Generation
- Video-based Prediction of Hand-grasp Preshaping with Application to Prosthesis Control
- Demonstration-Guided Deep Reinforcement Learning of Control Policies for Dexterous Human-Robot Interaction
- STCN: Stochastic Temporal Convolutional Networks
- Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time
- Deep Pictorial Gaze Estimation
- Cross-modal Deep Variational Hand Pose Estimation
- Learning to Find Eye Region Landmarks for Remote Gaze Estimation in Unconstrained Settings
- DeepWriting: Making Digital Ink Editable via Deep Generative Modeling
- AdaM: Adapting Multi-User Interfaces for Collaborative Environments in Real-Time
- Plan3D: Viewpoint and Trajectory Optimization for Aerial Multi-View Stereo Reconstruction
- Learning Human Motion Models for Long-term Predictions
- Guiding InfoGAN with Semi-Supervision
- Duo-VIO: Fast, Light-weight, Stereo Inertial Odometry