One of computer vision's most challenging problems is accurately reconstructing human appearance, motion, and body shape. The task is difficult because humans vary widely in appearance, clothing, and surroundings. From a technical standpoint, human motion is extremely hard to model, as it is actuated, non-linear, and high-dimensional. To address these challenges, we combine geometric models, physical simulation of deformation and light transport, and machine learning for approximating complex functions. Our recent focus is on moving from pure 3D reconstruction to learning models that generalize to new settings, such as driving a reconstructed avatar with new pose inputs in a scene with different illumination, or predicting how the avatar plausibly reacts to interaction with a user.
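
To make the combination of geometric models and learned functions more concrete, the sketch below shows one common way such an avatar can be driven by new pose inputs: a template mesh is posed with linear blend skinning, and a learned corrective term adds pose-dependent deformation on top. This is only an illustrative toy, not our actual pipeline; the function names (`drive_avatar`, `pose_corrective`), the random-weight "network", and the toy mesh dimensions are all hypothetical placeholders for a trained model and real data.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose_corrective(pose, vertices, corr_weights):
    """Placeholder for a learned corrective deformation.

    In a real system this would be a trained network predicting per-vertex
    offsets from the pose; here it is a fixed random linear map, only to
    show where the learned component enters the pipeline.
    """
    features = pose.reshape(-1)                    # flatten pose parameters
    offsets = corr_weights @ features              # (V*3,) linear stand-in for a network
    return offsets.reshape(vertices.shape) * 1e-3  # small pose-dependent correction

def drive_avatar(vertices, skin_weights, joint_rots, joint_trans, corr_weights):
    """Pose a template mesh with linear blend skinning plus a learned correction."""
    V, J = skin_weights.shape
    posed = np.zeros_like(vertices)
    for j in range(J):
        R = rodrigues(joint_rots[j])
        transformed = vertices @ R.T + joint_trans[j]   # rigid transform per joint
        posed += skin_weights[:, [j]] * transformed     # blend by skinning weights
    posed += pose_corrective(joint_rots, vertices, corr_weights)
    return posed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, J = 100, 4                                        # toy mesh: 100 vertices, 4 joints
    vertices = rng.normal(size=(V, 3))                   # template (rest-pose) vertices
    skin_weights = rng.random((V, J))
    skin_weights /= skin_weights.sum(axis=1, keepdims=True)  # weights sum to 1 per vertex
    joint_rots = rng.normal(scale=0.1, size=(J, 3))      # new pose input (axis-angle per joint)
    joint_trans = rng.normal(scale=0.05, size=(J, 3))
    corr_weights = rng.normal(size=(V * 3, J * 3))       # stand-in for trained parameters
    posed = drive_avatar(vertices, skin_weights, joint_rots, joint_trans, corr_weights)
    print(posed.shape)  # (100, 3): the reposed avatar vertices
```

In a full system, the random corrective map would be replaced by a network trained on captured data, and further modules would handle appearance under new illumination; the sketch only illustrates how a geometric skinning model and a learned deformation can be composed to repose an avatar.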