Around the World in 80 Timesteps
Generate 3D camera trajectories from text prompts
Retrieve 3D human motion videos from text descriptions