Emu Video

Video generation from text prompts

About Emu Video:

Emu Video is a text-to-video generation tool built on explicit image conditioning. It uses diffusion models and factorizes generation into two steps: first generating an image from a text prompt, then generating a video conditioned on both the prompt and the generated image. This factorized approach enables efficient training of high-quality video generation models. Whereas previous methods require a deep cascade of models, Emu Video needs only two diffusion models to generate 512px, 4-second-long videos at 16 fps.

Emu Video is reported to achieve state-of-the-art results in text-to-video generation when compared with other models such as Make-a-Video (MAV), Imagen-Video (IMAGEN), Align Your Latents (AYL), Reuse & Diffuse (R&D), Cog Video (COG), Gen2 (GEN2), and Pika Labs (PIKA). Human raters selected Emu Video’s 512px, 16 fps, 4-second-long videos as the most convincing in terms of quality and faithfulness to the given prompt.

The authors are Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, and Ishan Misra, with equal technical contributions from Rohit Girdhar and Mannat Singh. They acknowledge the support of multiple collaborators who assisted with the work by providing data and infrastructure. Privacy and cookie policies can be viewed on the Emu Video website.
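
The two-step factorization described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Emu Video's actual code or API: the names `text_to_image`, `image_to_video`, and `generate_video`, along with the `Image` and `Video` types, are hypothetical placeholders, and the two "models" are stubs that return empty containers instead of running diffusion. The only figures taken from the description are the 512px resolution, 4-second duration, and 16 fps frame rate.

```python
from dataclasses import dataclass
from typing import List

# Figures stated in the description: 512px frames, 4 seconds, 16 fps.
RESOLUTION = 512
DURATION_S = 4
FPS = 16
NUM_FRAMES = DURATION_S * FPS  # 4 s * 16 frames/s = 64 frames


@dataclass
class Image:
    """Placeholder for a generated image (no pixel data in this sketch)."""
    prompt: str
    size: int


@dataclass
class Video:
    """Placeholder for a generated video."""
    prompt: str
    frames: List[Image]
    fps: int


def text_to_image(prompt: str) -> Image:
    # Hypothetical stand-in for the first diffusion model (text -> image).
    return Image(prompt=prompt, size=RESOLUTION)


def image_to_video(prompt: str, image: Image) -> Video:
    # Hypothetical stand-in for the second diffusion model, which is
    # conditioned on both the text prompt and the generated image.
    frames = [Image(prompt=prompt, size=image.size) for _ in range(NUM_FRAMES)]
    return Video(prompt=prompt, frames=frames, fps=FPS)


def generate_video(prompt: str) -> Video:
    # Step 1: generate an image from the prompt.
    image = text_to_image(prompt)
    # Step 2: generate a video from the prompt and that image.
    return image_to_video(prompt, image)
```

Calling `generate_video("a dog surfing a wave")` in this sketch returns a `Video` whose frame count follows directly from the stated duration and frame rate (4 s at 16 fps gives 64 frames).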