May 20, 2024


A group of researchers and artificial intelligence (AI) experts is collaborating to develop China’s response to Sora, OpenAI’s highly anticipated text-to-video model.

What it is: Peking University professors and Rabbitpre, an AI company based in Shenzhen, announced their collaboration, named Open-Sora, in a GitHub post on Friday. The project was facilitated through the Rabbitpre AIGC Joint Lab, a joint effort between the company and the university’s graduate school.

According to the team, Open-Sora aims to “reproduce OpenAI’s video generation model” with a “simple and scalable” repository. The group is seeking assistance from the open-source community for its development.

Progress so far: Using a three-part framework composed of a Video VQ-VAE, a Denoising Diffusion Transformer and a Condition Encoder, the group has successfully generated reconstructed video samples at varying aspect ratios, resolutions and durations, including 10- and 18-second clips.
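The three components named above follow a common text-to-video diffusion pipeline: a video VQ-VAE compresses frames into discrete latent tokens, a condition encoder turns the text prompt into guidance, and a diffusion transformer denoises latents under that guidance before the VQ-VAE decodes them back to video. The sketch below is a conceptual illustration of that data flow only, not Open-Sora's actual code; all class internals, names and dimensions are invented placeholders.

```python
# Conceptual sketch of the three-part pipeline described above.
# NOT the Open-Sora implementation: every class body here is a toy
# placeholder chosen only to show how data moves between the stages.

class VideoVQVAE:
    """Compresses a video (frames x H x W) into a grid of latent tokens."""
    def __init__(self, downsample=8):
        self.downsample = downsample

    def encode(self, frames, height, width):
        # One token per downsample x downsample spatial patch, per frame.
        return [(f, h, w)
                for f in range(frames)
                for h in range(height // self.downsample)
                for w in range(width // self.downsample)]

    def decode(self, tokens):
        # Recover the frame count from the token grid (stand-in for
        # reconstructing actual pixels).
        return max(t[0] for t in tokens) + 1


class ConditionEncoder:
    """Maps a text prompt to a conditioning vector (toy embedding)."""
    def encode(self, prompt):
        return [ord(c) % 7 for c in prompt]


class DenoisingDiffusionTransformer:
    """Iteratively refines latent tokens, guided by the condition."""
    def denoise(self, tokens, condition, steps=4):
        for _ in range(steps):
            # A real model would predict and subtract noise here,
            # attending to `condition`; this placeholder is a no-op.
            tokens = list(tokens)
        return tokens


def generate(prompt, frames=16, height=256, width=256):
    vae = VideoVQVAE()
    condition = ConditionEncoder().encode(prompt)
    latents = vae.encode(frames, height, width)  # stands in for sampled noise
    denoised = DenoisingDiffusionTransformer().denoise(latents, condition)
    return vae.decode(denoised)  # number of frames in the "generated" video
```

Because the VQ-VAE operates on a token grid rather than fixed pixel dimensions, a design like this can in principle handle the varying aspect ratios, resolutions and durations the team reports.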


About Sora: Unveiled on Feb. 15, Sora is OpenAI’s first text-to-video model, capable of creating high-quality, realistic videos from text prompts alone. So far, generated videos can last up to a minute.


While the technology has been announced, OpenAI said it has no plans to make Sora available for general use anytime soon. The company still needs to address several issues such as reducing misinformation, hateful content and bias, as well as properly labeling the finished product.


What’s next: Rabbitpre AIGC Joint Lab has outlined some of its future plans for Open-Sora, which include establishing a codebase and training an unconditional model on landscape datasets. Subsequently, the group plans to train models to enhance resolution and duration as part of its primary project stages.

The team also plans to conduct experiments on a text-to-video landscape dataset, train its 1080p resolution (1920 x 1080) model on a video-to-text dataset and develop a control model with additional conditions.


