Sora AI Videos

An Up-And-Coming AI Model That Can Generate Your Dreams

Have you ever wanted to see what a bamboo forest growing in a petri dish, with little red pandas running around, would look like? Well, OpenAI recently released sample videos showcasing the capabilities of its new text-to-video AI model, Sora. Sora can generate a lifelike 60-second video from a single sentence (1). However, it is still in the research phase and has not yet been released to the public. 

Sora works similarly to other OpenAI models, such as ChatGPT, an AI designed for assistance and conversation, and DALL·E 3, an AI designed to generate images from sentences. The technology was trained on an undisclosed amount of captioned video, including TV shows, real-world footage, and more. These captions enabled Sora to learn the intricacies of human language and its connection to the physical world (2). 
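
OpenAI has not published the dataset or its format, but as a purely illustrative sketch, a captioned training example could pair a clip of footage with descriptive text, roughly like this hypothetical Python structure (the caption text and file path are made up for illustration):

```python
# Hypothetical example of one captioned training pair; Sora's actual
# training data and storage format have not been disclosed by OpenAI.
training_example = {
    "caption": ("A fluffy red panda climbs a bamboo stalk inside a glass "
                "petri dish, shallow depth of field, soft studio lighting."),
    "video_path": "clips/example_0001.mp4",  # placeholder path to the paired footage
}
print(training_example["caption"])
```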

Still frame from a video generated by Sora

Like DALL·E 3, Sora uses diffusion. Sora and other diffusion-based text-to-video models start with frames of random noise and, guided by an encoded text prompt, refine them step by step, incorporating details from the text. With each diffusion step, the grainy noise becomes a more detailed picture (5). Sora’s biggest advance is that it does not create the video one frame at a time but refines all of the frames together, which helps details stay consistent throughout (2). 
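
To make the diffusion idea concrete, here is a minimal, hypothetical sketch in Python of what a text-conditioned denoising loop looks like in general. The function names, sizes, and the toy "denoiser" are illustrative placeholders, not Sora's actual architecture or API, which OpenAI has not published; the point is only that all frames start as noise and are refined together under the guidance of the text embedding.

```python
import numpy as np

def encode_text(prompt: str) -> np.ndarray:
    """Toy text encoder: map a prompt to a fixed-size embedding (stand-in for a learned model)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=128)

def denoise_step(frames: np.ndarray, text_embedding: np.ndarray,
                 step: int, num_steps: int) -> np.ndarray:
    """Toy stand-in for a learned denoiser.

    A real diffusion model would predict the noise to remove, conditioned on
    the text embedding; here we simply blend the frames toward a fixed target
    derived from the embedding so the loop structure is visible.
    """
    target = np.tanh(text_embedding[:frames.shape[-1]])
    blend = (step + 1) / num_steps
    return (1 - blend) * frames + blend * target

def generate_video(prompt: str, num_frames: int = 16,
                   frame_dim: int = 64, num_steps: int = 50) -> np.ndarray:
    """Start from pure noise for *all* frames at once and refine them jointly."""
    text_embedding = encode_text(prompt)
    frames = np.random.normal(size=(num_frames, frame_dim))  # grainy noise
    for step in range(num_steps):
        frames = denoise_step(frames, text_embedding, step, num_steps)
    return frames

video = generate_video("a bamboo forest in a petri dish with tiny red pandas")
print(video.shape)  # (16, 64): 16 toy "frames", refined together rather than one by one
```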

Sora has a strong understanding of language and of how things appear and move in the physical world. It can generate complex scenes with multiple characters, specific motions, varying emotions, and accurate details (1). It could benefit people and society in various ways, especially in education, accessibility, communication, and creativity. Sora opens up a whole new world of creative expression and problem-solving. Its ability to simulate aspects of the real world could be useful in robotics, self-driving technology, and industries like film and game development. However, there are downsides: the model could be abused to produce harmful or misleading content, such as deepfakes. 

OpenAI engineers are also working with domain experts on misinformation, hateful content, and bias (1). Additionally, they are building tools to detect misleading content, including a classifier that identifies whether a video was generated by Sora, and adopting safety rules similar to those used for DALL·E 3. These rules reject prompts requesting extreme violence, celebrity likenesses, and sexual content.

Sora also has technical limitations. It does not model cause and effect well: if it generates a video of someone biting a cookie, it will show the person biting the cookie, but the cookie may have no bite mark. Spatial details are also a struggle, such as keeping left and right straight or following a precise camera trajectory. For example, in the image of a man running on a treadmill, he is running backward (3).

Prompt: Step-printing scene of a person running, cinematic film shot in 35mm (1)

Although there is general concern about the abuse and safety of Sora, OpenAI is opening the model to red teamers. Red teamers are groups that act like adversaries, attempting to break into or misuse a system, physically or digitally, and reporting back to the company so it can improve its defenses. OpenAI is also giving access to some visual artists, designers, and filmmakers to learn how to make the model more helpful for creative professionals (1). 

While Sora shows remarkable potential for generating high-quality videos, there are important considerations to address. It could be misused, and there are still cases, such as cause and effect and spatial direction, where it does not work consistently. Currently, Sora is mainly open to red teamers, and more needs to be done before it can be responsibly released to the public. Nevertheless, Sora has the potential to transform fields such as education, accessibility, communication, and creativity, and it opens up endless possibilities for problem-solving and creative applications. 

Citations

  1. OpenAI. (2024). Sora: Creating video from text. OpenAI. Retrieved from https://openai.com/sora 
  2. Guinness, H. (2024, March 11). What is Sora? Everything you need to know about OpenAI’s new text-to-video model. Zapier. Retrieved from https://zapier.com/blog/sora-ai/ 
  3. Remmel, T. (2024, February 28). The future of AI video after Sora is impressive — and flawed. The Washington Post. Retrieved from https://www.washingtonpost.com/technology/interactive/2024/ai-video-sora-openai-flaws 
  4. Schneider, B. (2024, March 14). How to benefit from Sora. True Interactive. Retrieved from https://trueinteractive.com/blog/how-to-benefit-from-sora/ 
  5. Acharya, A. (2023, August 8). An introduction to diffusion models for machine learning. Encord Blog. Retrieved from https://encord.com/blog/diffusion-models/#h3 

Images

  1. https://www.youtube.com/watch?v=HK6y8DAPN_0 
  2. https://www.washingtonpost.com/technology/interactive/2024/ai-video-sora-openai-flaws