OpenAI, the maker of the popular chatbot ChatGPT, has just sent a tidal wave through the industry with its latest development.
On Thursday, the company revealed its latest software, named Sora, a text-to-video AI that allows users to create HD-quality video clips by inputting nothing more than a text description.
As OpenAI notes on its website, Sora is currently limited to making clips of one minute or less. Nevertheless, the quality goes far beyond that of OpenAI’s flagship text-to-image generator, DALL-E. Though imperfections and tells of AI generation remain for now (as the company acknowledges), the visuals produced by Sora are hyper-realistic and, at a casual glance, can pass for real-life camera-recorded imagery. The software can also render video in non-realistic styles that resemble animation.
Sam Altman, OpenAI’s co-founder and CEO, published several posts on X inviting users to submit prompts to demonstrate Sora’s capabilities. He then shared the results of those prompts:
For the moment, Sora is exclusively accessible to a limited cohort of safety testers referred to as “red teamers.” These testers assess the model for vulnerabilities, particularly in areas such as misinformation and bias.
As NBC News reports, OpenAI has announced the development of a “detection classifier” designed to recognize video clips generated by Sora. The company intends to incorporate specific metadata into the output, which will aid in identifying AI-generated content. This metadata aligns with Meta’s efforts to employ a similar strategy for identifying AI-generated images during this election year.
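The general idea behind such provenance metadata can be illustrated with a toy sketch: bind a cryptographic hash of the content to a claim about how it was generated, so that any later tampering invalidates the binding. This is only a minimal illustration of the concept (loosely inspired by C2PA-style content credentials), not OpenAI’s or Meta’s actual scheme; the function names and manifest fields here are hypothetical.

```python
import hashlib
import json

def make_provenance_manifest(video_bytes: bytes, generator: str) -> str:
    """Build a toy provenance record: a JSON manifest binding a
    SHA-256 content hash to the tool that produced the content.
    (Hypothetical format, not any vendor's real metadata schema.)"""
    digest = hashlib.sha256(video_bytes).hexdigest()
    manifest = {
        "content_sha256": digest,
        "generator": generator,
        "claim": "ai-generated",
    }
    return json.dumps(manifest)

def verify_manifest(video_bytes: bytes, manifest_json: str) -> bool:
    """Check that the manifest's hash still matches the content:
    any edit to the video bytes breaks the binding."""
    manifest = json.loads(manifest_json)
    return manifest["content_sha256"] == hashlib.sha256(video_bytes).hexdigest()
```

In a real deployment the manifest would be cryptographically signed and embedded in the media file itself, so a “detection classifier” or viewer could check provenance without a side channel; this sketch omits signing for brevity.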
The outlet further notes of the highly competitive AI landscape:
With Sora, OpenAI is looking to compete with video-generation AI tools from companies such as Meta and Google, which announced Lumiere in January. Similar AI tools are available from other startups, such as Stability AI, which has a product called Stable Video Diffusion. Amazon has also released Create with Alexa, a model that specializes in generating prompt-based short-form animated children’s content.
… OpenAI, backed by Microsoft, has made multimodality — the combining of text, image and video generation — a goal in its effort to offer a broader suite of AI models.
“The world is multimodal,” OpenAI COO Brad Lightcap told CNBC in November. “If you think about the way we as humans process the world and engage with the world, we see things, we hear things, we say things — the world is much bigger than text. So to us, it always felt incomplete for text and code to be the single modalities, the single interfaces that we could have to how powerful these models are and what they can do.”
News of the massive leap in AI capability demonstrated by Sora is fueling fears about the myriad ways the technology could conceivably be abused if safeguards are not put in place.
Danger lies, for example, in the realms of misinformation, manipulation, and privacy invasion. As these systems become more sophisticated, there is an increased risk of generating realistic but fabricated content, leading to the spread of false information and the manipulation of public opinion.
Additionally, malicious actors could exploit this technology to create convincing deep-fake videos for fraudulent purposes. The ease of generating fake videos could also escalate privacy concerns, as individuals may find it challenging to distinguish genuine content from manipulated content, a difficulty with serious ethical and societal implications.
Current legal and ethical frameworks haven’t fully caught up with evolving AI technologies, raising questions about accountability and misuse. It is questionable whether legislators can ever fully keep up, given the exponential rate of AI development in contrast to the stagnation typical of most legislatures.
Nevertheless, text-to-video AI holds significant promise in enhancing creativity, communication, and accessibility. This technology enables the relatively rapid transformation of textual information into engaging visual content, potentially fostering creativity and innovation in various fields. By automating the video-creation process, individuals without extensive video-editing skills can easily communicate their ideas, making content production more accessible to a wider audience.
Detractors of the technology fear this will eliminate jobs for people such as visual artists and VFX personnel. To some degree, that might be true, just as advances in other fields, such as automated manufacturing, have caused the disappearance of jobs that became obsolete.
Still, a craft’s obsolescence does not prevent those who are passionate about it from continuing to practice it: artists still learn to paint realistic human figures even though photography long ago displaced painted portraiture, and equestrians still ride even though automobiles, trains, and airplanes replaced the horse for commercial purposes.
Moreover, AI media production could prove a useful way for non-establishment voices to finally gain greater representation, whereas currently the Left largely controls film studios, television networks, and publishing houses.
Ultimately, AI, like most technologies, is a double-edged sword that can — and will — be used for both good and ill.