OpenAI Set to Debut Advanced Multimodal AI Model at Upcoming Event

May 12, 2024 – According to a report by The Information, OpenAI has recently demonstrated to select clients a new multimodal AI model capable of holding voice conversations and recognizing objects. Sources suggest it may be part of what OpenAI plans to officially unveil on May 13th.

The new model reportedly processes image and audio information faster and more accurately than OpenAI’s existing separate image-recognition and text-to-speech models. For instance, it could help customer service representatives better interpret a caller’s tone and inflection, even detecting sarcasm. Hypothetically, the model could also tutor students in math or translate signs in the real world.

Still, insiders note that while the model outperforms GPT-4 Turbo on certain queries, it may still confidently provide incorrect answers.

Meanwhile, developer Ananay Arora has hinted at a possible phone-calling feature for ChatGPT, sharing a screenshot of related code. There is also evidence that OpenAI is provisioning servers for real-time audio and video communication.

CEO Sam Altman has clarified that the upcoming release is not GPT-5, which is rumored to be significantly more powerful than GPT-4 and which The Information reports could debut by the end of this year. Altman also stated that OpenAI has no plans to launch a new AI-powered search engine.

If The Information’s report holds true, OpenAI’s announcement could draw attention away from Google’s upcoming I/O developer conference, as Google is also experimenting with AI-based phone-calling technology. Rumors are also swirling about a Google project codenamed “Pixie,” a multimodal replacement for Google Assistant that could recognize objects through a device’s camera and tell users where to buy an identified object or how to use it.
