May 12, 2024 – According to a recent report by The Information, OpenAI has demonstrated a new multimodal AI model to select clients, one capable of holding spoken conversations and recognizing objects. Sources suggest this may be part of OpenAI’s planned announcement on May 13.
The new model is reported to process image and audio information faster and more accurately than OpenAI’s existing standalone image-recognition and text-to-speech models. For instance, it could potentially help customer service agents better gauge callers’ tone and detect sarcasm. Hypothetically, the model could also help students learn math or translate real-world signage.
However, insiders caution that while the model might surpass GPT-4 Turbo in answering certain queries, it can still confidently provide incorrect answers.
Meanwhile, developer Ananay Arora has shared a screenshot of call-related code, hinting that ChatGPT may gain phone-call functionality. Evidence also suggests that OpenAI is provisioning servers for real-time audio and video communication.
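For readers wondering what "servers for real-time audio communication" might involve in practice, here is a minimal, self-contained sketch of streaming audio frames over a WebSocket. The message schema, endpoint, and fake transcript reply are all hypothetical illustrations of the general technique, not anything confirmed about OpenAI's infrastructure.

```python
# Hypothetical sketch: client streams audio frames to a server over a
# WebSocket and receives text replies. Nothing here reflects an actual
# OpenAI API; the schema and endpoint are invented for illustration.
import asyncio
import base64
import json

import websockets  # pip install websockets


async def handler(ws):
    # Toy "speech" server: acknowledge each audio frame with a fake transcript.
    async for message in ws:
        event = json.loads(message)
        if event["type"] == "audio":
            size = len(base64.b64decode(event["data"]))
            await ws.send(json.dumps(
                {"type": "transcript", "text": f"heard {size} bytes"}))


async def main():
    # Run server and client in one process so the sketch works standalone.
    async with websockets.serve(handler, "localhost", 8765):
        async with websockets.connect("ws://localhost:8765") as ws:
            # Two fake PCM chunks stand in for live microphone capture.
            for chunk in (b"\x00\x01" * 160, b"\x00\x02" * 160):
                await ws.send(json.dumps({
                    "type": "audio",
                    "data": base64.b64encode(chunk).decode("ascii"),
                }))
                print(json.loads(await ws.recv())["text"])


asyncio.run(main())
```

A production voice pipeline would more likely stream compressed audio (e.g., Opus) over WebRTC rather than JSON frames over a raw WebSocket, but the continuous send-and-reply shape is the same idea.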
CEO Sam Altman has denied rumors that the upcoming release is GPT-5, stating that GPT-5 might debut later this year. He also clarified that OpenAI won’t be launching a new AI-powered search engine.
If The Information’s report is accurate, OpenAI’s announcement could overshadow the upcoming Google I/O developer conference, as Google is also experimenting with AI-powered phone call technology. Additionally, Google is rumored to be working on a multimodal Google Assistant replacement called “Pixie,” which could reportedly recognize objects through a device’s camera and offer contextual information.