April 18, 2025 – OpenAI has unveiled a new API option dubbed “Flex Processing” as it intensifies efforts to compete with rivals such as Google in the generative AI arena. The option gives users access to its models at reduced rates in exchange for slower response times and occasional resource unavailability.
The company announced that Flex Processing is currently available in beta for its recently launched o3 and o4-mini reasoning models, targeting low-priority, non-production workloads such as model evaluation, data augmentation, and asynchronous processing.
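In the API, Flex Processing is selected per request via the `service_tier` parameter of the official `openai` Python SDK. The sketch below assembles the request arguments as a plain dictionary so the opt-in is easy to see; the model name, prompt, and generous timeout value are illustrative, and because Flex requests are deprioritized, OpenAI recommends allowing longer timeouts than usual.

```python
def build_flex_request(model: str, prompt: str) -> dict:
    """Assemble keyword arguments for a Flex-tier chat completion.

    The resulting dict would be passed to the SDK as
    client.chat.completions.create(**kwargs).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Opt in to the discounted, lower-priority Flex tier.
        "service_tier": "flex",
        # Flex requests may queue behind standard traffic; the 15-minute
        # timeout here is an illustrative choice, not an official default.
        "timeout": 900.0,
    }
```

A caller would then do something like `client.chat.completions.create(**build_flex_request("o3", "Summarize this log file"))`, retrying later if the API reports that Flex capacity is temporarily unavailable.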

By opting for Flex Processing, users can slash their API costs by half. For instance, the o3 model will cost $5 per million input tokens and $20 per million output tokens in Flex mode, compared to the standard rates of $10 and $40, respectively. Similarly, the o4-mini model’s pricing will drop from $1.10 per million input tokens and $4.40 per million output tokens to $0.55 and $2.20, respectively.
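The savings work out to a flat 50% discount on both input and output rates. A small sketch of the arithmetic, using the per-million-token prices quoted above (the helper names are illustrative):

```python
# Standard (input, output) prices in USD per million tokens, as quoted
# in the article for each model.
STANDARD = {"o3": (10.00, 40.00), "o4-mini": (1.10, 4.40)}


def flex_price(model: str) -> tuple[float, float]:
    """Return (input, output) Flex rates: half the standard rates."""
    inp, out = STANDARD[model]
    return inp / 2, out / 2


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 flex: bool = False) -> float:
    """Cost in USD of one request at the given token counts and tier."""
    inp, out = flex_price(model) if flex else STANDARD[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For example, a batch-evaluation job that consumes one million o3 input tokens costs $10 at standard rates but $5 under Flex.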
In an email to customers, OpenAI stated that developers in usage tiers 1 through 3 must complete a newly introduced identity verification process to gain access to the o3 model. The company also noted that enabling reasoning summaries and streaming API support for o3 and other models requires completing the same verification.
OpenAI has previously emphasized that the identity verification mechanism is designed to prevent violations of its usage policies.