April 30, 2026 – DeepSeek is currently running a gray-release (staged rollout) test of an image recognition mode on both its web and app platforms. The new feature lets users upload images for DeepSeek to understand, describe, and analyze, filling a gap in its multimodal capabilities.
The mode sits alongside rapid mode and expert mode as a separate top-level entry point, which suggests DeepSeek is positioning visual understanding as a core capability rather than an auxiliary feature.

Some users can already use the mode without issue, while others see the entry point but receive a message stating, “The image recognition mode is temporarily unavailable. Please try again later.”
In the product interface, entering image recognition mode brings up a prompt to start a conversation in that mode, and an image upload button appears next to the input box.
Hands-on testing shows that the available capabilities center on image understanding (viewing, reading, and analyzing images), covering scenarios such as visual question answering, image interpretation, and screenshot analysis. There is no sign yet of image generation, video understanding, or cross-modal generation.
This suggests that, at this stage, the image recognition mode fits the scope of a vision-language model (VLM) rather than a comprehensive multimodal generation tool.
