September 13, 2023 – Xiaomi’s Proprietary Sound Recognition Algorithm Achieves Remarkable Breakthrough in Audio Tagging
Xiaomi’s in-house sound recognition algorithm has made substantial progress on audio tagging. Trained on AudioSet-2M, a publicly available audio dataset, the algorithm has surpassed 50 mAP (mean average precision, i.e. a score of 0.50 on a 0 to 1 scale) for the first time. The result establishes Xiaomi’s sound recognition algorithm as a global frontrunner in performance.
Google divides the AudioSet dataset into three subsets: a balanced training set, an unbalanced training set, and an evaluation set. The two training subsets are collectively referred to as “AudioSet-2M.” Trained on this combined set, Xiaomi’s sound recognition model passed the 50 mAP mark, setting a new standard for audio tagging and making it the best-performing model reported so far.
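For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision is typically computed for a multi-label tagging task: average precision is calculated per sound class from the ranked clip scores, then averaged over classes. The function names and the toy labels and scores are illustrative assumptions, not Xiaomi’s evaluation code or real AudioSet data, and standard evaluation tools may treat tied scores and classes without positives differently.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class: mean of the precision at each positive clip's rank."""
    # Rank clips from the highest to the lowest score for this class.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # AP is undefined for a class with no positive clips; this sketch returns 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(label_matrix: List[List[int]],
                           score_matrix: List[List[float]]) -> float:
    """Rows are clips, columns are classes; returns the mean of the per-class APs."""
    num_classes = len(label_matrix[0])
    per_class = [
        average_precision([row[c] for row in label_matrix],
                          [row[c] for row in score_matrix])
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes


# Toy example (made-up values): 4 clips, 2 classes.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
print(f"{100 * mean_average_precision(toy_labels, toy_scores):.1f} mAP")  # prints "91.7 mAP"
```

On this scale, “50 mAP” corresponds to a mean of 0.50 across all classes in the evaluation.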
Xiaomi has also introduced a Mini version of the model for scenarios with limited resources. This scaled-down model has roughly one-ninth as many parameters as the original, yet it still outperforms models from other institutions. This makes Xiaomi’s sound recognition technology more versatile and easier to integrate into a wide range of resource-constrained smart devices.
The implications are far-reaching: Xiaomi’s sound recognition algorithm is poised to make a range of smart hardware devices significantly more intelligent. These devices will be able to capture and identify environmental sounds more accurately, improving the smart living experience for users. Specifically, the audio tagging algorithm recognizes a wide array of environmental sounds, such as a baby’s cry, animal noises, car engines, explosions, smoke alarms, doorbells, and flowing water, and these sounds can then be expressed as text or in other modalities.
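As an illustration of how recognized sounds might be expressed as text, the sketch below turns per-class scores for one audio clip into a list of tags by keeping every class whose score meets a threshold. The function name, the 0.5 threshold, the label strings, and the scores are all hypothetical; the article does not describe how Xiaomi’s model produces its text output.

```python
from typing import Dict, List


def scores_to_tags(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score meets the threshold, highest score first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


# Made-up scores for a single clip; not output from any real model.
clip_scores = {
    "baby cry": 0.91,
    "doorbell": 0.12,
    "flowing water": 0.64,
    "smoke alarm": 0.03,
}
print("Detected: " + ", ".join(scores_to_tags(clip_scores)))
# prints "Detected: baby cry, flowing water"
```

Because audio tagging is a multi-label task, several tags can be returned for the same clip, which is why a per-class threshold is used here rather than picking only the single top-scoring class.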
The technology is also used extensively in the development of Xiaomi’s robots, substantially improving their perceptual abilities. The humanoid robot CyberOne, for instance, can identify 85 types of environmental sounds and perceive 45 human emotions, grouped into six categories, through auditory cues. Xiaomi’s second-generation biomimetic quadruped robot, CyberDog 2, can recognize 38 different environmental sounds, allowing it to respond more robustly and dynamically.