May 22, 2025 – During an interview with CNBC at Tesla’s Giga Texas facility, Elon Musk reiterated his stance that the Optimus humanoid robot is set to become “one of Tesla’s most pivotal products.” As if to underscore the CEO’s vision, Tesla’s official Optimus social media account unveiled what may be the most striking demonstration of the robot’s capabilities to date.
In the newly shared video, Optimus demonstrated a diverse range of skills in a domestic setting. It performed everyday household tasks such as taking out the trash, sweeping with a broom, vacuuming, tearing off paper towels, stirring food in a pot, opening cabinets, and closing curtains. The robot also showed its strength by picking up the front linkage of a Model X and placing it onto a cart.
What is particularly noteworthy is that all of these tasks were performed by a single neural network. The Tesla Optimus team trained the robot on first-person video of humans executing similar tasks, enabling it to learn and replicate these actions directly. This approach promises to let Optimus acquire and refine new skills quickly and reliably.
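Tesla has not published details of its training pipeline, but the idea described above, learning a policy directly from (observation, action) pairs harvested from demonstration video, is commonly known as behavior cloning. A minimal sketch, with invented dimensions and synthetic data standing in for video features and human actions:

```python
import numpy as np

# Minimal behavior-cloning sketch (all dimensions and data here are
# invented): a single policy learns to map visual observations to motor
# actions by imitating (observation, action) pairs extracted from
# first-person human demonstration video.
rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM = 16, 4  # toy sizes for per-frame features / joint targets

# Pretend these came from human demo videos: a feature vector per frame
# plus the action the human took at that frame.
true_mapping = rng.normal(size=(OBS_DIM, ACT_DIM))
obs = rng.normal(size=(512, OBS_DIM))
actions = obs @ true_mapping

# Linear policy trained by gradient descent on the imitation (MSE) loss.
W = np.zeros((OBS_DIM, ACT_DIM))
lr = 0.05
for _ in range(300):
    grad = obs.T @ (obs @ W - actions) / len(obs)  # d(MSE)/dW up to a factor of 2
    W -= lr * grad

imitation_loss = float(np.mean((obs @ W - actions) ** 2))
print(f"imitation loss after training: {imitation_loss:.6f}")
```

A real system would replace the linear map with a large vision-and-control network and the synthetic pairs with features extracted from actual video, but the supervision signal, matching the human's action at each frame, is the same.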

Milan Kovac, Tesla’s Vice President of the Optimus Project, took to social media to share the team’s latest milestones and future goals. He highlighted one of the team’s key objectives: enabling Optimus to learn from internet videos of humans performing tasks, covering not only first-person perspectives but also third-person views and arbitrary camera footage.
“We’ve recently made significant strides in this direction,” Kovac stated. “We’re now capable of directly transferring a substantial amount of knowledge gleaned from human videos to the robot, albeit currently limited to first-person perspectives. This enables us to embark on new tasks more rapidly than relying solely on data obtained through teleoperation, which is inherently more complex.”
He further explained that through this process, Optimus is rapidly acquiring many new skills. These skills can be invoked through natural language, either spoken or typed, and are executed by a single neural network on the robot, which enables multitasking. Looking ahead, the team plans to extend transfer learning to third-person video, including arbitrary internet footage, and to improve the robot’s reliability through self-improvement with reinforcement learning (RL) in both real-world and synthetic environments (simulations and world models).
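The "single network, many skills" design Kovac describes is usually realized by conditioning the policy on an embedding of the language command, so the same weights produce different behaviors for different instructions. A hedged sketch of that pattern, with toy commands, invented embeddings, and a linear policy in place of a real language encoder and control network:

```python
import numpy as np

# Hedged sketch of a language-conditioned multi-task policy (the commands,
# sizes, and data are all invented): one set of weights serves several
# skills, and the skill is selected by conditioning on an embedding of the
# spoken or typed command rather than by swapping networks.
rng = np.random.default_rng(1)

OBS_DIM, CMD_DIM, ACT_DIM = 8, 4, 3

# Toy command "embeddings" standing in for a real language encoder.
COMMANDS = {
    "take out the trash": rng.normal(size=CMD_DIM),
    "stir the pot": rng.normal(size=CMD_DIM),
}

# Ground-truth behavior: a shared visuomotor mapping plus a
# command-dependent offset, so a single linear policy over the
# concatenated [observation, command] input can represent every task.
shared_map = rng.normal(size=(OBS_DIM, ACT_DIM))
cmd_map = rng.normal(size=(CMD_DIM, ACT_DIM))

def make_batch(n=256):
    X, Y = [], []
    for emb in COMMANDS.values():
        o = rng.normal(size=(n, OBS_DIM))
        X.append(np.hstack([o, np.tile(emb, (n, 1))]))
        Y.append(o @ shared_map + emb @ cmd_map)
    return np.vstack(X), np.vstack(Y)

X, Y = make_batch()
W = np.zeros((OBS_DIM + CMD_DIM, ACT_DIM))  # ONE set of weights for all tasks
for _ in range(500):
    W -= 0.05 * X.T @ (X @ W - Y) / len(X)

def act(obs_vec, command):
    """Run the shared policy, selecting the skill via the language command."""
    return np.concatenate([obs_vec, COMMANDS[command]]) @ W

multitask_loss = float(np.mean((X @ W - Y) ** 2))
print(f"multi-task imitation loss: {multitask_loss:.6f}")
```

The same weights answer every command; only the conditioning input changes, which is what allows one on-robot network to multitask. An RL stage, as Kovac outlines, would then refine these imitation-learned weights against task success rather than against demonstration matching.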