Google DeepMind has connected its Genie 3 world model to Street View imagery, allowing users to drop a pin on any mapped location and explore an AI-generated walkable environment based on real-world data. The system takes Street View photos and converts them into interactive 3D worlds that users can navigate.
This represents a significant convergence of two Google assets. Street View's two-decade collection of global imagery provides massive training data for Genie 3, Google's generative world model trained to understand spatial relationships and visual continuity. By anchoring the model to real locations, Google demonstrates practical applications beyond artistic experiments.
The immediate use case is exploration. Users can visit places they cannot physically reach, revisit locations from different time periods, or experience streets in cities they've never been to. The AI generates plausible continuations of streetscapes based on what it learned during training.
The deeper strategic value lies in AI agents and robotics. World models like Genie 3 form the foundation for embodied AI systems that need to understand 3D environments, predict the consequences of actions, and navigate physical spaces. By training on Street View data, the model develops realistic priors about how real cities look, how spaces connect, and how environments change from different viewpoints. In principle, this knowledge could transfer to robots operating in human environments.
Google positions Street View as a competitive moat. Competitors lack access to nearly two decades of consistent, global street-level imagery at the scale Google has accumulated. The company can continuously improve Genie by feeding it more data from Street View's ongoing collection efforts.
The demo also signals Google's confidence in generative world models as a viable technology. Earlier versions of Genie focused on small, synthetic environments. Scaling to real-world Street View locations suggests the approach can work at a practical scale. Whether the AI-generated worlds maintain coherence at larger scales or longer interaction times remains unclear from
