Mira Murati's new AI startup, Thinking Machines, is developing "interaction models" designed to enable natural human-AI collaboration. The technology processes continuous streams of audio and video input, moving beyond the text-based exchange that dominates current AI systems.
Murati left her role as Chief Technology Officer at OpenAI to launch Thinking Machines. The interaction models represent a shift toward multimodal AI systems that operate more like human-to-human conversation. Rather than discrete question-and-answer exchanges, these models would maintain ongoing awareness of both audio and visual information flowing from users in real time.
The approach targets a persistent limitation of current AI: users must often reformulate requests or wait for a complete response before continuing the dialogue. Natural human collaboration involves constant feedback loops, attention to gestures, and adaptive responses. Interaction models attempt to replicate this fluidity by processing multiple input streams simultaneously.
This work addresses a practical gap in AI usability. ChatGPT and similar tools excel at specific tasks but lack the conversational continuity that makes human teamwork efficient. Video conferencing AI, real-time translation, and collaborative design software all demand this kind of persistent, multimodal awareness.
Thinking Machines has not released timelines or specifics about model size, training data, or deployment plans. The announcement confirms the company's focus area but offers few technical details. Murati's background at OpenAI, where she oversaw the transition from GPT-4 to real-time capabilities, suggests the research may draw on that experience.
The broader AI industry is moving toward similar goals. Companies including Google and Meta are investing in multimodal systems and real-time AI interactions. Thinking Machines enters a crowded but nascent category in which few companies have shipped production-grade products.
Whether interaction models become a meaningful product category depends on execution. The concept sounds logical, but building
