Language models default to predictable outputs because they're optimized for patterns that appear most frequently in training data, not for genuine randomness or diversity. This constraint extends beyond simple number games to affect how these systems handle complex reasoning, creative tasks, and problem-solving across domains.

A startup is working to break this pattern through a new approach to model inference. Rather than accepting the default probability distributions that models generate, the technique nudges systems toward less obvious but still valid outputs. The method doesn't retrain models or alter their weights. Instead, it adjusts how the model samples from its own probability space during generation, encouraging exploration beyond the most likely token sequences.
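To make the idea of adjusting sampling without touching weights concrete, here is a minimal sketch using temperature scaling, a standard inference-time control. It is not the startup's technique, whose details the article does not give. The function names and the example logits are made up for illustration.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities. Temperature > 1 flattens the
    distribution; temperature < 1 sharpens it. The scores themselves
    (and therefore the model weights) are never modified."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for four candidate tokens (illustrative values only).
logits = [4.0, 2.0, 1.0, 0.5]
default_probs = softmax(logits, temperature=1.0)
flat_probs = softmax(logits, temperature=2.0)
```

With the higher temperature, the top token keeps its rank but loses probability mass to the alternatives, so repeated sampling explores more of the distribution. The trade-off is that very high temperatures also raise the odds of incoherent tokens, which is why a blunt control like this is only a starting point.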

The problem runs deeper than randomness. Language models exhibit groupthink because they're trained on internet text where certain phrasings, perspectives, and solution approaches dominate. At scale, this produces systems that converge on similar answers to similar questions. Ask multiple instances of GPT-4 to brainstorm product ideas and you'll see repetition. The models aren't being creative. They're reflecting statistical artifacts baked into training data.

This matters for real applications. In ideation, research, coding, and strategy work, diversity of output drives better outcomes. A financial analyst needs multiple valid market interpretations, not eight versions of the same thesis. A software team benefits from varied architectural approaches, not five identical designs.

The startup's approach appears to work by reweighting the probability distributions models naturally produce, systematically reducing preference for high-frequency tokens without breaking coherence. Early results show outputs become more varied while maintaining quality and relevance.
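One way such a reweighting could look, as a rough sketch only: divide each candidate's model probability by a power of its background frequency, after discarding candidates the model itself considers implausible. The startup's actual method is not described in enough detail to reproduce. The `reweight` function, the `alpha` and `min_ratio` parameters, and all the numbers below are assumptions for illustration.

```python
def reweight(probs, corpus_freq, alpha=0.5, min_ratio=0.05):
    """Shift probability mass away from high-frequency tokens.

    probs       -- model probability per candidate token
    corpus_freq -- assumed background frequency per token (hypothetical data)
    alpha       -- strength of the frequency penalty (0 leaves probs unchanged)
    min_ratio   -- coherence guard: drop tokens far less likely than the top one
    """
    top = max(probs.values())
    # Keep only candidates the model already finds reasonably plausible.
    kept = {tok: p for tok, p in probs.items() if p >= min_ratio * top}
    # Penalize each surviving token by its background frequency.
    scores = {tok: p / (corpus_freq[tok] ** alpha) for tok, p in kept.items()}
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}

# Illustrative numbers, not measurements from any real model or corpus.
model_probs = {"app": 0.60, "platform": 0.25, "marketplace": 0.14, "xylophone": 0.01}
corpus_freq = {"app": 0.050, "platform": 0.020, "marketplace": 0.005, "xylophone": 0.001}

adjusted = reweight(model_probs, corpus_freq)
```

In this toy case the common token "app" loses share, the rarer but still plausible "marketplace" gains, and the implausible "xylophone" is filtered out before the penalty can promote it. The plausibility cutoff is what keeps a rarity bonus from rewarding nonsense.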

The broader implication challenges how we think about model capability. On this view, raw capability hasn't plateaued; what has stalled is our ability to draw that capability out beyond default behavior. If models can generate competent outputs they rarely choose to produce, the bottleneck isn't intelligence. It's sampling strategy.

This won't solve every model limitation