Chang She, a pandas core contributor and former Tubi TV engineer, identified a fundamental problem: the traditional data stack fails under AI workloads. The existing infrastructure optimizes for analytics and business intelligence, not the demands of machine learning pipelines that require rapid iteration, complex feature engineering, and seamless integration with vector operations.

She founded LanceDB to address this gap. Vector databases alone represent an incomplete solution. They handle embeddings well but ignore the broader ecosystem AI systems require: data versioning, feature stores, efficient filtering across structured and unstructured data, and low-latency access patterns that differ sharply from traditional OLAP databases.

The core issue centers on data preparation. AI teams spend disproportionate time munging raw data into formats compatible with model training. Pandas works for small datasets but chokes at scale. Parquet improves compression and columnar access, but the pipeline still involves multiple conversion steps and data duplication. LanceDB rethinks this architecture by building a database optimized for how AI actually operates: storing vectors alongside metadata, enabling fast similarity searches while preserving relational structure, and supporting rapid schema evolution as feature definitions change.
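The "vectors alongside metadata" idea can be pictured in a few lines of plain Python. This is an illustrative sketch, not LanceDB's actual storage format or API: each row keeps its embedding next to its metadata, so a metadata filter and a similarity ranking happen in one pass over one store.

```python
import math

# Each row keeps the embedding next to its metadata -- no second system to join against.
rows = [
    {"id": 1, "vector": [0.1, 0.9], "category": "docs", "year": 2023},
    {"id": 2, "vector": [0.8, 0.2], "category": "code", "year": 2024},
    {"id": 3, "vector": [0.2, 0.8], "category": "docs", "year": 2024},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query, where, k=2):
    # Filter on metadata first, then rank the survivors by similarity.
    candidates = [r for r in rows if where(r)]
    return sorted(candidates, key=lambda r: cosine(query, r["vector"]), reverse=True)[:k]

hits = search([0.1, 0.9], where=lambda r: r["year"] == 2024)
print([r["id"] for r in hits])  # -> [3, 2]
```

A real system replaces the linear scan with an approximate nearest-neighbor index, but the point stands: when vectors and metadata share a table, the filter can be pushed into the query instead of bolted on afterward.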

She argues that siloing vectors in specialized databases creates operational friction. Production systems need to join embeddings with source data, filter results by metadata, and version datasets for reproducibility. A fragmented stack means more glue code, more failure points, more latency.
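The friction She describes looks roughly like this in practice. The sketch below is hypothetical (no specific vendor's API): a vector index that only knows ids and distances, a separate store holding the source rows, and the application-layer glue code that a split stack forces.

```python
# Hypothetical split stack: a vector index that only knows ids and distances,
# and a separate relational store holding the source rows.
vector_index = {          # id -> embedding (what a standalone vector DB holds)
    1: [0.1, 0.9],
    2: [0.8, 0.2],
}
source_store = {          # id -> metadata (what lives in, say, Postgres)
    1: {"title": "intro.md", "tenant": "acme"},
    2: {"title": "api.md", "tenant": "globex"},
}

def knn_ids(query, k):
    # The vector side returns ids only; metadata filtering must happen elsewhere.
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v))
    return sorted(vector_index, key=lambda i: dist(vector_index[i]))[:k]

def search_for_tenant(query, tenant, k):
    # Glue code: over-fetch from the index, then join and filter in the app layer.
    # Each step is an extra failure point and an extra round trip.
    ids = knn_ids(query, k=len(vector_index))   # can't push the filter down
    joined = ((i, source_store[i]) for i in ids)
    return [(i, meta) for i, meta in joined if meta["tenant"] == tenant][:k]

print(search_for_tenant([0.0, 1.0], tenant="acme", k=1))
```

The over-fetch in `search_for_tenant` is the telltale symptom: because the index cannot apply the tenant filter itself, the application must pull more candidates than it needs and discard most of them after the join.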

This reflects a broader shift in infrastructure. The experience of companies like Anthropic and OpenAI has shown that data quality and curation matter as much as model architecture. Enterprises now struggle with governance: tracking which data versions trained which models, updating features without retraining, and maintaining audit trails for compliance.
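One way to make the governance problem concrete is content-addressing: pin each model run to a hash of the exact dataset it trained on, so the audit trail is mechanical rather than manual. A toy sketch, with an invented record format for illustration:

```python
import hashlib
import json

def dataset_version(rows):
    # Content-address the dataset: identical rows -> identical version id,
    # so a model run can be pinned to exactly what it trained on.
    blob = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

lineage = []  # append-only audit trail: which data version trained which model

def record_training(model_name, rows):
    entry = {"model": model_name, "data_version": dataset_version(rows)}
    lineage.append(entry)
    return entry

v1 = [{"id": 1, "label": "spam"}]
v2 = v1 + [{"id": 2, "label": "ham"}]

record_training("classifier-a", v1)
record_training("classifier-b", v2)

# The same rows always hash to the same version id; any change produces a new one.
assert dataset_version(v1) == lineage[0]["data_version"]
assert lineage[0]["data_version"] != lineage[1]["data_version"]
```

Production systems store the versioned data itself, not just a hash, so old versions can be re-read for retraining or audits; the hash-based sketch only captures the identity half of the problem.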

LanceDB targets this pain point by offering a unified surface for vector and structured data operations. Rather than forcing teams to orchestrate Postgres plus a vector database plus a feature store, She built LanceDB to handle all three behind one interface.