Vector databases don’t get as much love as their flashier counterparts, large language models (LLMs). But the startups building them are still crucial to the current AI revolution, and investors are eager to back the next big thing — to the tune of hundreds of millions of dollars.
Text, images, and videos are all examples of unstructured data. Vector databases capture the essence of each piece of data as a list of numbers, known as a vector or embedding, and store it so that a machine-learning program or LLM can later retrieve it.
Without a vector database, which can efficiently catalog vast amounts of unstructured data by its actual content, LLMs would have to rely on human-generated tags or labels when searching through documents and data.
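For readers who want to see the idea in miniature, here is a rough Python sketch of content-based lookup. It is an illustration under loud assumptions, not how any of the products named in this article work: the `embed` function is a toy word-count stand-in for a real embedding model, and `ToyVectorStore` is a hypothetical in-memory class invented for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): a word-count vector.

    Real systems use learned embeddings that capture meaning, not just
    shared words.
    """
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Score how closely two vectors point in the same direction (0 to 1 here)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps each text alongside its vector."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("quarterly revenue grew on strong cloud sales")
store.add("the cat sat on the warm windowsill")
store.add("new vaccine trial shows promising results")

# No tags or labels are involved: the match comes from the content itself.
top_match = store.search("how did cloud revenue do this quarter", k=1)
print(top_match)
```

In a real application, the retrieved text would then be passed to the LLM as context alongside the user's question.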
“You literally cannot use OpenAI on its own,” explained Tim Tully, a partner at Menlo Ventures. “You have to have a vector database because something has to push context into the query to OpenAI. And where does that context come from? Always a vector database.”
As AI startups continue to pique the interest of investors and command sky-high valuations, VCs are hungry to back the “picks and shovels” of AI that operate in the background but are integral to making the tech more powerful and easier for consumers to use.
Vector database startups fit squarely into that remit — and some startups in the space have already raised hundreds of millions of dollars from investors. Early leaders include Pinecone, which Tully has backed, as well as Chroma and Qdrant.
Business Insider has identified seven key players in the vector database arms race. These startups are organized by the amount of VC funding they have received.