Exa - A Glimpse at the Future of Search
Background
Even before ChatGPT, people in the AI space were beginning to see AI as a competitor to Google's long dominance of search.
This idea has many moving parts. It wasn't just the absence of ads, or the common sense, or even the way it surfaced real information rather than just site domains.
It was more an understanding that the internet, or even information itself, has become cluttered.
And cluttered information needs to be sorted.
In Exa's own words:
"Searching the internet should feel like navigating a grand library of knowledge, where you could weave insights across cultures, industries, and millenia. Of course it doesn’t feel that way. Today, searching the internet feels more like navigating a landfill."
And this is the fundamental mission of Exa:
"The core solution is also simple – we need a better search algorithm to filter all that information and organize the knowledge buried inside."
"Exa is going to organize the world’s knowledge."
The Problem
A great example of this problem can be found in Exa's announcement article:
"To illustrate the problem, try googling "startups working on climate change"."
"You get 43,800,000 results. That’s a lot of information! But how much actual knowledge? I see many listicle results, but no actual startups working on climate change."
"That's because Google still uses a keyword based algorithm. Keywords as a filtering mechanism may have worked in 1998, but they don't work for an internet with a thousand times more content and an SEO industry devoted to hacking keywords."
"Exa, in contrast, is the first web-scale neural search engine. Our algorithm uses transformers end-to-end, the same technology that built ChatGPT. This enables us to filter the internet by meaning, not by keyword."
"Here's the same search on Exa:"
"Note that these results don't necessarily mention the words "climate change" or "startup". Exa filtered out all the noisy webpages – the listicles talking about startups – and returned the actual knowledge – the startups themselves."
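To make the keyword-versus-meaning distinction concrete, here is a toy Python sketch. It is not Exa's algorithm: the documents, the three "embedding" dimensions and their values are all hypothetical and hand-assigned, standing in for vectors a trained transformer would produce. A keyword filter returns only the page that contains the query words (the listicle), while ranking by cosine similarity returns the pages closest in meaning to "climate startups".

```python
import math

# Hypothetical toy corpus. The 3-d vectors are hand-assigned for illustration;
# a neural search engine would compute them with a trained transformer.
# Assumed axes: (looks_like_company_page, looks_like_listicle, about_climate)
DOCS = {
    "climate-startups-listicle": ("10 startups working on climate change", (0.1, 0.9, 0.8)),
    "carbon-capture-company": ("We build direct air capture machines", (0.9, 0.1, 0.9)),
    "grid-battery-company": ("Long-duration storage for renewable grids", (0.9, 0.0, 0.7)),
    "founder-gossip": ("Which startup founders are dating?", (0.2, 0.6, 0.0)),
}


def keyword_search(query):
    """Return docs whose text contains every query word (a crude keyword filter)."""
    terms = query.lower().split()
    return [doc_id for doc_id, (text, _) in DOCS.items()
            if all(t in text.lower().split() for t in terms)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query_vec, k=2):
    """Rank docs by cosine similarity to the query vector (search by 'meaning')."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d][1]), reverse=True)
    return ranked[:k]


# Hand-assigned query vector meaning "company pages about climate", not listicles.
QUERY_VEC = (1.0, 0.0, 1.0)
print(keyword_search("startups working on climate change"))  # ['climate-startups-listicle']
print(vector_search(QUERY_VEC))  # ['carbon-capture-company', 'grid-battery-company']
```

The point of the toy is only the shape of the two approaches: keyword matching is a filter on surface text, while vector search is a nearest-neighbour ranking, so the quality of results depends entirely on how good the learned vectors are.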
This also doesn't mention the recent damage to search that AI itself has played a part in.
What is Exa
Simply put, it's a direct-to-consumer search website.
It also already has extensive API support, including Python, Node and Go SDKs.
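As a rough idea of what the API side looks like, here is a minimal sketch that calls a search endpoint using only Python's standard library. The endpoint URL, the `x-api-key` header, the `numResults` field and the response shape are assumptions based on Exa's public REST documentation and may have changed; `build_search_request` and `run_search` are hypothetical helper names. The request is built without being sent, so it can be inspected offline.

```python
import json
import urllib.request

# Assumed endpoint, based on Exa's public REST docs; verify against the current API reference.
EXA_SEARCH_URL = "https://api.exa.ai/search"


def build_search_request(query, api_key, num_results=10):
    """Build (but do not send) a POST request for a search call."""
    body = json.dumps({"query": query, "numResults": num_results}).encode("utf-8")
    return urllib.request.Request(
        EXA_SEARCH_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},  # assumed auth header
        method="POST",
    )


def run_search(query, api_key, num_results=10):
    """Send the request and return (title, url) pairs."""
    req = build_search_request(query, api_key, num_results)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"results": [{"title": ..., "url": ...}, ...]}
    return [(r.get("title"), r.get("url")) for r in payload.get("results", [])]


req = build_search_request("startups working on climate change", api_key="YOUR_API_KEY", num_results=5)
print(req.get_method(), req.full_url)  # POST https://api.exa.ai/search
```

Calling `run_search(...)` with a real key would actually send the request. In a real project the official Python SDK (`exa_py`) is likely the simpler route, since it wraps this HTTP call for you.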
But how does this work?
With large language models, of course.
To summarise Exa's blog post on the subject:
Exa experimented with a range of architectures and datasets on earlier LLMs like GPT-3 (in a pre-ChatGPT world) and arrived at a prediction approach that mimics how people talk about a link:
"Found an amazing article I read about the history of Rome’s architecture: [LINK]"
"We trained a neural network to take text like this and predict the link that comes afterward."
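The training objective described above can be illustrated by how such examples might be built from raw text. This is a hypothetical sketch of the data preparation only, not Exa's pipeline: the function name, the 200-character context window and the URL regex are my own assumptions. Each URL becomes a prediction target, and the text just before it becomes the model's input.

```python
import re

# Matches http(s) URLs up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def link_prediction_pairs(text, max_context_chars=200):
    """Turn raw text into (context, link) training pairs.

    The context is the text immediately before a URL (since the previous URL,
    capped at max_context_chars); the target is the URL itself.
    """
    pairs = []
    prev_end = 0
    for m in URL_RE.finditer(text):
        start = max(prev_end, m.start() - max_context_chars)
        context = text[start:m.start()].strip()
        link = m.group(0).rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if context:
            pairs.append((context, link))
        prev_end = m.end()
    return pairs


post = ("Found an amazing article I read about the history of Rome's architecture: "
        "https://example.com/rome. Also loved this piece on aqueducts: "
        "https://example.org/aqueducts")
for context, link in link_prediction_pairs(post):
    print(repr(context), "->", link)
```

A model trained on millions of pairs like these learns to map a description to the link people tend to share after it, which is what lets a query behave like the text that precedes a good link.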
But LLMs have their limitations, from hallucinations to stale knowledge. Exa's solution was to let them query the outside world, guided by an interesting hypothesis:
"LLMs will soon perform more searches than humans"
There are plenty of cool technical details you can piece together from their blog posts and documentation, but this gives a surface-level understanding of Exa's approach to an LLM-powered search engine.
There is also likely a Retrieval-Augmented Generation (RAG) component powering this system. RAG has become a popular way to improve LLM experiences and performance, but even a glance at Exa's blog posts and documentation shows how much harder it becomes at this scale.
There is even mention of the embeddings being real-time, which is interesting.
A good way to understand their scale is through this quote:
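For readers unfamiliar with RAG, here is a generic, minimal sketch of the pattern; it says nothing about how Exa actually implements it. Documents are embedded the moment they are added (a loose nod to real-time embeddings), the ones closest to a question are retrieved, and they are pasted into the prompt an LLM would receive. The bag-of-words `embed` function is a hypothetical stand-in for a neural encoder, `LiveIndex` and `build_rag_prompt` are illustrative names, and the LLM call itself is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    # Hypothetical stand-in for a neural embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Counter returns 0 for missing terms, so absent words contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LiveIndex:
    """Embeds each document on insert, so new pages are searchable immediately."""

    def __init__(self):
        self._docs = []  # (url, text, vector) triples

    def add(self, url, text):
        self._docs.append((url, text, embed(text)))

    def top_k(self, query, k=2):
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [(url, text) for url, text, _ in ranked[:k]]


def build_rag_prompt(question, index, k=2):
    """Retrieve the top-k sources and place them in front of the question."""
    sources = index.top_k(question, k)
    context = "\n".join(f"[{i}] {url}: {text}" for i, (url, text) in enumerate(sources, start=1))
    return f"Answer using only the sources below and cite them by number.\n{context}\n\nQuestion: {question}"


index = LiveIndex()
index.add("https://example.com/dac", "Direct air capture removes carbon dioxide from the atmosphere")
index.add("https://example.com/bread", "A recipe for sourdough bread")
print(build_rag_prompt("How does direct air capture remove carbon?", index, k=1))
```

The hard parts at web scale are exactly what this toy skips: a learned encoder, an approximate nearest-neighbour index over billions of vectors, and keeping those vectors fresh as pages change.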
"Most idealists don’t have their own million dollar cluster of A100s and the ability to train SOTA self-supervised search models, but we do and we have."
None of this is meant to oversimplify their offerings. The ease of use of LLMs has enabled a lot of run-of-the-mill products, but you won't see that here. It is worth mentioning because, as we know, order-of-magnitude gains come from many minor improvements.
They even mention reinforcement learning from AI feedback (RLAIF), a bleeding-edge technique that is usually only practical for those operating at the scale of the big tech companies.
All of which means there is a lot of power behind Exa's attempt to solve search.
Key Takeaways
Exa is already the search engine I prefer day to day. It has allowed me to start using search optimistically again.
Google (and Bing) will obviously make their own attempts to fix search, but these haven't been going so well.
The API and commercial side of Exa seems very interesting and is already well supported. It is something I plan to use in a practical application in the near future.