A basic open-source AI search engine, modeled after Perplexity.ai. If you're not familiar with AI-powered question-answering platforms: they use a large language model like ChatGPT to answer your questions, but improve on ChatGPT by pulling in accurate, real-time search results to supplement the answer (so no "knowledge cutoff"). They also list citations within the answer itself, which builds confidence that the model isn't hallucinating and lets you research topics further.
- Clone / download the repo
- Get your API keys and add them to search.php (look for "[Fill me in]")
- Run locally using PHP (`php -S localhost:8000`)
The main challenge with LLMs like ChatGPT is that they have knowledge cutoffs (and they occasionally hallucinate). That's because they're trained on data up to a specific date (e.g., September 2021). So if you want an answer to an up-to-date question, or you simply want to research a topic in detail, you'll need to augment the answer with relevant sources. This technique is known as RAG (retrieval-augmented generation). In our case, we can simply supply the LLM with up-to-date information from search engines like Google or Bing.
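The core of the idea is just prompt assembly: number the search snippets, tell the model to cite them, and append the question. Here's a minimal sketch of that step — the `[citation:x]` markup and the exact wording are illustrative, not the exact prompt in search.php:

```php
<?php
// Sketch of RAG prompt assembly: number each search snippet, instruct the
// model to cite by number, and append the user's question at the end.
function build_rag_prompt(string $question, array $snippets): string {
    $context = "";
    foreach ($snippets as $i => $snippet) {
        $n = $i + 1;
        $context .= "[[citation:$n]] $snippet\n";
    }
    return "Answer the question using the numbered sources below.\n"
         . "Cite them inline as [citation:x].\n\n"
         . $context
         . "\nQuestion: $question";
}

$prompt = build_rag_prompt(
    "Who won the 2024 Super Bowl?",
    ["The Kansas City Chiefs beat the 49ers 25-22 in overtime."]
);
echo $prompt;
```

This prompt (plus the snippets) is all the "retrieval augmentation" there is — the rest of the project is fetching the snippets and rendering the answer.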
To build this yourself, you’ll want to first sign up for an API key from Bing, Google (via Serper), Brave, or others. Bing, Brave, and Serper all offer free usage to get started.
In search.php, put your API key where appropriate (look for "[Fill me in]"). For this example, I have code for both Brave and Google via Serper.
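For reference, a search call is just an authenticated HTTP GET plus a bit of JSON parsing. Here's a sketch using Brave's web search endpoint — the URL, header, and response fields match Brave's API as I understand it, but double-check their docs before relying on them:

```php
<?php
// Sketch: query Brave's web search API, then reduce the raw JSON response
// to an array of title/snippet/url triples for the prompt.
function brave_search(string $query, string $apiKey): string {
    $ch = curl_init("https://api.search.brave.com/res/v1/web/search?q=" . urlencode($query));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => [
            "Accept: application/json",
            "X-Subscription-Token: $apiKey",  // your "[Fill me in]" key
        ],
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Pull out just what the prompt needs from the response.
function extract_snippets(string $json): array {
    $data = json_decode($json, true);
    $out = [];
    foreach ($data["web"]["results"] ?? [] as $r) {
        $out[] = [
            "title"   => $r["title"] ?? "",
            "snippet" => $r["description"] ?? "",
            "url"     => $r["url"] ?? "",
        ];
    }
    return $out;
}
```

Serper works the same way, just as a POST to `https://google.serper.dev/search` with an `X-API-KEY` header and a slightly different response shape.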
Here, you’ll need to sign up for an API key from an LLM provider. There are a lot of providers to choose from right now. For example, there’s OpenAI, Anthropic, Anyscale, Groq, Cloudflare, Perplexity, Lepton, or the big players like AWS, Azure, or Google Cloud. I’ve used many of these with success, and each offers a subset of the current and popular closed and open-source models. Each model has unique strengths, different costs, and different speeds. For example, gpt-4 is very accurate but expensive and slow. When in doubt, I’d recommend using gpt-3.5-turbo from OpenAI. It’s good enough, cheap enough, and fast enough to test this out.
Fortunately, most of these LLM serving providers are compatible with OpenAI’s API format, so switching to another provider / model is only minimal work (or just ask a chatbot to write the code!).
In search.php, put your API keys where appropriate (look for "[Fill me in]"). For this example, I'm using OpenAI (for gpt-3.5-turbo / gpt-4) and Groq (for Mixtral 8x7B). So to keep your work minimal, just go get keys for one or both of those.
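Since most providers speak OpenAI's chat-completions format, one request builder covers them all — only the base URL, key, and model name change. A sketch (the endpoint path and payload shape follow OpenAI's API):

```php
<?php
// Build an OpenAI-style chat-completions request body.
function build_chat_payload(string $model, string $system, string $user): string {
    return json_encode([
        "model" => $model,
        "messages" => [
            ["role" => "system", "content" => $system],
            ["role" => "user",   "content" => $user],
        ],
    ]);
}

// POST the payload to any OpenAI-compatible provider and decode the reply.
function chat_complete(string $baseUrl, string $apiKey, string $payload): array {
    $ch = curl_init("$baseUrl/chat/completions");
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $payload,
        CURLOPT_HTTPHEADER => [
            "Content-Type: application/json",
            "Authorization: Bearer $apiKey",
        ],
    ]);
    $resp = json_decode(curl_exec($ch), true);
    curl_close($ch);
    return $resp;
}

// e.g. OpenAI uses "https://api.openai.com/v1" as $baseUrl; Groq exposes a
// compatible endpoint, so swapping providers is just a different $baseUrl/$model.
```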
When you want to ask an LLM a question, you can provide a lot of additional context. Each model has its own unique limit and some of them are very large. For gpt-4-turbo, you could pass along the entirety of the 1st Harry Potter book with your question. Google’s super powerful Gemini 1.5 can support a context size of over a million tokens. That’s enough to pass along the entirety of the 7-book Harry Potter series!
Fortunately, passing along the snippets of 8-10 search results requires far less context, allowing you to use many of the faster (and much cheaper) models like gpt-3.5-turbo or mistral-7b.
In my experience, passing along the user question, custom prompt message, and search result snippets is usually under 1K tokens. This is well under even the most basic model's limit, so this should be no problem.
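If you want a quick sanity check before sending a request, the common rule of thumb of roughly 4 characters per token for English text is close enough. A tiny helper (the reserved-answer budget is an arbitrary choice; use a real tokenizer if you need exact counts):

```php
<?php
// Rough token estimate: ~4 characters per token for English text.
function estimate_tokens(string $text): int {
    return (int) ceil(strlen($text) / 4);
}

// Check a prompt against a model's context limit, keeping some room
// for the model's answer (1024 tokens here is an arbitrary default).
function fits_context(string $prompt, int $contextLimit, int $reservedForAnswer = 1024): bool {
    return estimate_tokens($prompt) <= $contextLimit - $reservedForAnswer;
}
```

With ~1K-token prompts, even a 4K-context model like the original gpt-3.5-turbo has plenty of headroom.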
search.php has the sample prompt I've been playing around with. Hat-tip to the folks at Lepton AI, who open-sourced a similar project which helped me refine this prompt.
One of the nice features of Perplexity is how they suggest follow up questions. Fortunately, this is easy to replicate.
To do this, you can make a second call to your LLM (in parallel) asking for related questions. And don’t forget to pass along those citations in the context again.
Or, you can attempt to construct a prompt so that the LLM answers the question AND comes up with related questions. This saves an API call and some tokens, but it’s a bit challenging getting these LLMs to always answer in a consistent and repeatable format.
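For the first approach, the second call is just another prompt plus some defensive parsing — asking for strict JSON makes the reply much easier to handle. A sketch (the prompt wording and function names are illustrative):

```php
<?php
// Prompt for the second, follow-up-questions call. Asking for a bare JSON
// array keeps the output format consistent and easy to parse.
function build_related_questions_prompt(string $question, array $snippets): string {
    return "Based on the original question and the sources below, suggest 3 "
         . "related follow-up questions. Reply with a JSON array of strings "
         . "and nothing else.\n\n"
         . "Question: $question\n\nSources:\n" . implode("\n", $snippets);
}

// Parse the model's reply defensively; fall back to no suggestions if the
// model didn't return valid JSON.
function parse_related_questions(string $reply): array {
    $parsed = json_decode(trim($reply), true);
    return is_array($parsed) ? array_values(array_filter($parsed, 'is_string')) : [];
}
```

To make the two calls actually run in parallel from PHP, issue both requests with `curl_multi_exec` instead of two sequential `curl_exec` calls, so the follow-up questions don't add latency to the main answer.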
To make this a complete example, we need a usable UI. I kept the UI as simple as possible, and everything is in index.html. I'm using Bootstrap, jQuery, some basic CSS / JavaScript, a markdown renderer, and a JS syntax highlighter to make this happen.
To improve the experience, the UI does the following:
- The answer streams back to the user (improving perception of speed)
- The citations are replaced by a nicer in-line UI with a clickable popup for the user to learn more
- The sources considered are included after the answer in case the user wants to explore further
- Markdown and code syntax highlighting are used if necessary
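The streaming piece works because OpenAI-compatible APIs, when called with `"stream": true`, send server-sent-event lines like `data: {...}` that the PHP side can relay to the browser as they arrive. A sketch of the two halves (the SSE field names match OpenAI's streaming format; the passthrough is a minimal illustration):

```php
<?php
// Extract the text delta from one server-sent-event line of an
// OpenAI-style streaming response.
function extract_delta(string $sseLine): string {
    if (strpos($sseLine, "data: ") !== 0) return "";
    $payload = substr($sseLine, 6);
    if (trim($payload) === "[DONE]") return "";  // end-of-stream sentinel
    $data = json_decode($payload, true);
    return $data["choices"][0]["delta"]["content"] ?? "";
}

// On the request side, forward each chunk to the browser as it arrives,
// rather than buffering the whole answer:
//
//   curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
//       echo $chunk;   // relay to the browser
//       flush();       // flush PHP's output buffer immediately
//       return strlen($chunk);
//   });
```

The browser then appends each delta to the answer as it renders, which is what creates the "typing" effect and the perception of speed.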
To explore a working example, check out https://yaddleai.com. It's mostly the same code, though I added a second search call in parallel to fetch images, wrote a separate page to fetch the latest news, and made a few other minor improvements.