LLMSearchIndex: An Open-Source Local Web Search Library with Over 200 Million Indexed Web Pages for RAG Applications

I've been pretty unsatisfied with the web search options for local LLM/RAG systems. Most setups rely either on paid APIs like Brave or on metasearch scrapers like SearXNG.

So I built LLMSearchIndex, a Python library for fully local, internet-scale search. It uses a custom-trained, highly compressed search index that contains most of the webpages from FineWeb and Wikipedia. The full index is only ~2 GB and runs locally on most hardware with pretty fast retrieval speeds.

The Python API makes it easy to retrieve these results for RAG context:

    from llmsearchindex import LLMIndex

    index = LLMIndex()
    results = index.search("who invented sliced bread?", top_k=5)
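To give a rough idea of how retrieved results could be turned into RAG context, here is a minimal sketch. It does not call the library, and the shape of each result is an assumption: I'm treating a result as a mapping with hypothetical `text` and `url` fields, so adjust the field names to whatever `index.search` actually returns. `build_rag_prompt` and `max_chars` are illustrative names, not part of LLMSearchIndex.

```python
from typing import Mapping, Sequence


def build_rag_prompt(
    question: str,
    results: Sequence[Mapping[str, str]],
    max_chars: int = 500,
) -> str:
    """Format retrieved passages as numbered context ahead of the question."""
    blocks = []
    for i, hit in enumerate(results, start=1):
        # "text" and "url" are assumed field names, not confirmed by the library.
        snippet = hit.get("text", "")[:max_chars]
        source = hit.get("url", "unknown source")
        blocks.append(f"[{i}] {source}\n{snippet}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the numbered sources below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Stand-in results with placeholder text; in practice these would come
# from index.search(...), possibly after mapping fields to this shape.
sample_results = [
    {"url": "https://example.org/a", "text": "First placeholder passage."},
    {"url": "https://example.org/b", "text": "Second placeholder passage."},
]
prompt = build_rag_prompt("who invented sliced bread?", sample_results)
print(prompt)
```

Truncating each passage with `max_chars` is just one simple way to keep the prompt inside a small local model's context window. Numbering the sources lets the model cite them by index.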

You can also check out a demo here: https://zakerytclarke-llmsearchindex.hf.space/

submitted by /u/zakerytclarke