Retrievers
📄️ Amazon Kendra
Amazon Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.
📄️ Arxiv
arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
📄️ Azure Cognitive Search
Azure Cognitive Search (formerly known as Azure Search) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
📄️ BM25
BM25, also known as Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.
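To make the ranking function concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the retriever integration itself: the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Illustrative sketch: tokenization is a naive lowercase whitespace split.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(toks) for toks in tokenized) / n or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order gives the ranking for the query. Documents that contain none of the query terms score exactly zero.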
📄️ Chaindesk
The Chaindesk platform brings data from anywhere (Datasources: Text, PDF, Word, PowerPoint, Excel, Notion, Airtable, Google Sheets, etc.) into Datastores (containers of multiple Datasources).
📄️ ChatGPT Plugin
OpenAI plugins connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.
📄️ Cohere Reranker
Cohere is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.
📄️ DocArray Retriever
DocArray is a versatile, open-source tool for managing your multi-modal data. It lets you shape your data however you want, and offers the flexibility to store and search it using various document index backends. Plus, it gets even better: you can use your DocArray document index to create a DocArrayRetriever and build awesome LangChain apps!
📄️ ElasticSearch BM25
Elasticsearch is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
📄️ Google Cloud Enterprise Search
Enterprise Search is a part of the Generative AI App Builder suite of tools offered by Google Cloud.
📄️ kNN
In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.
📄️ LOTR (Merger Retriever)
Lord of the Retrievers, also known as MergerRetriever, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results are a list of documents that are relevant to the query and that have been ranked by the different retrievers.
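The merging idea can be sketched without any library. The example below assumes each retriever is a plain callable that returns a ranked list, and it interleaves the lists round-robin so that every retriever's top result appears early. The name `merge_retrievers` and the round-robin strategy are illustrative assumptions, not a description of how MergerRetriever is implemented.

```python
from itertools import zip_longest


def merge_retrievers(retrievers, query):
    """Merge ranked results from several retrievers into one list.

    `retrievers` is a list of callables, each mapping a query string to a
    ranked list of documents (a hypothetical stand-in for real retrievers).
    """
    result_lists = [retrieve(query) for retrieve in retrievers]
    missing = object()  # marks positions where a shorter list has run out
    merged = []
    # Take rank 1 from every retriever, then rank 2, and so on.
    for group in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(doc for doc in group if doc is not missing)
    return merged
```

This sketch does not remove duplicates, so a document returned by two retrievers appears twice in the merged list.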
📄️ Metal
Metal is a managed service for ML Embeddings.
📄️ Pinecone Hybrid Search
Pinecone is a vector database with broad functionality.
📄️ PubMed
This notebook goes over how to use PubMed as a retriever.
📄️ SVM
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.
📄️ TF-IDF
TF-IDF means term-frequency times inverse document-frequency.
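As a worked illustration of "term frequency times inverse document frequency", the sketch below computes the weights directly. It assumes a raw term count for TF and the unsmoothed `log(N / df)` for IDF; libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ. The function name `tf_idf` and the whitespace tokenizer are illustrative choices.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in the document) * log(N / document frequency)
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

With this unsmoothed form, a term that appears in every document gets weight zero, which is the intended effect: such terms do not help distinguish one document from another.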
📄️ Vespa
Vespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query.
📄️ Weaviate Hybrid Search
Weaviate is an open source vector database.
📄️ Wikipedia
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.
📄️ Zep
Retriever example for Zep, a long-term memory store for LLM applications.