ColiVara is a state-of-the-art retrieval API with a delightful developer experience.

The RAG solution you're looking for

ColiVara is a state-of-the-art retrieval API. It stores, searches, and retrieves documents based on their visual embeddings.

colivara.com
pip install colivara_py

# Import and initialize ColiVara
from colivara_py import ColiVara

rag_client = ColiVara()

# Upload a document to the default collection
document = rag_client.upsert_document(
    name="sample_document",
    url="https://example.com/sample.pdf",
    metadata={"author": "John Doe"},
)

# Search the indexed documents
results = rag_client.search(query="machine learning")
print(results)  # top 3 pages with the most relevant information

About ColiVara

ColiVara is a state-of-the-art retrieval API that lets you store, search, and retrieve documents based on their visual embeddings.

Documents are visually rich structures that convey information through text, as well as tables, figures, page layouts, and charts. While legacy document retrieval systems exhibit good performance on query-to-text matching, they struggle to pass visual cues efficiently to large language models, hindering their performance on practical document retrieval applications such as Retrieval Augmented Generation.

ColiVara is a web-first implementation of the ColPali paper that uses ColQwen2 as the embedding model. From the end user's standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents.

Retrieval Augmented Generation (RAG) is a powerful technique for enhancing the output of large language models (LLMs) with private documents and proprietary knowledge that is not available elsewhere, such as a company's internal documents or a researcher's notes. However, it is limited by the quality of the text extraction pipeline. Because that pipeline has limited ability to extract visual cues and other non-textual information, RAG can be suboptimal for visually rich documents. ColiVara uses vision models to generate embeddings for documents, allowing you to retrieve documents based on their visual content.
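
To make retrieval on visual embeddings more concrete, here is a minimal sketch of the late-interaction (MaxSim) scoring described in the ColPali paper: each query token vector is matched against its best page patch vector, and those maxima are summed. This is an illustration only, not ColiVara's implementation. The function names and the toy 2-dimensional vectors are assumptions; a real system would use embeddings produced by a vision model.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """For each query vector take its best-matching page vector, then sum the maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Return (page name, score) pairs sorted from most to least relevant."""
    scored = [(name, late_interaction_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: two query token vectors and two pages with hypothetical patch vectors.
toy_query = [[1.0, 0.0], [0.0, 1.0]]
toy_pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5]],
}
toy_ranking = rank_pages(toy_query, toy_pages)
print(toy_ranking)
```

Because every page keeps one vector per image patch, a chart or table region can match a query token directly, without a text extraction step in between.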

ColiVara Use Cases

[Diagrams: use-case pipelines. Components shown: user, query, image, data, document, structured data, embeddings, metadata filtering (document metadata and collection metadata), LLM, guardrails, results, answer.]


State of the Art Retrieval. Delightful developer experience

Wide Document Support

Supports over 100 file formats including PDF, DOCX, PPTX, and more.

Modern PgVector Features

We use pgvector's half-precision vectors (halfvec) for faster search and reduced storage requirements.
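
As a rough illustration of where the storage saving comes from, the sketch below packs the same vector as 32-bit and as 16-bit floats using Python's standard `struct` module. It does not show how pgvector stores data internally; the helper names `pack_vector` and `unpack_vector` are made up for this example.

```python
import struct
from typing import List, Sequence


def pack_vector(values: Sequence[float], half: bool = True) -> bytes:
    """Serialize floats as little-endian half precision ('e') or single precision ('f')."""
    code = "e" if half else "f"
    return struct.pack(f"<{len(values)}{code}", *values)


def unpack_vector(blob: bytes, half: bool = True) -> List[float]:
    """Inverse of pack_vector; the element width is 2 bytes (half) or 4 bytes (single)."""
    width = 2 if half else 4
    code = "e" if half else "f"
    return list(struct.unpack(f"<{len(blob) // width}{code}", blob))


sample = [0.5, -1.25, 3.0, 0.1]
full_blob = pack_vector(sample, half=False)
half_blob = pack_vector(sample, half=True)

print(len(full_blob), len(half_blob))  # 16 8
print(unpack_vector(half_blob))        # values are close to, not always equal to, the input
```

Half precision halves the bytes per dimension at the cost of a small rounding error, which is usually an acceptable trade for similarity search.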

State of the Art retrieval

The API is based on the ColPali paper and uses the ColQwen2 model for embeddings. It outperforms existing retrieval systems on both quality and latency.

Filtering

Filter collections and documents on arbitrary metadata fields. For example, you can filter documents by author or year, or filter collections by type.
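
The semantics of filtering on arbitrary metadata fields can be sketched in plain Python as an exact match on every requested key. This is a local illustration, not the ColiVara API: `matches`, `filter_documents`, and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def matches(metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """True when every filter key is present in the metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


def filter_documents(documents: List[Dict[str, Any]], filters: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the documents whose metadata satisfies all filters."""
    return [doc for doc in documents if matches(doc.get("metadata", {}), filters)]


sample_documents = [
    {"name": "paper_1", "metadata": {"author": "John Doe", "year": 2023}},
    {"name": "paper_2", "metadata": {"author": "Jane Roe", "year": 2023}},
    {"name": "notes", "metadata": {}},
]

by_author = filter_documents(sample_documents, {"author": "John Doe"})
by_year = filter_documents(sample_documents, {"year": 2023})
print([doc["name"] for doc in by_author])  # ['paper_1']
print([doc["name"] for doc in by_year])    # ['paper_1', 'paper_2']
```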

Webpage Support

Automatically takes a screenshot of a webpage and indexes it, even if it is not a file.

Documents & Collections

A user can have multiple collections, for example one collection for research papers and another for books, which allows for efficient organization and retrieval of documents. Each collection can hold multiple documents, and each document can carry unlimited, user-defined metadata.
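
One way to picture this hierarchy is as three nested record types. The sketch below is a mental model only, assuming illustrative class and field names (`User`, `Collection`, `Document`, `add_document`); it is not ColiVara's schema or client API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)  # free-form, user-defined


@dataclass
class Collection:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    documents: List[Document] = field(default_factory=list)


@dataclass
class User:
    collections: Dict[str, Collection] = field(default_factory=dict)

    def add_document(self, collection_name: str, document: Document) -> Collection:
        """Place a document in the named collection, creating the collection if needed."""
        collection = self.collections.setdefault(collection_name, Collection(collection_name))
        collection.documents.append(document)
        return collection


demo_user = User()
demo_user.add_document("research papers", Document("colpali", {"year": 2024}))
demo_user.add_document("books", Document("handbook", {"author": "John Doe"}))
demo_user.add_document("books", Document("atlas"))
print({name: len(c.documents) for name, c in demo_user.collections.items()})
```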

Plans that match your needs

No matter how many documents you have, our pricing is simple, transparent, and adapts to your usage.

$ /month

Create your next project with ColiVara