ColiVara is a state-of-the-art retrieval API with a delightful developer experience.

Make your RAG application 10x Smarter

ColiVara has state-of-the-art retrieval performance on both text and visual documents. Whether it's complex financial reports, technical diagrams, or data-rich tables, our advanced vision models see and understand your documents just like a human would. Say goodbye to broken layouts, missed context, and OCR limitations.


colivara.com

pip install colivara_py

# Import and initialize ColiVara
from colivara_py import ColiVara

rag_client = ColiVara()

# Upload a document to the default collection
document = rag_client.upsert_document(
  name="sample_document",
  url="https://example.com/sample.pdf",
  metadata={"author": "John Doe"}
)
results = rag_client.search(query="machine learning")
print(results)  # top 3 pages with the most relevant information

About ColiVara

ColiVara is a state-of-the-art retrieval API that lets you store, search, and retrieve documents based on their visual embeddings.

Documents are visually rich structures that convey information through text, as well as tables, figures, page layouts, and charts. While legacy document retrieval systems exhibit good performance on query-to-text matching, they struggle to pass visual cues efficiently to large language models, hindering their performance on practical document retrieval applications such as Retrieval Augmented Generation.

ColiVara is a web-first implementation of the ColPali paper, using ColQwen2, a vision-language model, to generate the embeddings. From the end user's standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing to index documents.
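ColPali-style retrievers score a page with late interaction: every query-token embedding is compared against every page-patch embedding, the best match for each query token is kept, and those maxima are summed (often called MaxSim). The sketch below illustrates only that scoring rule, in plain Python with tiny made-up 2-dimensional vectors. Real embeddings have many more dimensions and many patches per page, and the helper names (`maxsim_score`, `rank_pages`) are hypothetical, not part of ColiVara's API.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    page patch, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: List[Vector], pages: Dict[str, List[Vector]], top_k: int = 3) -> List[str]:
    """Return the ids of the top_k pages by MaxSim score, highest first."""
    ranked = sorted(pages, key=lambda pid: maxsim_score(query_tokens, pages[pid]), reverse=True)
    return ranked[:top_k]


# Toy example (hypothetical data): 2 query tokens, 3 pages of 2-d patch embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[0.9, 0.1], [0.2, 0.8]],  # matches both query tokens well
    "page_2": [[0.9, 0.0], [0.9, 0.0]],  # matches only the first query token
    "page_3": [[0.1, 0.1]],              # weak match overall
}
print(rank_pages(query, pages, top_k=2))  # ['page_1', 'page_2']
```

Because each query token picks its own best patch, a page can score highly when different regions (a table, a chart caption, a heading) each answer a different part of the query.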

Figure: Benchmark comparison across different document types and retrieval tasks.

ColiVara Use Cases

Diagram: example pipelines in which a user query, an image, or a document passes through embeddings and metadata filtering (document and collection metadata) to an LLM with guardrails, producing results, answers, or structured data.

State of the Art Retrieval. Delightful developer experience

Wide Document Support

Supports over 100 file formats including PDF, DOCX, PPTX, and more.

Modern pgvector Features

We use pgvector's halfvec type (half-precision vectors) for faster search and reduced storage requirements.
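pgvector's `halfvec` stores each dimension as a 2-byte half-precision float, while `vector` uses a 4-byte single-precision float, so the per-dimension storage is halved. The sketch below uses Python's `struct` module (format codes `f` and `e`) to show that size difference and the small rounding error on a hypothetical 128-value embedding. It illustrates the storage trade-off only and does not talk to pgvector itself.

```python
import random
import struct

# Hypothetical embedding: 128 random values in [-1, 1] standing in for one vector.
random.seed(0)
embedding = [random.uniform(-1.0, 1.0) for _ in range(128)]
n = len(embedding)

# 4-byte single-precision floats (analogous to pgvector's `vector` elements).
as_float32 = struct.pack(f"<{n}f", *embedding)
# 2-byte half-precision floats (analogous to pgvector's `halfvec` elements).
as_float16 = struct.pack(f"<{n}e", *embedding)

print(len(as_float32), len(as_float16))  # 512 256

# Round-trip through half precision and measure the worst-case rounding error.
restored = struct.unpack(f"<{n}e", as_float16)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
print(f"max abs error after float16 round-trip: {max_error:.6f}")
```

For values in this range the half-precision rounding error stays below 0.001, which is small relative to the similarity differences retrieval relies on, while the storage per element is cut in half.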

State of the Art retrieval

The API is based on the ColPali paper and uses the ColQwen2 model for embeddings. It outperforms existing retrieval systems on both quality and latency.

Filtering

Filter collections and documents on arbitrary metadata fields. For example, you can filter documents by author or year, or filter collections by type.
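Conceptually, filtering on arbitrary metadata keeps only the items whose metadata match every requested key and value. The sketch below is a hypothetical, in-memory illustration of that idea; the record layout and the `matches` / `filter_by_metadata` helpers are assumptions for illustration and do not reflect ColiVara's actual filter syntax.

```python
from typing import Any, Dict, List

# Hypothetical documents with free-form, user-defined metadata.
documents: List[Dict[str, Any]] = [
    {"name": "q3_report", "metadata": {"author": "John Doe", "year": 2023}},
    {"name": "q4_report", "metadata": {"author": "John Doe", "year": 2024}},
    {"name": "whitepaper", "metadata": {"author": "Jane Roe", "year": 2024}},
]


def matches(metadata: Dict[str, Any], conditions: Dict[str, Any]) -> bool:
    """True if every condition key exists in the metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in conditions.items())


def filter_by_metadata(items: List[Dict[str, Any]], conditions: Dict[str, Any]) -> List[str]:
    """Return the names of items whose metadata satisfy all conditions."""
    return [item["name"] for item in items if matches(item["metadata"], conditions)]


print(filter_by_metadata(documents, {"author": "John Doe"}))               # ['q3_report', 'q4_report']
print(filter_by_metadata(documents, {"author": "John Doe", "year": 2024}))  # ['q4_report']
```

The same `matches` check works unchanged for collections, for example to select collections whose metadata has a given `type`.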

Webpage Support

Automatically takes a screenshot of webpages and indexes them, even when the URL does not point to a file.

Documents & Collections

A user can have multiple collections, for example one for research papers and another for books, which keeps documents organized and retrieval efficient. Each collection can hold multiple documents with unlimited, user-defined metadata.
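The hierarchy described here, where a user owns collections and each collection holds documents carrying user-defined metadata, can be modeled with a few dataclasses. This is a hypothetical sketch of the relationships only; the class names, fields, and the `add_document` helper are illustrative and do not reflect ColiVara's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)  # arbitrary, user-defined keys


@dataclass
class Collection:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    documents: List[Document] = field(default_factory=list)


@dataclass
class User:
    collections: Dict[str, Collection] = field(default_factory=dict)

    def add_document(self, collection_name: str, document: Document) -> None:
        """Add a document, creating the named collection on first use."""
        collection = self.collections.setdefault(collection_name, Collection(collection_name))
        collection.documents.append(document)


user = User()
user.add_document("research_papers", Document("colpali_paper", {"year": 2024}))
user.add_document("books", Document("sample_book", {"author": "John Doe"}))
print(sorted(user.collections))  # ['books', 'research_papers']
```

Keeping metadata as a plain dictionary on both documents and collections is what allows filtering on arbitrary fields without a fixed schema.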

Plans that match your needs

No matter how many documents you have, our pricing is simple, transparent, and scales with your usage.


Create your next project with ColiVara
