---
title: Multimodal Rag
emoji: 🐨
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
---

# Multimodal RAG with Colpali, Milvus, and Visual Language Models

This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **Colpali**, **Milvus**, and **Visual Language Models (VLMs)** like Gemini or GPT-4o. The application allows users to upload a PDF and perform Q&A queries on both textual and visual elements of the document.

---

## Features

- **Multimodal Q&A**: Combines visual and textual embeddings for robust query answering.
- **PDF as Images**: Treats PDF pages as images to preserve layout and visual context (see the sketch after this list).
- **Efficient Retrieval**: Utilizes Milvus for fast and accurate vector search.
- **Advanced Query Processing**: Integrates Colpali and VLMs for embeddings and response generation.
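
A minimal sketch of the page-to-image step, assuming the `pdf2image` package (which requires the `poppler` utilities); the file name is a placeholder and the actual app may use a different PDF renderer:

```python
from pdf2image import convert_from_path  # pip install pdf2image; requires poppler

# Render every PDF page as a PIL image so layout, tables, and figures are preserved.
pages = convert_from_path("example.pdf", dpi=200)  # "example.pdf" is a placeholder

for i, page in enumerate(pages):
    page.save(f"page_{i}.png")  # one image per page, ready to be embedded
```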

---

## Architecture Overview

1. **Colpali**:
   - Generates embeddings for images (PDF pages) and text (user queries).
   - Processes visual and textual data seamlessly.

2. **Milvus**:
   - A vector database used for indexing and retrieving embeddings.
   - Supports HNSW-based indexing for efficient similarity searches (see the sketch after this list).

3. **Visual Language Models**:
   - Gemini or GPT-4o performs context-aware Q&A using retrieved pages.

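The retrieval side can be sketched with the `pymilvus` client. This is a simplified sketch only: it uses one placeholder vector per page, whereas the actual app relies on ColPali's multi-vector (late-interaction) embeddings, and the collection name, field names, and dimension below are assumptions. The page images behind the top hits are then handed to the VLM together with the user's question to produce the final answer.

```python
import numpy as np
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a running Milvus instance

# Collection schema: an auto-generated id, a per-page vector, and the page number.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=128)
schema.add_field(field_name="page_number", datatype=DataType.INT64)

# HNSW index for fast approximate nearest-neighbour search.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="IP",
    params={"M": 16, "efConstruction": 200},
)
client.create_collection("pdf_pages", schema=schema, index_params=index_params)

# Placeholder page embeddings; the real app would store ColPali outputs instead.
page_vectors = np.random.rand(10, 128).astype(np.float32)
client.insert("pdf_pages", [
    {"embedding": vec.tolist(), "page_number": i} for i, vec in enumerate(page_vectors)
])

# Retrieve the pages most similar to a (placeholder) query embedding.
query_vector = np.random.rand(128).astype(np.float32)
hits = client.search(
    collection_name="pdf_pages",
    data=[query_vector.tolist()],
    limit=3,
    search_params={"metric_type": "IP", "params": {"ef": 64}},
    output_fields=["page_number"],
)
print(hits[0])  # top pages, with page_number in each hit's entity
```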
---

## Installation

### Prerequisites
- Python 3.8 or higher
- CUDA-compatible GPU for acceleration
- Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- Required Python packages (see `requirements.txt`); a quick sanity check of this setup is sketched after this list
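
Before launching, the prerequisites can be verified with a small sketch, assuming a standalone Milvus instance on the default local port:

```python
import torch
from pymilvus import MilvusClient

# GPU check: Colpali inference is far faster when CUDA is available.
print("CUDA available:", torch.cuda.is_available())

# Milvus check: assumes a standalone instance at the default address.
client = MilvusClient(uri="http://localhost:19530")
print("Existing collections:", client.list_collections())
```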

### Steps to Run the Application Locally
1. Clone the repository.
2. Install dependencies: **pip install -r requirements.txt**
3. Set up environment variables. Add the following variable to your `.env` file or environment:
    `GEMINI_API_KEY=<Your_Gemini_API_Key>`
4. Launch the Gradio app: **python app.py** (a minimal skeleton is sketched after this list).
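
For orientation, a heavily stripped-down `app.py` skeleton is shown below. It only covers loading the key from `.env` (assuming `python-dotenv`), configuring Gemini (assuming the `google-generativeai` package), and wiring a basic Gradio interface; the ColPali/Milvus retrieval of the real app is omitted, and the model name and UI components are assumptions:

```python
import os

import google.generativeai as genai
import gradio as gr
from dotenv import load_dotenv

# Read GEMINI_API_KEY from a local .env file (or the environment).
load_dotenv()
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption


def answer(pdf_file, question):
    # Placeholder: the real app embeds the uploaded PDF's pages with Colpali,
    # retrieves the most relevant pages from Milvus, and passes them to the VLM.
    response = model.generate_content(f"Answer briefly: {question}")
    return response.text


with gr.Blocks() as demo:
    pdf = gr.File(label="Upload a PDF", file_types=[".pdf"])
    question = gr.Textbox(label="Question")
    output = gr.Textbox(label="Answer")
    gr.Button("Ask").click(answer, inputs=[pdf, question], outputs=output)

demo.launch()
```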

### Deploying the Gradio App on Hugging Face Spaces
1. Prepare the repository:
    - `git clone https://github.com/saumitras/colpali-milvus-rag.git`
    - `cd colpali-milvus-rag`

2. Organize the repository:
    - Ensure the app file (e.g., app.py) contains the Gradio application code.
    - Include the `requirements.txt` file for dependencies.

3. Create a new Space:
    - Visit Hugging Face Spaces and click New Space.
    - Fill in the details:
        - Name: Give your Space a unique name (e.g., multimodal_rag).
        - SDK: Select Gradio as the SDK.
        - Visibility: Choose between Public or Private.
    - Click Create Space.

4. Update the Hugging Face API configuration by adding the necessary environment variables (e.g., GEMINI_API_KEY or OPENAI_API_KEY) to the Hugging Face Spaces secrets:
    - Navigate to your Hugging Face Space.
    - Go to the Settings tab and add the required secrets under Repository secrets.
5. Push the code to Hugging Face:
    - Add the Space as a Git remote and push the code:
        - `git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag`
        - `git push hf main`

6. Wait for the Hugging Face Space to build and deploy the application.

The app has been deployed on Hugging Face Spaces, and a live demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag.