Commit 9fe4df8 (1 parent: a53d884)
ej68okap committed: new code added

README.md CHANGED
@@ -8,3 +8,81 @@ sdk_version: 5.12.0
app_file: app.py
pinned: false
---

# Multimodal RAG with ColPali, Milvus, and Visual Language Models

This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **ColPali**, **Milvus**, and a **Visual Language Model (VLM)** such as Gemini or GPT-4o. Users upload a PDF and ask questions about both the textual and the visual elements of the document.

---

## Features

- **Multimodal Q&A**: Combines visual and textual embeddings to answer queries.
- **PDF as Images**: Treats each PDF page as an image to preserve layout and visual context.
- **Efficient Retrieval**: Uses Milvus for fast vector similarity search.
- **Advanced Query Processing**: Uses ColPali for embeddings and a VLM for response generation.

---

## Architecture Overview

1. **ColPali**:
   - Generates embeddings for images (PDF pages) and text (user queries).
   - Processes visual and textual data with a single model.

2. **Milvus**:
   - A vector database used for indexing and retrieving embeddings.
   - Supports HNSW-based indexing for efficient similarity search.

3. **Visual Language Models**:
   - Gemini or GPT-4o performs context-aware Q&A using the retrieved pages.
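
The retrieval step in this overview can be illustrated with a small sketch. ColPali is a late-interaction model: it produces one embedding per query token and one per page patch, and a page is scored by summing, over the query tokens, the similarity of the best-matching patch (often called MaxSim). The code below uses hand-written toy vectors and plain Python in place of real model output and a Milvus search. The names `maxsim_score` and `rank_pages` are hypothetical and are not taken from this repository, and how the app actually combines Milvus results is not shown in this README.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector take the best-matching
    page vector, then sum those maxima. Assumes page_vecs is non-empty."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: List[Vector],
    pages: Dict[int, List[Vector]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return the top_k (page_id, score) pairs, highest score first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy data: two query-token vectors and two pages with two patch vectors each.
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        1: [[0.9, 0.1], [0.2, 0.8]],  # patches align well with both query vectors
        2: [[0.1, 0.1], [0.3, 0.2]],  # patches align poorly
    }
    # The top-ranked pages are what would be passed to the VLM as context.
    print(rank_pages(query, pages, top_k=1))
```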

---

## Installation

### Prerequisites
- Python 3.8 or higher
- CUDA-compatible GPU for acceleration
- Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- Required Python packages (see `requirements.txt`)
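
One way to verify these prerequisites before launching is a small check script. The sketch below is illustrative only: the helper names are hypothetical, it assumes PyTorch is the library that provides CUDA access (it reports `None` when `torch` is not installed), and it assumes Milvus is listening on its default port 19530 on localhost.

```python
import importlib.util
import socket
import sys
from typing import Optional, Tuple


def python_ok(min_version: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


def cuda_available() -> Optional[bool]:
    """Return torch.cuda.is_available(), or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script still runs without torch

    return torch.cuda.is_available()


def port_open(host: str = "127.0.0.1", port: int = 19530, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    print("CUDA available:", cuda_available())
    print("Milvus port reachable:", port_open())
```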

### Steps to Run the Application Locally
1. Clone the repository.
2. Install dependencies: `pip install -r requirements.txt`
3. Set up environment variables. Add the following to your `.env` file or environment:
   `GEMINI_API_KEY=<Your_Gemini_API_Key>`
4. Launch the Gradio app: `python app.py`
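
How the app reads `GEMINI_API_KEY` is not shown in this README, so the following is only a sketch of one common pattern: prefer the process environment and fall back to a simple `KEY=VALUE` parse of the `.env` file. The helper names `parse_env_file` and `get_api_key` are hypothetical; a real project might use a library such as `python-dotenv` instead.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines from a .env file; return {} if the file is missing."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(name: str = "GEMINI_API_KEY", env_file: str = ".env") -> str:
    """Return the key from the environment, else from the .env file."""
    value = os.environ.get(name) or parse_env_file(env_file).get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to {env_file}")
    return value
```

Failing early with a clear message avoids a less obvious authentication error at the first model call.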

### Deploying the Gradio App on Hugging Face Spaces
1. Prepare the repository:
   - `git clone https://github.com/saumitras/colpali-milvus-rag.git`
   - `cd colpali-milvus-rag`

2. Organize the repository:
   - Ensure the app file (e.g., `app.py`) contains the Gradio application code.
   - Include the `requirements.txt` file for dependencies.

3. Create a new Space:
   - Visit Hugging Face Spaces.
   - Click New Space.
   - Fill in the details:
     - Name: Give your Space a unique name (e.g., `multimodal_rag`).
     - SDK: Select Gradio as the SDK.
     - Visibility: Choose Public or Private.
   - Click Create Space.

4. Update the Hugging Face configuration by adding the necessary environment variables, such as `GEMINI_API_KEY` or `OPENAI_API_KEY`, to the Space's secrets:
   - Navigate to your Hugging Face Space.
   - Go to the Settings tab and add the required secrets under Repository secrets.

5. Push the code to Hugging Face. Add the Space as a Git remote and push:
   - `git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag`
   - `git push hf main`

6. Wait for the Hugging Face Space to build and deploy the application.
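
Secrets added to a Space are exposed to the running app as environment variables, so the app can decide at startup which VLM backend to use. The sketch below is an assumption about how that selection could look, not the repository's actual logic: the function name `pick_vlm_provider`, the returned labels, and the preference for Gemini when both keys are present are all hypothetical.

```python
import os
from typing import Mapping, Optional


def pick_vlm_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Choose a VLM backend label based on which API key is configured."""
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "gemini"  # hypothetical label
    if env.get("OPENAI_API_KEY"):
        return "openai"  # hypothetical label
    raise RuntimeError(
        "Set GEMINI_API_KEY or OPENAI_API_KEY (for example, as a Space secret)."
    )


if __name__ == "__main__":
    print("Selected VLM provider:", pick_vlm_provider())
```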

The app is deployed on Hugging Face Spaces, and a demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag