metadata

title: Multimodal Rag
emoji: 🐨
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false

Multimodal RAG with ColPali, Milvus, and Visual Language Models

This repository demonstrates how to build a Multimodal Retrieval-Augmented Generation (RAG) application using ColPali, Milvus, and Visual Language Models (VLMs) such as Gemini or GPT-4o. Users upload a PDF and ask questions about both the textual and the visual elements of the document.

Features

- Multimodal Q&A: Answers questions using embeddings of both page images and query text.
- PDF as Images: Treats each PDF page as an image to preserve layout and visual context.
- Efficient Retrieval: Uses Milvus for fast vector similarity search over the page embeddings.
- Advanced Query Processing: Uses ColPali to produce embeddings and a VLM to generate the response.
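
To make the idea of matching a text query against page images more concrete, here is a minimal pure-Python sketch. It assumes the late-interaction (MaxSim) scoring that ColPali-style retrievers typically use, where a page is represented by many patch vectors and a query by many token vectors. The function names, page ids, and toy 2-dimensional vectors are hypothetical, and the scoring code in this repository may differ.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take the similarity of
    its best-matching page vector, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: List[Vector],
    pages: Dict[str, List[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs sorted by descending MaxSim score.

    `pages` maps a hypothetical page id to that page's list of patch vectors.
    """
    scored = [(pid, maxsim_score(query_vecs, vecs)) for pid, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data: two query token vectors and two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [0.0, 1.0]],  # matches both query vectors
    "page_2": [[0.5, 0.0], [0.0, 0.0]],  # weak match on the first one only
}
ranking = rank_pages(query, pages, top_k=2)
```

In a full pipeline, the top-ranked pages would then be passed as images to the VLM together with the user's question.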

Architecture Overview

ColPali:
- Generates embeddings for images (PDF pages) and text (user queries).
- Embeds page images and query text with the same model, so pages and queries can be compared directly.
Milvus:
- A vector database used for indexing and retrieving embeddings.
- Supports HNSW-based indexing for efficient similarity searches.
Visual Language Models:
- Gemini or GPT-4o performs context-aware Q&A using retrieved pages.
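
As an illustration of the HNSW indexing mentioned above, the sketch below builds the kind of parameter dictionaries that Milvus accepts for creating an HNSW index and for searching it. The keys `index_type`, `metric_type`, `M`, `efConstruction`, and `ef` are standard Milvus HNSW parameters, but the concrete values, the inner-product metric, and the helper names are assumptions rather than settings taken from this repository.

```python
def hnsw_index_params(m: int = 16, ef_construction: int = 200, metric: str = "IP") -> dict:
    """Build parameters for an HNSW index (all values are illustrative)."""
    return {
        "index_type": "HNSW",
        "metric_type": metric,
        # M: graph connectivity; efConstruction: candidate list size at build time.
        "params": {"M": m, "efConstruction": ef_construction},
    }


def hnsw_search_params(top_k: int, ef: int = 64, metric: str = "IP") -> dict:
    """Build search parameters for an HNSW index.

    Assumption: ef is kept at least as large as top_k so the search explores
    enough candidates to return top_k results.
    """
    return {"metric_type": metric, "params": {"ef": max(ef, top_k)}}


index_params = hnsw_index_params()
search_params = hnsw_search_params(top_k=100)
```

With pymilvus, dictionaries of this shape can be passed to `Collection.create_index` and as the `param` argument of `Collection.search`. Larger `M`, `efConstruction`, and `ef` values generally trade speed and memory for recall.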

Installation

Prerequisites

- Python 3.8 or higher
- CUDA-compatible GPU for acceleration
- Milvus installed and running (Installation Guide)
- Required Python packages (see requirements.txt)

Steps to Run the Application Locally

1. Clone the repository.
2. Install the dependencies:
   pip install -r requirements.txt
3. Set up environment variables. Add the following variable to your .env file or environment:
   GEMINI_API_KEY=
4. Launch the Gradio app:
   python app.py
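
A missing API key usually surfaces only when the first request to the model fails, so it can help to check for it at startup. The following is a minimal sketch of such a check; the `require_env` helper is hypothetical and not part of this repository, and the dummy key is only for demonstration.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error.

    `env` defaults to the real process environment; passing a mapping makes
    the function easy to try out without touching os.environ.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"GEMINI_API_KEY": "dummy-key-for-illustration"}
api_key = require_env("GEMINI_API_KEY", env=demo_env)
```

In app.py, a call such as `require_env("GEMINI_API_KEY")` near the top would fail fast with a readable message if the variable was not set.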

Deploying the Gradio App on Hugging Face Spaces

1. Prepare the repository:
   git clone https://github.com/saumitras/colpali-milvus-rag.git
   cd colpali-milvus-rag
2. Organize the repository. Ensure the app file (e.g., app.py) contains the Gradio application code, and include the requirements.txt file for dependencies.
3. Create a new Space. Visit Hugging Face Spaces and click New Space. Fill in the details:
   - Name: Give your Space a unique name (e.g., multimodal_rag).
   - SDK: Select Gradio as the SDK.
   - Visibility: Choose between Public and Private.
   Then click Create Space.
4. Update the Hugging Face API configuration. Add the necessary environment variables, such as GEMINI_API_KEY or OPENAI_API_KEY, to the Space's secrets: navigate to your Hugging Face Space, go to the Settings tab, and add the required secrets under Repository secrets.
5. Push the code to Hugging Face. Add the Space as a Git remote and push:
   git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag
   git push hf main
6. Wait for the Hugging Face Space to build and deploy the application.

The app has been deployed on Hugging Face Spaces, and a demo is available at https://huggingface.co/spaces/ultron1996/multimodal_rag