title: Multimodal Rag
emoji: 🐨
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false

Multimodal RAG with ColPali, Milvus, and Visual Language Models

This repository demonstrates how to build a Multimodal Retrieval-Augmented Generation (RAG) application using ColPali, Milvus, and a Visual Language Model (VLM) such as Gemini or GPT-4o. Users upload a PDF and ask questions about both the textual and the visual elements of the document.


Features

  • Multimodal Q&A: Answers questions using both the visual and the textual content of each page.
  • PDF as Images: Treats PDF pages as images to preserve layout and visual context.
  • Efficient Retrieval: Uses Milvus for fast vector search over page embeddings.
  • Query Processing: Uses ColPali to embed pages and queries, and a VLM to generate the response.

Architecture Overview

  1. ColPali:

    • Generates embeddings for images (PDF pages) and text (user queries).
    • Embeds pages and queries in the same vector space, so a text query can be matched directly against page images.
  2. Milvus:

    • A vector database used for indexing and retrieving embeddings.
    • Supports HNSW-based indexing for efficient similarity searches.
  3. Visual Language Models:

    • Gemini or GPT-4o answers the user's question using the retrieved pages as context.
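
To make the retrieval step above concrete: ColPali-style models emit one embedding per image patch and per query token, and compare them with a late-interaction (MaxSim) score. The following is a minimal sketch of that scoring in plain Python on toy vectors. The function names are hypothetical, and whether app.py ranks pages exactly this way is an assumption, not something this README states.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query-token vector, take the best
    matching page-patch vector (max dot product), then sum over query tokens."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[int, Sequence[Vector]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Score every page against the query and return the top_k (page_id, score) pairs."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy data (hypothetical): 2-dimensional vectors instead of real model output.
toy_query = [[1.0, 0.0], [0.0, 1.0]]
toy_pages = {
    1: [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    2: [[0.5, 0.5]],              # partial match for each token
    3: [[0.0, 1.0]],              # matches only the second token
}
best_page_id, best_score = rank_pages(toy_query, toy_pages, top_k=1)[0]
```

In a real deployment the candidate pages would first be narrowed down by a Milvus vector search, and only those candidates would be scored this way before being handed to the VLM.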

Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU for acceleration
  • Milvus installed and running (Installation Guide)
  • Required Python packages (see requirements.txt)
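
Before installing, it can help to confirm that the interpreter and key packages are present. The sketch below is one possible check, not part of this repository. The package names torch, pymilvus, and gradio are assumptions about what requirements.txt contains, and the script does not verify that a GPU or a running Milvus instance is reachable.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 8), packages=("torch", "pymilvus", "gradio")):
    """Return a dict mapping each check to True/False.

    The default package names are assumptions; adjust them to match
    the actual contents of requirements.txt.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```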

Steps to Run the Application Locally

  1. Clone the repository
  2. Install dependencies: pip install -r requirements.txt
  3. Set up environment variables: add the following variable to your .env file or environment: GEMINI_API_KEY=
  4. Launch the Gradio app: python app.py
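
The environment-variable step can be sketched as follows. This is a minimal, standard-library-only illustration of reading a .env file and failing early when the key is missing. The helper names load_env_file and require_env are hypothetical, and app.py may load its configuration differently (for example with a dedicated dotenv package).

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Values already present in the environment are left untouched.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            cleaned = value.strip().strip('"').strip("'")
            os.environ.setdefault(key.strip(), cleaned)


def require_env(name):
    """Return the variable's value, or raise a clear error if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it before running app.py")
    return value
```

A typical call sequence would be load_env_file() followed by require_env("GEMINI_API_KEY") at startup, so a missing key is reported before any model is loaded.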

Deploying the Gradio App on Hugging Face Spaces

  1. Prepare the Repository: Clone the repository and change into its directory:

     git clone https://github.com/saumitras/colpali-milvus-rag.git
     cd colpali-milvus-rag

  2. Organize the Repository: Ensure the app file (e.g., app.py) contains the Gradio application code. Include the requirements.txt file for dependencies.

  3. Create a New Space: Visit Hugging Face Spaces and click New Space. Fill in the details, then click Create Space:

    • Name: Give your Space a unique name (e.g., multimodal_rag).
    • SDK: Select Gradio as the SDK.
    • Visibility: Choose between Public or Private.
  4. Update the Hugging Face API Configuration: Add the necessary environment variables, such as GEMINI_API_KEY or OPENAI_API_KEY, to the Space's secrets. Navigate to your Hugging Face Space, go to the Settings tab, and add the required secrets under Repository secrets.

  5. Push Code to Hugging Face: Add the Space as a Git remote and push the code (replace the URL with that of your own Space):

     git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag
     git push hf main

  6. Wait for the Hugging Face Space to build and deploy the application.
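
Secrets added to a Space are exposed to the running app as environment variables, so the app can decide at startup which VLM backend to use. The sketch below illustrates one way to do that. The function name pick_vlm_provider and the choice to prefer Gemini when both keys are present are assumptions for illustration, not behavior documented for app.py.

```python
import os


def pick_vlm_provider(env=None):
    """Return "gemini" or "openai" depending on which API key is available.

    Prefers Gemini when both keys are set (an arbitrary choice for this sketch).
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError(
        "No VLM API key found; add GEMINI_API_KEY or OPENAI_API_KEY to the Space secrets"
    )
```

Passing a plain dict as env makes the function easy to exercise locally without touching the real environment.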

The app is deployed on Hugging Face Spaces, and a demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag