---
title: Multimodal Rag
emoji: 🎨
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
---
# Multimodal RAG with Colpali, Milvus, and Visual Language Models

This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **Colpali**, **Milvus**, and **Visual Language Models (VLMs)** such as Gemini or GPT-4o. The application allows users to upload a PDF and perform Q&A queries on both the textual and visual elements of the document.
---

## Features

- **Multimodal Q&A**: Combines visual and textual embeddings for robust query answering.
- **PDF as Images**: Treats PDF pages as images to preserve layout and visual context.
- **Efficient Retrieval**: Uses Milvus for fast and accurate vector search.
- **Advanced Query Processing**: Integrates Colpali and VLMs for embedding and response generation.

---
## Architecture Overview

1. **Colpali**:
   - Generates embeddings for images (PDF pages) and text (user queries).
   - Processes visual and textual data seamlessly.
2. **Milvus**:
   - A vector database used for indexing and retrieving embeddings.
   - Supports HNSW-based indexing for efficient similarity search.
3. **Visual Language Models**:
   - Gemini or GPT-4o performs context-aware Q&A over the retrieved pages.
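The retrieval step above relies on ColPali-style late-interaction scoring: each page and each query is represented by many vectors (one per image patch or query token), and a page's score is the sum, over query vectors, of the best-matching page vector (MaxSim). The following toy sketch shows that scoring with hand-written 3-dimensional vectors; in the real pipeline the embeddings come from the Colpali model:

```python
# Toy sketch of late-interaction (MaxSim) scoring as used by ColPali-style
# retrievers. The vectors below are hypothetical 3-d examples, not real
# Colpali embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, page_vecs):
    """Sum over query vectors of the best-matching page vector."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# One query with two token vectors, two candidate pages with patch vectors.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
page_a = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2]]  # aligns well with the query
page_b = [[0.0, 0.0, 1.0], [0.1, 0.1, 0.1]]  # mostly orthogonal

scores = {name: maxsim_score(query, vecs)
          for name, vecs in [("page_a", page_a), ("page_b", page_b)]}
best = max(scores, key=scores.get)  # → "page_a"
```

At scale, Milvus's HNSW index handles the nearest-neighbour search over these vectors; the explicit loop above is only for illustration.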
---
## Installation

### Prerequisites

- Python 3.8 or higher
- CUDA-compatible GPU for acceleration
- Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- Required Python packages (see `requirements.txt`)
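Before running the app, it can help to verify that the Milvus standalone instance is reachable; a small stdlib-only check is sketched below (the default host and port 19530 are the standard Milvus standalone defaults, but adjust them for your deployment):

```python
import socket

def milvus_reachable(host: str = "127.0.0.1", port: int = 19530,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Milvus port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Milvus reachable:", milvus_reachable())
```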
### Steps to Run the Application Locally

1. Clone the repository.
2. Install the dependencies: `pip install -r requirements.txt`
3. Set up environment variables. Add the following to your `.env` file or environment: `GEMINI_API_KEY=<Your_Gemini_API_Key>`
4. Launch the Gradio app: `python app.py`
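Inside `app.py`, the key can then be read from the environment at startup. A minimal stdlib sketch (the variable name `GEMINI_API_KEY` matches step 3; the helper name `require_env` is ours):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# e.g. at app startup:
# gemini_key = require_env("GEMINI_API_KEY")
```

If you keep the key in a `.env` file rather than the shell environment, load it first (for example with `python-dotenv`'s `load_dotenv()`) before calling the helper.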
### Deploying the Gradio App on Hugging Face Spaces

1. Prepare the repository:
   - `git clone https://github.com/saumitras/colpali-milvus-rag.git`
   - `cd colpali-milvus-rag`
2. Organize the repository:
   - Ensure the app file (e.g., `app.py`) contains the Gradio application code.
   - Include the `requirements.txt` file for dependencies.
3. Update the Hugging Face API configuration. Add the necessary environment variables, such as `GEMINI_API_KEY` or `OPENAI_API_KEY`, to the Hugging Face Spaces secrets:
   - Navigate to your Hugging Face Space.
   - Go to the **Settings** tab and add the required secrets under **Repository secrets**.
4. Create a new Space:
   - Visit Hugging Face Spaces.
   - Click **New Space**.
   - Fill in the details:
     - **Name**: Give your Space a unique name (e.g., `multimodal_rag`).
     - **SDK**: Select **Gradio** as the SDK.
     - **Visibility**: Choose between **Public** or **Private**.
   - Click **Create Space**.
5. Push the code to Hugging Face:
   - `git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag`
   - `git push hf main`
6. Wait for the Hugging Face Space to build and deploy the application.

The app has been deployed on Hugging Face Spaces, and a demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag.