---
title: Multimodal Rag
emoji: 🎨
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
---
# Multimodal RAG with Colpali, Milvus, and Visual Language Models

This repository demonstrates how to build a **Multimodal Retrieval-Augmented Generation (RAG)** application using **Colpali**, **Milvus**, and **Visual Language Models (VLMs)** such as Gemini or GPT-4o. The application allows users to upload a PDF and perform Q&A queries on both the textual and visual elements of the document.
---

## Features

- **Multimodal Q&A**: Combines visual and textual embeddings for robust query answering.
- **PDF as Images**: Treats PDF pages as images to preserve layout and visual context.
- **Efficient Retrieval**: Uses Milvus for fast and accurate vector search.
- **Advanced Query Processing**: Integrates Colpali and VLMs for embedding and response generation.

---
## Architecture Overview

1. **Colpali**:
   - Generates embeddings for images (PDF pages) and text (user queries).
   - Processes visual and textual data seamlessly.
2. **Milvus**:
   - A vector database used for indexing and retrieving embeddings.
   - Supports HNSW-based indexing for efficient similarity search.
3. **Visual Language Models**:
   - Gemini or GPT-4o performs context-aware Q&A over the retrieved pages.
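The retrieval step above relies on ColPali-style late-interaction scoring: each page and each query is represented by many vectors (one per image patch or query token), and a page's score is the sum, over query vectors, of the best-matching page vector (MaxSim). The following toy sketch shows that scoring with hand-written 3-dimensional vectors; in the real pipeline the embeddings come from the Colpali model:

```python
# Toy sketch of late-interaction (MaxSim) scoring as used by ColPali-style
# retrievers. The vectors below are hypothetical 3-d examples, not real
# Colpali embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, page_vecs):
    """Sum over query vectors of the best-matching page vector."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# One query with two token vectors, two candidate pages with patch vectors.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
page_a = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2]]  # aligns well with the query
page_b = [[0.0, 0.0, 1.0], [0.1, 0.1, 0.1]]  # mostly orthogonal

scores = {name: maxsim_score(query, vecs)
          for name, vecs in [("page_a", page_a), ("page_b", page_b)]}
best = max(scores, key=scores.get)  # → "page_a"
```

At scale, Milvus's HNSW index handles the nearest-neighbour search over these vectors; the explicit loop above is only for illustration.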
---
## Installation

### Prerequisites

- Python 3.8 or higher
- CUDA-compatible GPU for acceleration
- Milvus installed and running ([Installation Guide](https://milvus.io/docs/install_standalone.md))
- Required Python packages (see `requirements.txt`)
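Before running the app, it can help to verify that the Milvus standalone instance is reachable; a small stdlib-only check is sketched below (the default host and port 19530 are the standard Milvus standalone defaults, but adjust them for your deployment):

```python
import socket

def milvus_reachable(host: str = "127.0.0.1", port: int = 19530,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Milvus port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Milvus reachable:", milvus_reachable())
```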
### Steps to Run the Application Locally

1. Clone the repository.
2. Install the dependencies: `pip install -r requirements.txt`
3. Set up environment variables. Add the following to your `.env` file or environment: `GEMINI_API_KEY=<Your_Gemini_API_Key>`
4. Launch the Gradio app: `python app.py`
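Inside `app.py`, the key can then be read from the environment at startup. A minimal stdlib sketch (the variable name `GEMINI_API_KEY` matches step 3; the helper name `require_env` is ours):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# e.g. at app startup:
# gemini_key = require_env("GEMINI_API_KEY")
```

If you keep the key in a `.env` file rather than the shell environment, load it first (for example with `python-dotenv`'s `load_dotenv()`) before calling the helper.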
### Deploying the Gradio App on Hugging Face Spaces

1. Prepare the repository:
   - `git clone https://github.com/saumitras/colpali-milvus-rag.git`
   - `cd colpali-milvus-rag`
2. Organize the repository:
   - Ensure the app file (e.g., `app.py`) contains the Gradio application code.
   - Include the `requirements.txt` file for dependencies.
3. Update the Hugging Face API configuration. Add the necessary environment variables, such as `GEMINI_API_KEY` or `OPENAI_API_KEY`, to the Hugging Face Spaces secrets:
   - Navigate to your Hugging Face Space.
   - Go to the **Settings** tab and add the required secrets under **Repository secrets**.
4. Create a new Space:
   - Visit Hugging Face Spaces.
   - Click **New Space**.
   - Fill in the details:
     - **Name**: Give your Space a unique name (e.g., `multimodal_rag`).
     - **SDK**: Select **Gradio** as the SDK.
     - **Visibility**: Choose between **Public** or **Private**.
   - Click **Create Space**.
5. Push the code to Hugging Face:
   - `git remote add hf https://huggingface.co/spaces/ultron1996/multimodal_rag`
   - `git push hf main`
6. Wait for the Hugging Face Space to build and deploy the application.

The app has been deployed on Hugging Face Spaces, and a demo is running at https://huggingface.co/spaces/ultron1996/multimodal_rag.