AldawsariNLP committed on
Commit
3d910e2
·
1 Parent(s): 526c6c2

Remove chunks from API response and frontend display; update most of the files for the final version

.gitignore CHANGED
@@ -49,3 +49,5 @@ yarn-error.log*
49
  Thumbs.db
50
 
51
  # Documents (tracked via Hugging Face Xet)
 
 
 
49
  Thumbs.db
50
 
51
  # Documents (tracked via Hugging Face Xet)
52
+ GITHUB_SETUP.md
53
+ QUICKSTART.md
QUICKSTART.md CHANGED
@@ -1,13 +1,19 @@
1
  # Quick Start Guide
2
 
 
 
3
  ## Prerequisites
4
 
5
  - Python 3.10 or 3.11 (required for faiss-cpu compatibility)
6
  - uv (fast Python package manager) - [Install uv](https://github.com/astral-sh/uv)
7
  - Node.js 16+ and npm
8
  - OpenAI API key
 
 
 
 
9
 
10
- ## 5-Minute Setup
11
 
12
  ### 1. Install uv (if not already installed)
13
 
@@ -24,7 +30,7 @@ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | ie
24
  ### 2. Install Node.js (REQUIRED - if not already installed)
25
 
26
  **Check if Node.js is installed:**
27
- ```powershell
28
  node --version
29
  npm --version
30
  ```
@@ -41,19 +47,17 @@ npm --version
41
  - Complete the installation
42
 
43
  3. **CRITICAL: Restart Your Terminal**:
44
- - **Close PowerShell completely**
45
- - **Open a new PowerShell window**
46
  - This is required for PATH changes to take effect
47
 
48
  4. **Verify Installation**:
49
- ```powershell
50
  node --version
51
  npm --version
52
  ```
53
  Both should show version numbers.
54
 
55
- **For detailed Windows installation instructions, see: [INSTALL_NODEJS_WINDOWS.md](INSTALL_NODEJS_WINDOWS.md)**
56
-
57
  ### 3. Install Dependencies
58
 
59
  **Backend (using uv):**
@@ -75,11 +79,9 @@ Create `.env` in the project root:
75
  OPENAI_API_KEY=sk-your-actual-api-key-here
76
  ```
77
 
78
- ### 5. Add Documents / Processed Data
79
 
80
- - **Local development:** copy your PDF/TXT/DOC/DOCX files into the `documents/` folder before running `uv run python backend/main.py`.
81
- - **Deploying to Hugging Face Spaces:** large PDFs should be uploaded via the Space UI (Files & versions → Upload). Git pushes can’t include big binaries.
82
- - If you have a pre-generated `processed_documents.json`, keep it in the project root (it’s copied by the Dockerfile). The backend logs will print whether this file and the `documents/` folder exist at startup.
83
 
84
  ### 6. Run the Application
85
 
@@ -92,38 +94,184 @@ uv run python backend/main.py
92
  # macOS/Linux: source .venv/bin/activate && python backend/main.py
93
  # Windows: .venv\Scripts\activate && python backend\main.py
94
  ```
 
95
 
96
  **Terminal 2 - Frontend:**
97
  ```bash
98
  cd frontend
99
  npm start
100
  ```
 
101
 
102
  ### 7. Use the Application
103
 
104
  1. Open http://localhost:3000 in your browser
105
- 2. Click "Index Documents" to index files in the `documents/` folder
106
  3. Ask questions about your documents!
107
 
108
- ## Example Questions
109
 
110
  - "What are the key provisions in the contract?"
111
  - "What does the law say about [topic]?"
112
  - "Summarize the main points of the document"
113
114
  ## Troubleshooting
115
 
 
 
116
  **"OpenAI API key is required"**
117
  - Make sure you created `.env` in the project root with your API key
118
 
119
  **"No documents found"**
120
  - Check that files are in the `documents/` folder
121
  - Supported formats: PDF, TXT, DOCX, DOC
122
- - On Hugging Face Spaces, make sure you uploaded the PDFs (or a `processed_documents.json`) via the **Files and versions** tab. Watch the build/startup logs for messages such as `[RAG Init] processed_documents.json exists? True`.
123
-
124
- **"RAG system not initialized" (on Spaces)**
125
- - Ensure `processed_documents.json` is present in the repo **and** not excluded by `.dockerignore`.
126
- - Upload your source PDFs (or processed data) in the Space UI, then restart the Space so the startup hook can detect them.
127
 
128
  **Frontend can't connect to backend**
129
  - Ensure backend is running on port 8000
@@ -135,8 +283,42 @@ npm start
135
  - Restart your terminal after installation
136
  - Verify installation: `node --version` and `npm --version`
137
 
138
- ## Next Steps
139
 
140
- - See [README.md](README.md) for full documentation
141
- - See [README_HF_SPACES.md](README_HF_SPACES.md) for deployment instructions
142
1
  # Quick Start Guide
2
 
3
+ Complete guide for local development and deployment to Hugging Face Spaces.
4
+
5
  ## Prerequisites
6
 
7
  - Python 3.10 or 3.11 (required for faiss-cpu compatibility)
8
  - uv (fast Python package manager) - [Install uv](https://github.com/astral-sh/uv)
9
  - Node.js 16+ and npm
10
  - OpenAI API key
11
+ - Git installed (for deployment)
12
+ - Hugging Face account (for deployment) - [Sign up](https://huggingface.co)
13
+
14
+ ---
15
 
16
+ ## Part 1: Local Development
17
 
18
  ### 1. Install uv (if not already installed)
19
 
 
30
  ### 2. Install Node.js (REQUIRED - if not already installed)
31
 
32
  **Check if Node.js is installed:**
33
+ ```bash
34
  node --version
35
  npm --version
36
  ```
 
47
  - Complete the installation
48
 
49
  3. **CRITICAL: Restart Your Terminal**:
50
+ - **Close your terminal completely**
51
+ - **Open a new terminal window**
52
  - This is required for PATH changes to take effect
53
 
54
  4. **Verify Installation**:
55
+ ```bash
56
  node --version
57
  npm --version
58
  ```
59
  Both should show version numbers.
60
 
 
 
61
  ### 3. Install Dependencies
62
 
63
  **Backend (using uv):**
 
79
  OPENAI_API_KEY=sk-your-actual-api-key-here
80
  ```
81
 
82
+ ### 5. Add Documents
83
 
84
+ Copy your PDF/TXT/DOC/DOCX files into the `documents/` folder. The application will automatically process them when you start the backend.
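To confirm which files will be picked up before starting the backend, a quick check along these lines can help (a rough sketch; the actual loading logic lives in `backend/document_processor.py`):

```python
from pathlib import Path

# Rough sketch: list the files the backend should pick up from documents/.
SUPPORTED = {".pdf", ".txt", ".docx", ".doc"}

docs_dir = Path("documents")
found = [p for p in sorted(docs_dir.glob("*")) if p.suffix.lower() in SUPPORTED]

if not found:
    print("No supported documents found - add PDF/TXT/DOC/DOCX files to documents/ first.")
for p in found:
    print("Will be processed:", p.name)
```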
 
 
85
 
86
  ### 6. Run the Application
87
 
 
94
  # macOS/Linux: source .venv/bin/activate && python backend/main.py
95
  # Windows: .venv\Scripts\activate && python backend\main.py
96
  ```
97
+ The API will run on `http://localhost:8000`
98
 
99
  **Terminal 2 - Frontend:**
100
  ```bash
101
  cd frontend
102
  npm start
103
  ```
104
+ The app will open at `http://localhost:3000`
105
 
106
  ### 7. Use the Application
107
 
108
  1. Open http://localhost:3000 in your browser
109
+ 2. The system will automatically detect and process documents from the `documents/` folder
110
  3. Ask questions about your documents!
111
 
112
+ ### Example Questions
113
 
114
  - "What are the key provisions in the contract?"
115
  - "What does the law say about [topic]?"
116
  - "Summarize the main points of the document"
117
 
118
+ ---
119
+
120
+ ## Part 2: Deployment to Hugging Face Spaces
121
+
122
+ ### 1. Create a New Space
123
+
124
+ 1. Go to https://huggingface.co/spaces
125
+ 2. Click "Create new Space"
126
+ 3. Fill in the details:
127
+ - **Space name**: `saudi-law-ai-assistant` (or your preferred name)
128
+ - **SDK**: Select **Docker**
129
+ - **Visibility**: Public or Private
130
+ 4. Click "Create Space"
131
+
132
+ ### 2. Prepare Your Code
133
+
134
+ 1. **Build the React frontend**:
135
+ ```bash
136
+ cd frontend
137
+ npm install
138
+ npm run build
139
+ cd ..
140
+ ```
141
+
142
+ 2. **Ensure all files are ready** (a quick pre-flight check sketch follows this list):
143
+ - `app.py` - Main entry point
144
+ - `pyproject.toml` and `uv.lock` - Python dependencies
145
+ - `Dockerfile` - Docker configuration
146
+ - `backend/` - Backend code
147
+ - `frontend/build/` - Built React app (always run `npm run build` before pushing)
148
+ - `processed_documents.json` - Optional bundled data so the Space can answer immediately (make sure it is **not** ignored in `.dockerignore`)
149
+ - `vectorstore/` - Optional pre-built vectorstore folder (if it exists in your repo, it will be included in the Docker image)
150
+ - `documents/` — PDF sources that power preview/download. Because Hugging Face blocks large binaries in standard git pushes, you have two options:
151
+ - Use [HF Xet storage](https://huggingface.co/docs/hub/xet/using-xet-storage#git) for the `documents/` folder so it can live in the repo.
152
+ - Or keep the folder locally, and after every push upload the PDFs through the Space UI (**Files and versions → Upload files**) into `documents/`.
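As referenced above, a small pre-flight script can confirm this checklist before you push (a convenience sketch, not part of the project):

```python
from pathlib import Path

# Pre-flight sketch: verify the files listed above exist before pushing.
required = ["app.py", "pyproject.toml", "uv.lock", "Dockerfile", "frontend/build/index.html"]
optional = ["processed_documents.json", "vectorstore", "documents"]

for path in required:
    print(f"{path}: {'OK' if Path(path).exists() else 'MISSING (required)'}")

for path in optional:
    print(f"{path}: {'present' if Path(path).exists() else 'not present (optional)'}")
```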
153
+
154
+ ### 3. Set Up Environment Variables
155
+
156
+ 1. In your Hugging Face Space, go to **Settings**
157
+ 2. Scroll to **Repository secrets**
158
+ 3. Add secrets:
159
+ - **Name**: `OPENAI_API_KEY`
160
+ - **Value**: Your OpenAI API key
161
+ - (Optional) **Name**: `HF_TOKEN` (if you need to upload files programmatically)
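Space secrets are exposed to the running container as environment variables, so the backend reads them the same way it reads a local `.env`. An illustrative check:

```python
import os

# Illustrative only: secrets added in the Space settings surface as
# environment variables inside the running container.
api_key = os.getenv("OPENAI_API_KEY")
hf_token = os.getenv("HF_TOKEN")  # optional, only needed for programmatic uploads

print("OPENAI_API_KEY set?", bool(api_key))
print("HF_TOKEN set?", bool(hf_token))
```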
162
+
163
+ ### 4. Set Up Xet Storage (Recommended for PDFs)
164
+
165
+ If you want to store PDFs in the repository:
166
+
167
+ 1. **Enable Xet storage** on your Space:
168
+ - Go to Space Settings → Large file storage
169
+ - Enable "Hugging Face Xet" (or request access at https://huggingface.co/join/xet)
170
+
171
+ 2. **Install git-xet locally**:
172
+ ```bash
173
+ # macOS/Linux
174
+ curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
175
+
176
+ # Or via Homebrew
177
+ brew tap huggingface/tap
178
+ brew install git-xet
179
+ git xet install
180
+ ```
181
+
182
+ 3. **Configure git to use Xet**:
183
+ ```bash
184
+ git lfs install
185
+ git lfs track "documents/*.pdf"
186
+ git add .gitattributes documents/*.pdf
187
+ git commit -m "Track PDFs with Xet"
188
+ ```
189
+
190
+ ### 5. Push to Hugging Face
191
+
192
+ 1. **Initialize git** (if not already done):
193
+ ```bash
194
+ git init
195
+ ```
196
+
197
+ 2. **Add Hugging Face remote**:
198
+ ```bash
199
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
200
+ ```
201
+ Replace `YOUR_USERNAME` and `YOUR_SPACE_NAME` with your actual values.
202
+
203
+ 3. **Add and commit files**:
204
+ ```bash
205
+ git add .
206
+ git commit -m "Initial deployment"
207
+ ```
208
+
209
+ 4. **Push to Hugging Face**:
210
+ ```bash
211
+ git push hf main
212
+ ```
213
+
214
+ ### 6. Wait for Build
215
+
216
+ - Hugging Face will automatically build your Docker image
217
+ - This may take 5-10 minutes
218
+ - You can monitor the build logs in the Space's "Logs" tab
219
+
220
+ ### 7. Access Your Application
221
+
222
+ Once the build completes, your application will be available at:
223
+ ```
224
+ https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space
225
+ ```
226
+
227
+ ### 8. Upload Documents / Processed Data (if not using Xet)
228
+
229
+ If you didn't use Xet storage for PDFs:
230
+
231
+ - After the Space builds, open the **Files and versions** tab and click **Upload files** to add your `documents/*.pdf`
232
+ - If you have a prebuilt `processed_documents.json`, upload it as well so the backend can build the vectorstore immediately
233
+ - The startup logs will print whether `processed_documents.json` and `documents/` were detected inside the container
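The presence check behind those log lines amounts to something like this (a sketch; the exact wording of the logs may differ):

```python
from pathlib import Path

# Sketch of the startup presence check; the backend logs lines like
# "[RAG Init] processed_documents.json exists? True" at startup.
processed = Path("processed_documents.json")
documents = Path("documents")

print(f"[RAG Init] processed_documents.json exists? {processed.exists()}")
pdf_count = len(list(documents.glob("*.pdf"))) if documents.exists() else 0
print(f"[RAG Init] documents/ exists? {documents.exists()} ({pdf_count} PDFs)")
```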
234
+
235
+ ### 9. Redeploy Checklist
236
+
237
+ When updating your Space:
238
+
239
+ 1. `cd frontend && npm install && npm run build && cd ..`
240
+ 2. `git add .`
241
+ 3. `git commit -m "Update application"`
242
+ 4. `git push hf main` (or `git push hf main --force` if needed)
243
+ 5. Watch the Space build logs and confirm the new startup logs show the presence of `processed_documents.json`/`documents`
244
+
245
+ ---
246
+
247
+ ## Important Notes
248
+
249
+ 1. **API Endpoints**: The frontend is configured to use the `/api` prefix for backend calls. This is handled by `app.py` (see the sketch after this list).
250
+
251
+ 2. **Documents Folder**: The `documents/` folder is automatically created if it doesn't exist. To bundle PDFs, either:
252
+ - Enable HF Xet storage for `documents/` (recommended)
253
+ - Or upload the files via the Space UI after each push
254
+
255
+ 3. **Processed Data**: `processed_documents.json` can be bundled with the repo. The backend tries to bootstrap from this file at startup, so make sure it reflects the content you expect the Space to serve.
256
+
257
+ 4. **Vectorstore**: The `vectorstore/` folder is included in the Docker image if it exists in your repo. If it doesn't exist, it will be created at runtime from `processed_documents.json`.
258
+
259
+ 5. **Port**: Hugging Face Spaces uses port 7860 by default, which is configured in `app.py`.
260
+
261
+ 6. **Dependencies**: This project uses `uv` for Python package management. Dependencies are defined in `pyproject.toml` and `uv.lock`.
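For orientation, a Docker Space entry point along the lines of `app.py` typically mounts the backend under `/api`, serves the built frontend, and listens on port 7860. The sketch below is illustrative only; the import path and the details of the real `app.py` may differ:

```python
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

from backend.main import app as api_app  # assumed import path; the real app.py may differ

app = FastAPI()
app.mount("/api", api_app)  # backend reachable under the /api prefix
app.mount("/", StaticFiles(directory="frontend/build", html=True), name="frontend")

if __name__ == "__main__":
    # Hugging Face Spaces expects the app to listen on port 7860
    uvicorn.run(app, host="0.0.0.0", port=7860)
```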
262
+
263
+ ---
264
+
265
  ## Troubleshooting
266
 
267
+ ### Local Development
268
+
269
  **"OpenAI API key is required"**
270
  - Make sure you created `.env` in the project root with your API key
271
 
272
  **"No documents found"**
273
  - Check that files are in the `documents/` folder
274
  - Supported formats: PDF, TXT, DOCX, DOC
 
 
 
 
 
275
 
276
  **Frontend can't connect to backend**
277
  - Ensure backend is running on port 8000
 
283
  - Restart your terminal after installation
284
  - Verify installation: `node --version` and `npm --version`
285
 
286
+ ### Hugging Face Spaces Deployment
287
+
288
+ **Build Fails**
289
+ - Check the build logs in the Space's "Logs" tab
290
+ - Ensure all dependencies are in `pyproject.toml`
291
+ - Verify the Dockerfile is correct
292
+ - Make sure `frontend/build/` exists (run `npm run build`)
293
+
294
+ **"RAG system not initialized" (on Spaces)**
295
+ - Ensure `processed_documents.json` is present in the repo **and** not excluded by `.dockerignore`
296
+ - Upload your source PDFs (or processed data) in the Space UI, then restart the Space
297
+ - Check startup logs for initialization messages
298
+
299
+ **API Errors**
300
+ - Check that `OPENAI_API_KEY` is set correctly in Space secrets
301
+ - Verify the API key is valid and has credits
302
+ - Check the Space logs for detailed error messages
303
+
304
+ **Frontend Not Loading**
305
+ - Ensure `npm run build` was run successfully before pushing
306
+ - Check that `frontend/build/` directory exists and contains `index.html`
307
+ - Verify the build completed without errors
308
+
309
+ **Document Preview Not Working**
310
+ - Ensure PDFs are uploaded to the `documents/` folder in the Space
311
+ - Check that filenames match exactly (including encoding)
312
+ - Verify documents are accessible via the Space's file browser
313
 
314
+ **Push Rejected - Binary Files**
315
+ - Enable Xet storage for your Space (see Step 4 above)
316
+ - Or exclude PDFs from git and upload via Space UI
317
+
318
+ ---
319
+
320
+ ## Next Steps
321
 
322
+ - See [README.md](README.md) for full documentation and API details
323
+ - Check the Space logs for detailed startup and error information
324
+ - Monitor your OpenAI API usage to avoid unexpected charges
README.md CHANGED
@@ -35,141 +35,54 @@ A web application that allows users to ask questions about indexed legal documen
35
  ## Project Structure
36
 
37
  ```
38
- law_project1/
39
  ├── backend/
40
  │ ├── main.py # FastAPI application
41
  │ ├── rag_system.py # RAG implementation
42
- │ ├── requirements.txt # Python dependencies
43
- └── .env.example # Environment variables template
 
44
  ├── frontend/
45
  │ ├── src/
46
  │ │ ├── App.js # Main React component
47
  │ │ ├── App.css # Styles
48
  │ │ ├── index.js # React entry point
49
  │ │ └── index.css # Global styles
 
50
  │ ├── public/
51
  │ │ └── index.html # HTML template
52
  │ └── package.json # Node dependencies
53
- ├── documents/ # Place your documents here
 
54
  ├── app.py # Hugging Face Spaces entry point
55
  ├── Dockerfile # Docker configuration
56
- ├── requirements.txt # Main Python dependencies
57
- └── README.md # This file
 
 
 
58
  ```
59
 
60
- ## Setup Instructions
61
-
62
- ### Local Development
63
-
64
- 1. **Navigate to the project**:
65
- ```bash
66
- cd law_project1
67
- ```
68
-
69
- 2. **Set up the backend with uv**:
70
- ```bash
71
- # Install uv if you haven't already
72
- # On macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh
73
- # On Windows: powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
74
-
75
- # Install dependencies
76
- uv sync
77
- ```
78
-
79
- 3. **Set up environment variables**:
80
- ```bash
81
- # Create .env file in project root
82
- echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
83
- ```
84
- Or manually create `.env` file in the project root with:
85
- ```
86
- OPENAI_API_KEY=your_openai_api_key_here
87
- ```
88
-
89
- 4. **Install Node.js** (REQUIRED - if not already installed):
90
- - **Windows**: Download from https://nodejs.org/ (click the green "LTS" button)
91
- - Run the installer (make sure "Add to PATH" is checked)
92
- - **CRITICAL**: Close and restart your terminal/PowerShell after installation
93
- - Verify: `node --version` and `npm --version`
94
- - **For detailed Windows instructions, see: [INSTALL_NODEJS_WINDOWS.md](INSTALL_NODEJS_WINDOWS.md)**
95
-
96
- 5. **Set up the frontend**:
97
- ```bash
98
- cd frontend
99
- npm install
100
- cd ..
101
- ```
102
-
103
- 6. **Add documents**:
104
- - Create a `documents` folder in the project root (if it doesn't exist)
105
- - Add your PDF, TXT, DOCX, or DOC files
106
-
107
- 7. **Run the backend**:
108
- ```bash
109
- # Using uv run (recommended)
110
- uv run python backend/main.py
111
-
112
- # Or activate the virtual environment first
113
- # On macOS/Linux:
114
- source .venv/bin/activate
115
- python backend/main.py
116
-
117
- # On Windows:
118
- # .venv\Scripts\activate
119
- # python backend\main.py
120
- ```
121
- The API will run on `http://localhost:8000`
122
-
123
- 8. **Run the frontend** (in a new terminal):
124
- ```bash
125
- cd frontend
126
- npm start
127
- ```
128
- The app will open at `http://localhost:3000`
129
-
130
- ### Usage
131
-
132
- 1. **Index Documents**:
133
- - Click the "Index Documents" button in the UI, or
134
- - Make a POST request to `http://localhost:8000/index` with:
135
- ```json
136
- {
137
- "folder_path": "documents"
138
- }
139
- ```
140
-
141
- 2. **Ask Questions**:
142
- - Type your question in the chat input
143
- - The system will retrieve relevant context and return exact text from the documents
144
-
145
- ## Hugging Face Spaces Deployment
146
-
147
- See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructions.
148
-
149
- ### Quick Deployment Steps
150
-
151
- 1. **Build the frontend**:
152
- ```bash
153
- cd frontend
154
- npm install
155
- npm run build
156
- cd ..
157
- ```
158
-
159
- 2. **Create a Hugging Face Space** (Docker SDK)
160
-
161
- 3. **Set environment variable**:
162
- - In Space Settings → Repository secrets
163
- - Add `OPENAI_API_KEY` with your API key
164
-
165
- 4. **Push to Hugging Face**:
166
- ```bash
167
- git init
168
- git add .
169
- git commit -m "Initial commit"
170
- git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
171
- git push -u origin main
172
- ```
173
 
174
  ## API Endpoints
175
 
@@ -197,28 +110,23 @@ See [README_HF_SPACES.md](README_HF_SPACES.md) for detailed deployment instructi
197
  - The system extracts exact text from documents, not generated responses
198
  - Supported document formats: PDF, TXT, DOCX, DOC
199
  - The vectorstore is saved locally and persists between sessions
200
- - Make sure to index documents before asking questions
201
  - For Hugging Face Spaces, the frontend automatically uses `/api` as the API URL
202
- - This project uses `uv` for Python package management - dependencies are defined in `pyproject.toml`
203
  - The `.env` file should be in the project root (not in the backend folder)
 
204
 
205
  ## Troubleshooting
206
 
207
- ### Backend Issues
208
-
209
- - **OpenAI API Key Error**: Make sure `OPENAI_API_KEY` is set in your environment or `.env` file
210
- - **No documents found**: Ensure documents are in the `documents/` folder with supported extensions
211
-
212
- ### Frontend Issues
213
-
214
- - **API Connection Error**: Check that the backend is running on port 8000
215
- - **CORS Errors**: The backend has CORS enabled for all origins in development
216
 
217
- ### Deployment Issues
218
 
219
- - **Build fails**: Ensure all dependencies are in `requirements.txt`
220
- - **Frontend not loading**: Make sure `npm run build` was run successfully
221
- - **API not working**: Verify `OPENAI_API_KEY` is set in Hugging Face Space secrets
 
 
222
 
223
  ## License
224
 
 
35
  ## Project Structure
36
 
37
  ```
38
+ KSAlaw-document-agent/
39
  ├── backend/
40
  │ ├── main.py # FastAPI application
41
  │ ├── rag_system.py # RAG implementation
42
+ │ ├── document_processor.py # Document processing logic
43
+ │ ├── embeddings.py # Embeddings wrappers (OpenAI / Hugging Face)
44
+ │ └── chat_history.py # Chat history management
45
  ├── frontend/
46
  │ ├── src/
47
  │ │ ├── App.js # Main React component
48
  │ │ ├── App.css # Styles
49
  │ │ ├── index.js # React entry point
50
  │ │ └── index.css # Global styles
51
+ │ ├── build/ # Built React app (for deployment)
52
  │ ├── public/
53
  │ │ └── index.html # HTML template
54
  │ └── package.json # Node dependencies
55
+ ├── documents/ # Place your PDF documents here
56
+ ├── vectorstore/ # FAISS vectorstore (auto-generated)
57
  ├── app.py # Hugging Face Spaces entry point
58
  ├── Dockerfile # Docker configuration
59
+ ├── pyproject.toml # Python dependencies (uv)
60
+ ├── uv.lock # Locked dependencies
61
+ ├── processed_documents.json # Processed document summaries
62
+ ├── QUICKSTART.md # Complete setup and deployment guide
63
+ └── README.md # This file
64
  ```
65
 
66
+ ## Quick Start
67
+
68
+ For complete setup and deployment instructions, see **[QUICKSTART.md](QUICKSTART.md)**.
69
+
70
+ ### Quick Overview
71
+
72
+ **Local Development:**
73
+ 1. Install dependencies: `uv sync` and `cd frontend && npm install`
74
+ 2. Create `.env` with your `OPENAI_API_KEY`
75
+ 3. Add documents to `documents/` folder
76
+ 4. Run backend: `uv run python backend/main.py`
77
+ 5. Run frontend: `cd frontend && npm start`
78
+
79
+ **Deployment to Hugging Face Spaces:**
80
+ 1. Build frontend: `cd frontend && npm run build`
81
+ 2. Set up Xet storage (recommended) or prepare to upload PDFs via UI
82
+ 3. Push to Hugging Face: `git push hf main`
83
+ 4. Set `OPENAI_API_KEY` in Space secrets
84
+
85
+ See [QUICKSTART.md](QUICKSTART.md) for detailed step-by-step instructions for both local development and deployment.
86
 
87
  ## API Endpoints
88
 
 
110
  - The system extracts exact text from documents, not generated responses
111
  - Supported document formats: PDF, TXT, DOCX, DOC
112
  - The vectorstore is saved locally and persists between sessions
113
+ - Documents are automatically processed on startup (no manual indexing needed)
114
  - For Hugging Face Spaces, the frontend automatically uses `/api` as the API URL
115
+ - This project uses `uv` for Python package management - dependencies are defined in `pyproject.toml` and `uv.lock`
116
  - The `.env` file should be in the project root (not in the backend folder)
117
+ - PDFs can be stored using Hugging Face Xet storage or uploaded via the Space UI
118
 
119
  ## Troubleshooting
120
 
121
+ For detailed troubleshooting, see the [Troubleshooting section in QUICKSTART.md](QUICKSTART.md#troubleshooting).
122
 
123
+ ### Common Issues
124
 
125
+ - **OpenAI API Key Error**: Make sure `OPENAI_API_KEY` is set in your `.env` file (local) or Space secrets (deployment)
126
+ - **No documents found**: Ensure documents are in the `documents/` folder with supported extensions (PDF, TXT, DOCX, DOC)
127
+ - **Frontend can't connect**: Check that the backend is running on port 8000
128
+ - **Build fails on Spaces**: Ensure `frontend/build/` exists (run `npm run build`), check Dockerfile, verify dependencies in `pyproject.toml`
129
+ - **RAG system not initialized**: Check Space logs, ensure `processed_documents.json` exists and is not ignored by `.dockerignore`
130
 
131
  ## License
132
 
README_HF_SPACES.md DELETED
@@ -1,148 +0,0 @@
1
- # Deploying to Hugging Face Spaces
2
-
3
- This guide will help you deploy the Law Document RAG application to Hugging Face Spaces.
4
-
5
- ## Prerequisites
6
-
7
- 1. A Hugging Face account (sign up at https://huggingface.co)
8
- 2. An OpenAI API key
9
- 3. Git installed on your machine
10
-
11
- ## Step-by-Step Deployment
12
-
13
- ### 1. Create a New Space
14
-
15
- 1. Go to https://huggingface.co/spaces
16
- 2. Click "Create new Space"
17
- 3. Fill in the details:
18
- - **Space name**: `law-document-rag` (or your preferred name)
19
- - **SDK**: Select **Docker**
20
- - **Visibility**: Public or Private
21
- 4. Click "Create Space"
22
-
23
- ### 2. Prepare Your Code
24
-
25
- 1. **Build the React frontend**:
26
- ```bash
27
- cd frontend
28
- npm install
29
- npm run build
30
- cd ..
31
- ```
32
-
33
- 2. **Ensure all files are ready**:
34
- - `app.py` - Main entry point
35
- - `requirements.txt` - Python dependencies
36
- - `Dockerfile` - Docker configuration
37
- - `backend/` - Backend code
38
- - `frontend/build/` - Built React app (always run `npm run build` before pushing)
39
- - `processed_documents.json` - Optional bundled data so the Space can answer immediately (make sure it is **not** ignored in `.dockerignore`; the backend now initializes at import time and expects this file if no PDFs are present)
40
- - `vectorstore/` - Optional pre-built vectorstore folder (if it exists in your repo, it will be included in the Docker image; otherwise it will be created at runtime from `processed_documents.json`. To ensure the folder exists even if empty, create it with: `mkdir -p vectorstore && touch vectorstore/.gitkeep`)
41
- - `documents/` — PDF sources that power preview/download. Because Hugging Face blocks large binaries in standard git pushes, you have two options:
42
- - Use [HF Xet storage](https://huggingface.co/docs/hub/xet/using-xet-storage#git) for the `documents/` folder so it can live in the repo.
43
- - Or keep the folder locally, and after every push upload the PDFs through the Space UI (**Files and versions → Upload files**) into `documents/`.
44
- The Dockerfile now copies `documents/` into the image when present, and still creates the folder if it’s empty.
45
-
46
- ### 3. Set Up Environment Variables
47
-
48
- 1. In your Hugging Face Space, go to **Settings**
49
- 2. Scroll to **Repository secrets**
50
- 3. Add a new secret:
51
- - **Name**: `OPENAI_API_KEY`
52
- - **Value**: Your OpenAI API key
53
-
54
- ### 4. Push to Hugging Face
55
-
56
- 1. **Initialize git** (if not already done):
57
- ```bash
58
- git init
59
- ```
60
-
61
- 2. **Add Hugging Face remote**:
62
- ```bash
63
- git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
64
- ```
65
- Replace `YOUR_USERNAME` and `YOUR_SPACE_NAME` with your actual values.
66
-
67
- 3. **Add and commit files**:
68
- ```bash
69
- git add .
70
- git commit -m "Initial deployment"
71
- ```
72
-
73
- 4. **Push to Hugging Face**:
74
- ```bash
75
- git push origin main
76
- ```
77
-
78
- ### 5. Wait for Build
79
-
80
- - Hugging Face will automatically build your Docker image
81
- - This may take 5-10 minutes
82
- - You can monitor the build logs in the Space's "Logs" tab
83
-
84
- ### 6. Access Your Application
85
-
86
- Once the build completes, your application will be available at:
87
- ```
88
- https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space
89
- ```
90
-
91
- ### 7. Upload Documents / Processed Data
92
-
93
- - Hugging Face blocks large binary files in git pushes. After the Space builds, open the **Files and versions** tab and click **Upload files** to add your `documents/*.pdf`. They will be available under `/data/Spaces/<space-name>/`.
94
- - If you have a prebuilt `processed_documents.json`, upload it as well so the backend can build the vectorstore immediately. The startup logs now print whether `processed_documents.json` and `documents/` were detected inside the container.
95
-
96
- ### 8. Redeploy Checklist
97
-
98
- 1. `cd frontend && npm install && npm run build && cd ..`
99
- 2. `git add .`
100
- 3. `git commit -m "Prepare deployment"`
101
- 4. `git push hf main --force` (authenticate with your HF access token)
102
- 5. Watch the Space build logs and confirm the new startup logs show the presence of `processed_documents.json`/`documents`.
103
-
104
- ## Important Notes
105
-
106
- 1. **API Endpoints**: The frontend is configured to use `/api` prefix for backend calls. This is handled by the `app.py` file.
107
- 2. **Documents Folder**: The `documents/` folder is automatically created if it doesn't exist. To bundle PDFs, either enable HF Xet storage for `documents/` or upload the files via the Space UI after each push (standard git pushes reject large binaries).
108
- 3. **Processed Data**: `processed_documents.json` can be bundled with the repo. Because the backend now tries to bootstrap from this file at import/startup, make sure it reflects the same content you expect the Space to serve (and keep it under version control if you rely on it).
109
- 4. **Vectorstore**: The `vectorstore/` folder is now included in the Docker image if it exists in your repo. If you have a pre-built vectorstore, include it in your repository and it will be copied to the Docker image. If the vectorstore folder doesn't exist in your repo, ensure an empty folder exists (create with `mkdir -p vectorstore && touch vectorstore/.gitkeep`) or the Docker build may fail. The vectorstore will be created at runtime from `processed_documents.json` if not pre-built.
110
- 5. **Port**: Hugging Face Spaces uses port 7860 by default, which is configured in `app.py`.
111
-
112
- ## Troubleshooting
113
-
114
- ### Build Fails
115
-
116
- - Check the build logs in the Space's "Logs" tab
117
- - Ensure all dependencies are in `requirements.txt`
118
- - Verify the Dockerfile is correct
119
-
120
- ### API Errors
121
-
122
- - Check that `OPENAI_API_KEY` is set correctly in Space secrets
123
- - Verify the API key is valid and has credits
124
-
125
- ### Frontend Not Loading
126
-
127
- - Ensure `npm run build` was run successfully
128
- - Check that `frontend/build/` directory exists and contains `index.html`
129
-
130
- ## Updating Your Space
131
-
132
- To update your deployed application:
133
-
134
- ```bash
135
- git add .
136
- git commit -m "Update description"
137
- git push origin main
138
- ```
139
-
140
- Hugging Face will automatically rebuild and redeploy.
141
-
142
-
143
-
144
-
145
-
146
-
147
-
148
-
backend/chat_history.py CHANGED
@@ -11,13 +11,30 @@ class ChatHistory:
11
  self.max_history = max_history
12
  self.history: List[Dict[str, str]] = []
13
 
14
- def add_message(self, role: str, content: str):
15
- """Add a message to chat history"""
16
- self.history.append({
 
 
 
 
 
 
 
17
  "role": role,
18
  "content": content,
19
  "timestamp": datetime.now().isoformat()
20
- })
21
 
22
  # Keep only last N messages
23
  if len(self.history) > self.max_history * 2: # *2 because we have user + assistant pairs
@@ -45,6 +62,59 @@ class ChatHistory:
45
  # Format for OpenAI API (remove timestamp)
46
  return [{"role": msg["role"], "content": msg["content"]} for msg in last_two]
47
48
  def clear(self):
49
  """Clear chat history"""
50
  self.history = []
 
11
  self.max_history = max_history
12
  self.history: List[Dict[str, str]] = []
13
 
14
+ def add_message(self, role: str, content: str, source_document: Optional[str] = None, chunks: Optional[List[str]] = None):
15
+ """Add a message to chat history
16
+
17
+ Args:
18
+ role: Message role ("user" or "assistant")
19
+ content: Message content
20
+ source_document: Optional document filename used for assistant messages
21
+ chunks: Optional list of chunk texts used for assistant messages (in chunk mode)
22
+ """
23
+ message = {
24
  "role": role,
25
  "content": content,
26
  "timestamp": datetime.now().isoformat()
27
+ }
28
+
29
+ # Store document source for assistant messages
30
+ if role == "assistant" and source_document:
31
+ message["source_document"] = source_document
32
+
33
+ # Store chunks for assistant messages
34
+ if role == "assistant" and chunks:
35
+ message["chunks"] = chunks
36
+
37
+ self.history.append(message)
38
 
39
  # Keep only last N messages
40
  if len(self.history) > self.max_history * 2: # *2 because we have user + assistant pairs
 
62
  # Format for OpenAI API (remove timestamp)
63
  return [{"role": msg["role"], "content": msg["content"]} for msg in last_two]
64
 
65
+ def get_last_document(self) -> Optional[str]:
66
+ """Get the document filename used in the last assistant response
67
+
68
+ Returns:
69
+ Document filename if last message was assistant with a document, None otherwise
70
+ """
71
+ if not self.history:
72
+ return None
73
+
74
+ # Check last message
75
+ last_msg = self.history[-1]
76
+ if last_msg.get("role") == "assistant":
77
+ return last_msg.get("source_document")
78
+
79
+ return None
80
+
81
+ def get_last_turn_with_document(self) -> Optional[List[Dict[str, str]]]:
82
+ """Get the last chat turn that used a document (skipping general questions)
83
+
84
+ Returns:
85
+ List of messages from the last turn with a document, or None if no such turn exists
86
+ """
87
+ # Search backwards through history to find last assistant message with a document
88
+ for i in range(len(self.history) - 1, 0, -1):
89
+ msg = self.history[i]
90
+ if msg.get("role") == "assistant" and msg.get("source_document"):
91
+ # Found an assistant message with a document
92
+ # Get the turn (user + assistant pair)
93
+ if i >= 1 and self.history[i-1].get("role") == "user":
94
+ # Format for OpenAI API (remove timestamp, keep source_document in metadata)
95
+ return [
96
+ {"role": self.history[i-1]["role"], "content": self.history[i-1]["content"]},
97
+ {"role": msg["role"], "content": msg["content"]}
98
+ ]
99
+
100
+ return None
101
+
102
+ def get_last_chunks(self) -> Optional[List[str]]:
103
+ """Get the chunks used in the last assistant response
104
+
105
+ Returns:
106
+ List of chunk texts if last message was assistant with chunks, None otherwise
107
+ """
108
+ if not self.history:
109
+ return None
110
+
111
+ # Check last message
112
+ last_msg = self.history[-1]
113
+ if last_msg.get("role") == "assistant":
114
+ return last_msg.get("chunks")
115
+
116
+ return None
117
+
118
  def clear(self):
119
  """Clear chat history"""
120
  self.history = []
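Taken together, the new metadata lets follow-up questions be routed back to the document and chunks used in the previous turn. A usage sketch (import path and filenames are assumptions for illustration):

```python
# Usage sketch for the new ChatHistory metadata (import path assumed).
from backend.chat_history import ChatHistory

history = ChatHistory(max_history=10)
history.add_message("user", "What does the law say about custody?")
history.add_message(
    "assistant",
    "...answer text...",
    source_document="personal_status_law.pdf",  # hypothetical filename
    chunks=["chunk 1 text", "chunk 2 text"],
)

# Follow-up questions can be answered from the same document/chunks.
print(history.get_last_document())            # -> "personal_status_law.pdf"
print(len(history.get_last_chunks() or []))   # -> 2
```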
backend/document_processor.py CHANGED
@@ -1,5 +1,6 @@
1
  import os
2
  import json
 
3
  from pathlib import Path
4
  from typing import Dict, List, Optional
5
  from openai import OpenAI
@@ -21,10 +22,27 @@ class DocumentProcessor:
21
  raise ValueError("OpenAI API key is required")
22
 
23
  os.environ.setdefault("OPENAI_API_KEY", api_key)
24
- http_client = NoProxyHTTPClient(timeout=300.0)
25
  self.client = OpenAI(http_client=http_client)
26
  self.model = model
27
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28
 
29
  def process_pdf_with_llm(self, pdf_path: str) -> Dict[str, str]:
30
  """
@@ -48,13 +66,74 @@ class DocumentProcessor:
48
  purpose="user_data"
49
  )
50
 
51
- prompt = (
52
- "You are processing an Arabic legal document. "
53
- "Extract ONLY the main content text (remove headers, footers, page numbers, duplicate elements). "
54
- "Clean the text to remove formatting artifacts. "
55
- "Generate a concise summary in Arabic covering all important content. "
56
- '\nReturn ONLY valid JSON with exactly these fields: {"text": "...", "summary": "..."}'
57
- )
58
 
59
  # Use SDK responses API
60
  response = self.client.responses.create(
@@ -116,12 +195,22 @@ class DocumentProcessor:
116
 
117
  # Load existing processed documents
118
  existing_docs = []
119
- existing_filenames = set()
 
120
  if skip_existing:
121
  existing_docs = self.load_from_json()
122
- existing_filenames = {doc.get("filename") for doc in existing_docs if doc.get("filename")}
 
 
 
 
 
 
 
123
  if existing_filenames:
124
  print(f"Found {len(existing_filenames)} already processed documents")
 
 
125
 
126
  pdf_files = list(folder.glob("*.pdf"))
127
  new_processed_docs = []
@@ -129,13 +218,23 @@ class DocumentProcessor:
129
 
130
  for pdf_file in pdf_files:
131
  filename = pdf_file.name
 
132
 
133
- # Skip if already processed
134
- if skip_existing and filename in existing_filenames:
 
 
 
135
  print(f"⊘ Skipped (already processed): {filename}")
136
  skipped_count += 1
137
  continue
138
 
 
 
 
 
 
 
139
  # Process new document
140
  try:
141
  result = self.process_pdf_with_llm(str(pdf_file))
@@ -171,11 +270,16 @@ class DocumentProcessor:
171
  if append and json_path.exists():
172
  # Load existing and merge, avoiding duplicates
173
  existing_docs = self.load_from_json(json_path)
174
- existing_filenames = {doc.get("filename") for doc in existing_docs}
 
175
 
176
- # Add only new documents
177
  for doc in processed_docs:
178
- if doc.get("filename") not in existing_filenames:
 
 
 
 
179
  existing_docs.append(doc)
180
 
181
  processed_docs = existing_docs
 
1
  import os
2
  import json
3
+ import unicodedata
4
  from pathlib import Path
5
  from typing import Dict, List, Optional
6
  from openai import OpenAI
 
22
  raise ValueError("OpenAI API key is required")
23
 
24
  os.environ.setdefault("OPENAI_API_KEY", api_key)
25
+ http_client = NoProxyHTTPClient(timeout=900.0)
26
  self.client = OpenAI(http_client=http_client)
27
  self.model = model
28
 
29
+ @staticmethod
30
+ def _normalize_filename(filename: str) -> str:
31
+ """
32
+ Normalize filename for comparison (handle Unicode encoding variations).
33
+
34
+ Args:
35
+ filename: Original filename
36
+
37
+ Returns: Normalized filename (NFC form, lowercased, stripped)
38
+ """
39
+ if not filename:
40
+ return ""
41
+ # Normalize to NFC (composed form) to handle encoding variations
42
+ normalized = unicodedata.normalize("NFC", filename)
43
+ # Lowercase and strip for case-insensitive comparison
44
+ return normalized.lower().strip()
45
+
46
 
47
  def process_pdf_with_llm(self, pdf_path: str) -> Dict[str, str]:
48
  """
 
66
  purpose="user_data"
67
  )
68
 
69
+ prompt =("""
70
+ You are processing a legal PDF document (in Arabic) that has been uploaded as a file.
71
+
72
+ Your task has TWO parts:
73
+
74
+ 1) TEXT EXTRACTION & CLEANING
75
+ 2) GLOBAL SUMMARY IN ARABIC
76
+
77
+ ========================
78
+ 1) TEXT EXTRACTION & CLEANING
79
+ ========================
80
+ Extract ONLY the **main body text** of the entire document, in order, exactly as it appears logically in the statute, while cleaning away non-content noise.
81
+
82
+ INCLUDE:
83
+ - All legal text and provisions
84
+ - Article numbers and titles
85
+ - Section / chapter / part / الباب / الفصل headings
86
+ - Numbered clauses, subclauses, bullet points
87
+ - Any explanatory legal text that is part of the law itself
88
+
89
+ EXCLUDE (REMOVE COMPLETELY):
90
+ - Headers on each page (e.g., publication dates, التصنيف, نوع التشريع, حالة التشريع, etc.)
91
+ - Footers on each page
92
+ - Page numbers
93
+ - Any repeated boilerplate that appears identically on each page
94
+ - Scanning artifacts, junk characters, or layout noise
95
+ - Empty or whitespace-only lines that are not meaningful
96
+
97
+ IMPORTANT CLEANING RULES:
98
+ - Preserve the original language (Arabic). Do NOT translate the law.
99
+ - Preserve the logical order of the articles and sections as in the original law.
100
+ - Do NOT paraphrase, shorten, summarize, or reword the legal text. Copy the body text as-is (except for removing headers/footers/page numbers and cleaning artifacts).
101
+ - If the same header/footer text appears on many pages, remove all occurrences.
102
+ - If you are unsure whether a short line is a page number or header/footer (e.g. just a digit or date in the margin), treat it as NON-content and remove it.
103
+ - Keep reasonable line breaks and blank lines between titles, articles, and sections so the text is readable and structured, but do not insert additional commentary.
104
+ - Do NOT invent or hallucinate any missing articles or text. Only use what is actually present in the PDF content.
105
+
106
+ The final "text" field should contain the **full cleaned main body** of the law as ONE string, with newline characters where appropriate.
107
+
108
+ ========================
109
+ 2) GLOBAL SUMMARY (IN ARABIC)
110
+ ========================
111
+ After extracting the cleaned body text, generate a **concise summary in Arabic** that:
112
+
113
+ - Covers جميع الأبواب والفصول والمواد بشكل موجز
114
+ - يوضح موضوع النظام، نطاق تطبيقه، وأهم الأحكام (مثل: الزواج، الحقوق والواجبات، النفقة، النسب، الفرقة، العدة، الحضانة، الوصاية، الولاية، الوصية، المفقود، إلخ)
115
+ - يكون بصياغة عربية فصحى واضحة ومباشرة
116
+ - يكون في بضع فقرات قصيرة أو قائمة نقاط موجزة (بدون إطالة مفرطة)
117
+
118
+ لا تُدخل في الملخص أي تحليلات فقهية أو آراء، فقط وصف منظم لأهم الأحكام.
119
+
120
+
121
+ REQUIREMENTS:
122
+ - Do NOT wrap the JSON in Markdown.
123
+ - Do NOT add any extra keys or metadata.
124
+ - Do NOT add explanations before or after the JSON.
125
+ - Ensure the JSON is valid and parseable (proper quotes, commas, and escaping).
126
+
127
+
128
+ ========================
129
+ OUTPUT FORMAT (STRICT)
130
+ ========================
131
+ Return ONLY a single JSON object, with EXACTLY these two fields:
132
+
133
+ {
134
+ "text": "<the full cleaned main body text of the document as one string>",
135
+ "summary": "<the concise Arabic summary of the entire document>"
136
+ } """)
137
 
138
  # Use SDK responses API
139
  response = self.client.responses.create(
 
195
 
196
  # Load existing processed documents
197
  existing_docs = []
198
+ existing_filenames = set() # Original filenames for reference
199
+ existing_filenames_normalized = set() # Normalized filenames for comparison
200
  if skip_existing:
201
  existing_docs = self.load_from_json()
202
+ for doc in existing_docs:
203
+ original_filename = doc.get("filename")
204
+ if original_filename:
205
+ original_filename = original_filename.strip()
206
+ normalized = self._normalize_filename(original_filename)
207
+ existing_filenames.add(original_filename)
208
+ existing_filenames_normalized.add(normalized)
209
+
210
  if existing_filenames:
211
  print(f"Found {len(existing_filenames)} already processed documents")
212
+ print(f"Existing filenames (original): {list(existing_filenames)}")
213
+ print(f"Existing filenames (normalized): {list(existing_filenames_normalized)}")
214
 
215
  pdf_files = list(folder.glob("*.pdf"))
216
  new_processed_docs = []
 
218
 
219
  for pdf_file in pdf_files:
220
  filename = pdf_file.name
221
+ filename_normalized = self._normalize_filename(filename)
222
 
223
+ # Debug: Print comparison attempt
224
+ print(f"[Filename Check] Checking: '{filename}' (normalized: '{filename_normalized}')")
225
+
226
+ # Skip if already processed (using normalized comparison)
227
+ if skip_existing and filename_normalized in existing_filenames_normalized:
228
  print(f"⊘ Skipped (already processed): {filename}")
229
  skipped_count += 1
230
  continue
231
 
232
+ # Also check original filename for backward compatibility
233
+ if skip_existing and filename in existing_filenames:
234
+ print(f"⊘ Skipped (already processed, exact match): {filename}")
235
+ skipped_count += 1
236
+ continue
237
+
238
  # Process new document
239
  try:
240
  result = self.process_pdf_with_llm(str(pdf_file))
 
270
  if append and json_path.exists():
271
  # Load existing and merge, avoiding duplicates
272
  existing_docs = self.load_from_json(json_path)
273
+ existing_filenames = {doc.get("filename") for doc in existing_docs if doc.get("filename")}
274
+ existing_filenames_normalized = {self._normalize_filename(fn) for fn in existing_filenames}
275
 
276
+ # Add only new documents (using normalized comparison)
277
  for doc in processed_docs:
278
+ doc_filename = doc.get("filename", "")
279
+ doc_filename_normalized = self._normalize_filename(doc_filename)
280
+
281
+ # Check both normalized and original for backward compatibility
282
+ if doc_filename not in existing_filenames and doc_filename_normalized not in existing_filenames_normalized:
283
  existing_docs.append(doc)
284
 
285
  processed_docs = existing_docs
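The effect of the normalization is easiest to see with Arabic filenames that differ only in Unicode form; a small stand-alone illustration:

```python
import unicodedata

def normalize_filename(filename: str) -> str:
    # Mirrors DocumentProcessor._normalize_filename: NFC form, lowercased, stripped.
    return unicodedata.normalize("NFC", filename).lower().strip() if filename else ""

# The same Arabic name in composed (NFC) and decomposed (NFD) form compares
# unequal as raw strings but equal after normalization.
composed = "نظام الأحوال الشخصية.pdf"
decomposed = unicodedata.normalize("NFD", composed)

print(composed == decomposed)                                            # False
print(normalize_filename(composed) == normalize_filename(decomposed))   # True
```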
backend/embeddings.py CHANGED
@@ -1,11 +1,15 @@
1
  import os
2
  import time
3
  import random
4
- from typing import List
5
  from pathlib import Path
6
  from dotenv import load_dotenv
7
  import httpx
8
  from openai import OpenAI
 
 
 
 
9
 
10
 
11
  def _chunk_list(items: List[str], chunk_size: int) -> List[List[str]]:
@@ -91,5 +95,169 @@ class OpenAIEmbeddingsWrapper:
91
 
92
  def embed_documents(self, texts: List[str]) -> List[List[float]]:
93
  return self._embed(texts)
94
 
95
  #شرح نظام الأحوال الشخصية
 
1
  import os
2
  import time
3
  import random
4
+ from typing import List, Optional
5
  from pathlib import Path
6
  from dotenv import load_dotenv
7
  import httpx
8
  from openai import OpenAI
9
+ try:
10
+ from huggingface_hub import InferenceClient
11
+ except ImportError:
12
+ InferenceClient = None
13
 
14
 
15
  def _chunk_list(items: List[str], chunk_size: int) -> List[List[str]]:
 
95
 
96
  def embed_documents(self, texts: List[str]) -> List[List[float]]:
97
  return self._embed(texts)
98
+
99
+ def __call__(self, text: str) -> List[float]:
100
+ """
101
+ Make the embeddings wrapper callable for compatibility with FAISS.
102
+ When FAISS calls the embeddings object directly, this delegates to embed_query.
103
+ """
104
+ return self.embed_query(text)
105
+
106
+
107
+ class HuggingFaceEmbeddingsWrapper:
108
+ """
109
+ Embeddings wrapper compatible with LangChain's embeddings interface.
110
+ Uses HuggingFace InferenceClient with Nebius provider for embeddings.
111
+ Implements same interface as OpenAIEmbeddingsWrapper for drop-in replacement.
112
+ """
113
+ def __init__(self, model: str = "Qwen/Qwen3-Embedding-8B", api_key: str | None = None, timeout: float = 60.0):
114
+ if InferenceClient is None:
115
+ raise ImportError("huggingface_hub is required for HuggingFace embeddings. Install it with: pip install huggingface_hub")
116
+
117
+ # Load .env from project root (one level up from backend/)
118
+ project_root = Path(__file__).resolve().parents[1]
119
+ load_dotenv(project_root / ".env")
120
+
121
+ self.model = model or os.getenv("HF_EMBEDDING_MODEL", "Qwen/Qwen3-Embedding-8B")
122
+ self.api_key = api_key or os.getenv("HF_TOKEN")
123
+ if not self.api_key:
124
+ raise ValueError("HF_TOKEN is required for HuggingFace embeddings. Set HF_TOKEN environment variable.")
125
+
126
+ # Timeout/backoff config
127
+ self.timeout = timeout
128
+ self.batch_size = int(os.getenv("HF_EMBED_BATCH_SIZE", "32")) # Smaller batch size for HF
129
+ self.max_retries = int(os.getenv("HF_EMBED_MAX_RETRIES", "6"))
130
+ self.initial_backoff = float(os.getenv("HF_EMBED_INITIAL_BACKOFF", "1.0"))
131
+ self.backoff_multiplier = float(os.getenv("HF_EMBED_BACKOFF_MULTIPLIER", "2.0"))
132
+
133
+ # Initialize HuggingFace InferenceClient with Nebius provider
134
+ self.client = InferenceClient(
135
+ provider="nebius",
136
+ api_key=self.api_key
137
+ )
138
+ print(f"[HF Embeddings] Initialized with model: {self.model}, provider: nebius")
139
+
140
+ def _embed_once(self, inputs: List[str]) -> List[List[float]]:
141
+ """Call HuggingFace feature_extraction API for a batch of texts"""
142
+ import numpy as np
143
+
144
+ # HuggingFace feature_extraction can handle single or batch inputs
145
+ if len(inputs) == 1:
146
+ # Single text
147
+ result = self.client.feature_extraction(inputs[0], model=self.model)
148
+ # Result is numpy.ndarray - convert to list
149
+ if isinstance(result, np.ndarray):
150
+ if result.ndim == 2:
151
+ # 2D array - extract first row
152
+ result = result[0].tolist()
153
+ else:
154
+ # 1D array - convert directly
155
+ result = result.tolist()
156
+ # Result is a list of floats (embedding vector)
157
+ return [result]
158
+ else:
159
+ # Batch processing - HF may support batch, but we'll process one by one for reliability
160
+ embeddings = []
161
+ for text in inputs:
162
+ result = self.client.feature_extraction(text, model=self.model)
163
+ # Convert numpy array to list if needed
164
+ if isinstance(result, np.ndarray):
165
+ if result.ndim == 2:
166
+ result = result[0].tolist() # Extract first row if 2D
167
+ else:
168
+ result = result.tolist()
169
+ embeddings.append(result)
170
+ return embeddings
171
+
172
+ def _embed_with_retries(self, inputs: List[str]) -> List[List[float]]:
173
+ """Embed with retry logic similar to OpenAI wrapper"""
174
+ attempt = 0
175
+ backoff = self.initial_backoff
176
+ while True:
177
+ try:
178
+ return self._embed_once(inputs)
179
+ except Exception as err:
180
+ status = None
181
+ try:
182
+ # Try to extract status code from error if available
183
+ status = getattr(getattr(err, "response", None), "status_code", None)
184
+ except Exception:
185
+ status = None
186
+
187
+ if (status in (429, 500, 502, 503, 504) or status is None) and attempt < self.max_retries:
188
+ retry_after = 0.0
189
+ try:
190
+ retry_after = float(getattr(getattr(err, "response", None), "headers", {}).get("Retry-After", 0))
191
+ except Exception:
192
+ retry_after = 0.0
193
+ jitter = random.uniform(0, 0.5)
194
+ sleep_s = max(retry_after, backoff) + jitter
195
+ time.sleep(sleep_s)
196
+ attempt += 1
197
+ backoff *= self.backoff_multiplier
198
+ continue
199
+ raise
200
+
201
+ def _embed(self, inputs: List[str]) -> List[List[float]]:
202
+ """Process embeddings in batches with delays between batches"""
203
+ all_embeddings: List[List[float]] = []
204
+ for batch in _chunk_list(inputs, self.batch_size):
205
+ embeds = self._embed_with_retries(batch)
206
+ all_embeddings.extend(embeds)
207
+ # Small delay between batches to avoid rate limiting
208
+ time.sleep(float(os.getenv("HF_EMBED_INTER_BATCH_DELAY", "0.2")))
209
+ return all_embeddings
210
+
211
+ def embed_query(self, text: str) -> List[float]:
212
+ """Embed a single query text"""
213
+ return self._embed([text])[0]
214
+
215
+ def embed_documents(self, texts: List[str]) -> List[List[float]]:
216
+ """Embed multiple documents"""
217
+ return self._embed(texts)
218
+
219
+ def __call__(self, text: str) -> List[float]:
220
+ """
221
+ Make the embeddings wrapper callable for compatibility with FAISS.
222
+ When FAISS calls the embeddings object directly, this delegates to embed_query.
223
+ """
224
+ return self.embed_query(text)
225
+
226
+
227
+ def get_embeddings_wrapper(
228
+ model: Optional[str] = None,
229
+ api_key: Optional[str] = None,
230
+ timeout: float = 30.0
231
+ ):
232
+ """
233
+ Factory function to get the appropriate embeddings wrapper based on configuration.
234
+
235
+ Args:
236
+ model: Model name (provider-specific)
237
+ api_key: API key (provider-specific)
238
+ timeout: Timeout in seconds
239
+
240
+ Returns:
241
+ Either OpenAIEmbeddingsWrapper or HuggingFaceEmbeddingsWrapper instance
242
+
243
+ Environment Variables:
244
+ EMBEDDINGS_PROVIDER: "openai" (default), "huggingface", "hf", or "nebius"
245
+ HF_TOKEN: Required if using HuggingFace provider
246
+ HF_EMBEDDING_MODEL: Optional model override for HuggingFace (default: "Qwen/Qwen3-Embedding-8B")
247
+ """
248
+ # Load .env from project root
249
+ project_root = Path(__file__).resolve().parents[1]
250
+ load_dotenv(project_root / ".env")
251
+
252
+ provider = os.getenv("EMBEDDINGS_PROVIDER", "hf").lower()  # defaults to "hf"; set EMBEDDINGS_PROVIDER=openai to use OpenAI
253
+
254
+ if provider in ["huggingface", "hf", "nebius"]:
255
+ print(f"[Embeddings Factory] Using HuggingFace/Nebius provider")
256
+ hf_model = model or os.getenv("HF_EMBEDDING_MODEL", "Qwen/Qwen3-Embedding-8B")
257
+ return HuggingFaceEmbeddingsWrapper(model=hf_model, api_key=api_key, timeout=timeout)
258
+ else:
259
+ print(f"[Embeddings Factory] Using OpenAI provider (default)")
260
+ openai_model = model or os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002")
261
+ return OpenAIEmbeddingsWrapper(model=openai_model, api_key=api_key, timeout=timeout)
262
 
263
  #شرح نظام الأحوال الشخصية
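The factory keeps the calling code provider-agnostic. A usage sketch (import path assumed; running it requires `HF_TOKEN` for the default provider, or `OPENAI_API_KEY` with `EMBEDDINGS_PROVIDER=openai`):

```python
# Usage sketch for the provider factory (import path assumed).
from backend.embeddings import get_embeddings_wrapper

embeddings = get_embeddings_wrapper()

vector = embeddings.embed_query("What is the scope of the law?")
print(len(vector))  # embedding dimension depends on the selected model

batch = embeddings.embed_documents(["Article 1 ...", "Article 2 ..."])
print(len(batch))   # one vector per input text
```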
backend/main.py CHANGED
@@ -104,6 +104,8 @@ app.add_middleware(
104
  class QuestionRequest(BaseModel):
105
  question: str
106
  use_history: Optional[bool] = True
 
 
107
 
108
 
109
  class QuestionResponse(BaseModel):
@@ -175,6 +177,9 @@ async def health():
175
  @app.post("/ask", response_model=QuestionResponse)
176
  async def ask_question(request: QuestionRequest):
177
  """Answer a question using RAG with multi-turn chat history"""
 
 
 
178
  global rag_system, rag_ready
179
  if rag_system is None or not rag_ready:
180
  raise HTTPException(
@@ -186,13 +191,14 @@ async def ask_question(request: QuestionRequest):
186
  raise HTTPException(status_code=400, detail="Question cannot be empty")
187
 
188
  try:
189
- answer, sources = rag_system.answer_question(
190
  request.question,
191
  use_history=request.use_history,
192
- model_provider="qwen",
 
193
  )
194
- # print("[/ask] Qwen answer:", answer)
195
- # print("[/ask] Sources:", sources)
196
  return QuestionResponse(answer=answer, sources=sources)
197
  except Exception as e:
198
  raise HTTPException(
@@ -229,12 +235,8 @@ async def get_document(filename: str, mode: str = Query("download", enum=["downl
229
 
230
  # If file doesn't exist, try to find it by matching actual files in directory
231
  if not file_path.exists():
232
- print(f"[get_document] Document not found at direct path: {file_path}")
233
- print(f"[get_document] Searching for filename: {decoded_filename}")
234
-
235
  # List all PDF files in documents directory
236
  actual_files = list(documents_dir.glob("*.pdf"))
237
- print(f"[get_document] Found {len(actual_files)} PDF files in directory")
238
 
239
  # Normalize the requested filename for comparison
240
  def normalize_name(name: str) -> str:
@@ -253,24 +255,17 @@ async def get_document(filename: str, mode: str = Query("download", enum=["downl
253
  actual_name = actual_file.name
254
  actual_normalized = normalize_name(actual_name)
255
 
256
- print(f"[get_document] Comparing: '{requested_normalized}' with '{actual_normalized}'")
257
-
258
  if requested_normalized == actual_normalized:
259
  matched_file = actual_file
260
- print(f"[get_document] Found match: {actual_file.name}")
261
  break
262
 
263
  if matched_file:
264
  file_path = matched_file.resolve()
265
  else:
266
- # Log all available files for debugging
267
- print(f"[get_document] Available files in directory:")
268
- for f in actual_files:
269
- print(f"[get_document] - {f.name}")
270
- print(f"[get_document] Requested filename (normalized): {requested_normalized}")
271
  raise HTTPException(
272
  status_code=404,
273
- detail=f"Document not found: {decoded_filename}. Available files: {[f.name for f in actual_files]}"
274
  )
275
 
276
  file_extension = file_path.suffix.lower()
@@ -288,12 +283,35 @@ async def get_document(filename: str, mode: str = Query("download", enum=["downl
288
 
289
  if mode == "preview":
290
  if file_extension != ".pdf":
291
- return JSONResponse({"filename": filename, "error": "Preview only available for PDF files"}, status_code=400)
292
  return FileResponse(
293
  str(file_path),
294
  media_type="application/pdf",
295
  filename=filename,
296
- headers=build_headers("inline")
297
  )
298
 
299
  media_type = "application/pdf" if file_extension == ".pdf" else "application/octet-stream"
 
104
  class QuestionRequest(BaseModel):
105
  question: str
106
  use_history: Optional[bool] = True
107
+ context_mode: Optional[str] = "chunks"
108
+ model_provider: Optional[str] = "qwen"  # "qwen", "openai", or "huggingface"
109
 
110
 
111
  class QuestionResponse(BaseModel):
 
177
  @app.post("/ask", response_model=QuestionResponse)
178
  async def ask_question(request: QuestionRequest):
179
  """Answer a question using RAG with multi-turn chat history"""
180
+ import time
181
+ request_start = time.perf_counter()
182
+
183
  global rag_system, rag_ready
184
  if rag_system is None or not rag_ready:
185
  raise HTTPException(
 
191
  raise HTTPException(status_code=400, detail="Question cannot be empty")
192
 
193
  try:
194
+ answer, sources, _chunks = rag_system.answer_question(
195
  request.question,
196
  use_history=request.use_history,
197
+ model_provider=request.model_provider,
198
+ context_mode=request.context_mode or "full",
199
  )
200
+ request_time = (time.perf_counter() - request_start) * 1000
201
+ print(f"[Timing] Total /ask endpoint time: {request_time:.2f}ms")
202
  return QuestionResponse(answer=answer, sources=sources)
203
  except Exception as e:
204
  raise HTTPException(
 
235
 
236
  # If file doesn't exist, try to find it by matching actual files in directory
237
  if not file_path.exists():
 
 
 
238
  # List all PDF files in documents directory
239
  actual_files = list(documents_dir.glob("*.pdf"))
 
240
 
241
  # Normalize the requested filename for comparison
242
  def normalize_name(name: str) -> str:
 
255
  actual_name = actual_file.name
256
  actual_normalized = normalize_name(actual_name)
257
 
 
 
258
  if requested_normalized == actual_normalized:
259
  matched_file = actual_file
 
260
  break
261
 
262
  if matched_file:
263
  file_path = matched_file.resolve()
264
  else:
265
+ error_detail = f"Document not found: '{decoded_filename}'. Available files: {[f.name for f in actual_files]}"
 
 
 
 
266
  raise HTTPException(
267
  status_code=404,
268
+ detail=error_detail
269
  )
270
 
271
  file_extension = file_path.suffix.lower()
 
283
 
284
  if mode == "preview":
285
  if file_extension != ".pdf":
286
+ error_msg = f"Preview only available for PDF files. File extension: {file_extension}"
287
+ return JSONResponse({"filename": filename, "error": error_msg}, status_code=400)
288
+
289
+ # Verify file exists before returning
290
+ if not file_path.exists():
291
+ error_msg = f"File not found for preview: {file_path}"
292
+ raise HTTPException(status_code=404, detail=error_msg)
293
+
294
+ # Verify file is readable and not empty
295
+ try:
296
+ file_size = file_path.stat().st_size
297
+ if file_size == 0:
298
+ error_msg = f"File is empty: {file_path}"
299
+ raise HTTPException(status_code=400, detail=error_msg)
300
+ except Exception as e:
301
+ error_msg = f"Error accessing file: {str(e)}"
302
+ raise HTTPException(status_code=500, detail=error_msg)
303
+
304
+ # Build headers for preview (inline display)
305
+ preview_headers = build_headers("inline")
306
+ # Add CORS headers if needed
307
+ preview_headers["Access-Control-Allow-Origin"] = "*"
308
+ preview_headers["Access-Control-Expose-Headers"] = "Content-Disposition, Content-Type"
309
+
310
  return FileResponse(
311
  str(file_path),
312
  media_type="application/pdf",
313
  filename=filename,
314
+ headers=preview_headers
315
  )
316
 
317
  media_type = "application/pdf" if file_extension == ".pdf" else "application/octet-stream"
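
For a quick manual check of the new request fields, a minimal client sketch; the base URL and port are assumptions (adjust to wherever the FastAPI app is actually served):

```python
# Minimal sketch: exercise the updated /ask request schema introduced in this commit.
# The host/port below are assumptions, not taken from the diff.
import requests

payload = {
    "question": "ما هي شروط الزواج؟",
    "use_history": True,
    "context_mode": "chunks",       # "full" or "chunks"
    "model_provider": "qwen",       # "qwen", "openai", or "huggingface"
}

response = requests.post("http://localhost:8000/ask", json=payload, timeout=120)
response.raise_for_status()
data = response.json()
print(data["answer"])   # Arabic answer text
print(data["sources"])  # list of matched source filenames (chunks are no longer returned)
```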
backend/rag_system.py CHANGED
@@ -1,15 +1,17 @@
1
  import os
2
  import json
 
3
  from pathlib import Path
4
  from typing import List, Tuple, Optional, Dict
5
  from langchain_community.vectorstores import FAISS
6
  from langchain.schema import Document
 
7
  try:
8
- from backend.embeddings import OpenAIEmbeddingsWrapper
9
  from backend.document_processor import DocumentProcessor
10
  from backend.chat_history import ChatHistory
11
  except ModuleNotFoundError:
12
- from embeddings import OpenAIEmbeddingsWrapper
13
  from document_processor import DocumentProcessor
14
  from chat_history import ChatHistory
15
  from openai import OpenAI
@@ -36,18 +38,29 @@ class RAGSystem:
36
  self.json_path = json_path
37
  self.vectorstore = None
38
 
39
- # Initialize embeddings
40
- api_key = openai_api_key or os.getenv("OPENAI_API_KEY")
41
- if not api_key:
42
- raise ValueError("OpenAI API key is required. Set OPENAI_API_KEY environment variable.")
 
 
 
 
 
 
 
 
43
 
44
- self.embeddings = OpenAIEmbeddingsWrapper(api_key=api_key)
45
 
46
- # Initialize document processor
47
- self.processor = DocumentProcessor(api_key=api_key)
 
 
 
48
 
49
  # Initialize LLM client for answering questions
50
- os.environ.setdefault("OPENAI_API_KEY", api_key)
51
  http_client = NoProxyHTTPClient(timeout=60.0)
52
  self.llm_client = OpenAI(http_client=http_client)
53
  self.llm_model = os.getenv("OPENAI_LLM_MODEL", "gpt-4o-mini")
@@ -55,6 +68,13 @@ class RAGSystem:
55
  # Chat history manager
56
  self.chat_history = ChatHistory(max_history=int(os.getenv("CHAT_HISTORY_TURNS", "10")))
57
 
 
 
 
 
 
 
 
58
  # Try to load existing vectorstore
59
  self._load_vectorstore()
60
 
@@ -67,8 +87,22 @@ class RAGSystem:
67
  embeddings=self.embeddings,
68
  allow_dangerous_deserialization=True
69
  )
70
- # Ensure embedding function is callable
71
- self.vectorstore.embedding_function = self.embeddings.embed_query
 
 
 
 
 
 
 
 
 
 
 
 
 
 
72
  print(f"Loaded existing vectorstore from {self.vectorstore_path}")
73
  except Exception as e:
74
  print(f"Could not load existing vectorstore: {e}")
@@ -171,7 +205,298 @@ class RAGSystem:
171
 
172
  return len(new_processed_docs)
173
 
174
- def answer_question(self, question: str, use_history: bool = True, model_provider: str = "openai") -> Tuple[str, List[str]]:
 
175
  """
176
  Answer a question using RAG with multi-turn chat history
177
 
@@ -179,18 +504,33 @@ class RAGSystem:
179
  question: The user's question
180
  use_history: Whether to use chat history
181
  model_provider: Model provider to use - "openai" (default) or "qwen"/"huggingface" for Qwen model
 
182
 
183
  Returns:
184
- Tuple of (answer, list of source filenames)
185
  """
 
 
186
  if self.vectorstore is None:
187
  raise ValueError("No documents indexed. Please process documents first.")
188
 
189
- # Ensure embedding function is callable
190
- if getattr(self.vectorstore, "embedding_function", None) is None or not callable(self.vectorstore.embedding_function):
191
- self.vectorstore.embedding_function = self.embeddings.embed_query
 
 
 
 
 
 
 
 
 
 
 
 
 
192
 
193
- # Step 1: Find most similar summary
194
  # Build search query with last chat turn context if history is enabled
195
  search_query = question
196
  if use_history:
@@ -207,70 +547,181 @@ class RAGSystem:
207
  # Combine with current question
208
  search_query = f"{last_turn_text}\nCurrent Q: {question}"
209
 
210
- similar_docs = self.vectorstore.similarity_search(search_query, k=1)
 
 
 
 
211
 
212
- if not similar_docs:
213
- return "I couldn't find any relevant information to answer your question.", []
214
 
215
- # Step 2: Get filename from matched summary
216
- matched_filename = similar_docs[0].metadata.get("filename", "")
 
217
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
218
 
219
- if not matched_filename:
220
- return "Error: No filename found in matched document metadata.", []
 
 
 
 
221
 
222
- # Step 3: Retrieve full text from JSON
223
- print(f"DEBUG: Searching for filename: '{matched_filename}'")
224
- print(f"DEBUG: JSON path: {self.json_path}")
 
225
 
226
- full_text = self.processor.get_text_by_filename(matched_filename, json_path=self.json_path)
 
 
 
 
 
 
 
 
 
 
 
227
 
228
  if not full_text:
229
- # Debug: Check what's in the JSON file
230
- json_path = Path(self.json_path)
231
- if not json_path.exists():
232
- error_msg = f"Error: JSON file not found at {self.json_path}. Please process documents first."
233
- print(f"DEBUG: {error_msg}")
234
- return error_msg, [matched_filename]
235
 
236
- try:
237
- with open(json_path, "r", encoding="utf-8") as f:
238
- docs = json.load(f)
239
- available_filenames = [doc.get("filename", "unknown") for doc in docs] if isinstance(docs, list) else []
240
- print(f"DEBUG: JSON file exists with {len(available_filenames) if isinstance(docs, list) else 0} documents")
241
- print(f"DEBUG: Available filenames: {available_filenames}")
242
-
243
- error_msg = f"Could not retrieve text for document: '{matched_filename}'. "
244
- if available_filenames:
245
- error_msg += f"Available filenames in JSON: {', '.join(available_filenames)}"
246
- else:
247
- error_msg += "JSON file is empty or invalid."
248
- return error_msg, [matched_filename]
249
- except Exception as e:
250
- error_msg = f"Error loading JSON file: {str(e)}"
251
- print(f"DEBUG: {error_msg}")
252
- return error_msg, [matched_filename]
253
 
254
- # Step 4: Build prompt with full text, question, and chat history
 
255
  history_messages = []
256
  if use_history:
257
  # Get last 3 messages (get 2 turns = 4 messages, then take last 3)
258
  history_messages = self.chat_history.get_recent_history(n_turns=2)
259
 
260
- system_prompt = f"""You are a helpful legal document assistant. Answer questions based on the provided document text.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
261
 
262
  MODE 1 - General Questions:
263
  - Understand the context and provide a clear, helpful answer
264
  - You may paraphrase or summarize information from the document
265
  - Explain concepts in your own words while staying true to the document's meaning
266
- - Whenever you refere to the document, refer to it by the filename WITHOUT the extension such as ".pdf" or".doc": {matched_filename}
267
 
268
  MODE 2 - Legal Articles/Terms (المادة):
269
- - When the user asks about specific legal articles (المادة), legal terms, or exact regulations, you MUST quote the EXACT text from the document verbatim
 
270
  - Copy the complete text word-for-word, including all numbers, punctuation, and formatting
271
- - Do NOT paraphrase, summarize, or generate new text for legal articles
272
  - NEVER create or generate legal text - only use what exists in the document
273
 
 
 
 
 
 
 
 
274
  If the answer is not in the document, say so clearly. MUST Answer in Arabic."""
275
 
276
  # Check if question contains legal article/term keywords
@@ -278,17 +729,18 @@ If the answer is not in the document, say so clearly. MUST Answer in Arabic."""
278
 
279
  legal_instruction = ""
280
  if is_legal_term_question:
281
- legal_instruction = "\n\nCRITICAL: The user is asking about a legal article or legal term. You MUST quote the EXACT text from the document verbatim. Copy the complete text word-for-word, including all numbers and punctuation. Do NOT paraphrase or generate any text."
282
  else:
283
  legal_instruction = "\n\nAnswer the question by understanding the context from the document. You may paraphrase or explain in your own words while staying true to the document's meaning."
284
 
285
- user_prompt = f"""Document Text:
286
- {full_text[:8000]} # Limit to avoid token limits
287
 
288
  User Question: {question}
289
  {legal_instruction}
290
 
291
- Please answer the question based on the document text above. MUST Answer the Question in Arabic"""
 
292
 
293
  messages = [
294
  {"role": "system", "content": system_prompt}
@@ -301,8 +753,11 @@ Please answer the question based on the document text above. MUST Answer the Que
301
  messages.append(msg)
302
 
303
  messages.append({"role": "user", "content": user_prompt})
 
 
304
 
305
  # Step 5: Get answer from LLM
 
306
  try:
307
  # Initialize client based on model_provider
308
  if model_provider.lower() in ["qwen", "huggingface"]:
@@ -323,28 +778,41 @@ Please answer the question based on the document text above. MUST Answer the Que
323
  llm_client = self.llm_client
324
  llm_model = self.llm_model
325
 
 
326
  response = llm_client.chat.completions.create(
327
  model=llm_model,
328
  messages=messages,
329
  temperature=0.3
330
  )
331
-
332
- answer = response.choices[0].message.content
 
333
 
334
  # Filter thinking process from Qwen responses
335
  if model_provider.lower() in ["qwen", "huggingface"]:
336
- answer = self._filter_thinking_process(answer)
 
 
 
 
 
 
337
 
338
- # Step 6: Update chat history
339
  self.chat_history.add_message("user", question)
340
- self.chat_history.add_message("assistant", answer)
 
 
 
341
 
342
- return answer, [matched_filename]
343
  except Exception as e:
 
 
344
  error_msg = f"Error generating answer: {str(e)}"
345
  self.chat_history.add_message("user", question)
346
- self.chat_history.add_message("assistant", error_msg)
347
- return error_msg, [matched_filename]
348
 
349
  def clear_chat_history(self):
350
  """Clear chat history"""
 
1
  import os
2
  import json
3
+ import time
4
  from pathlib import Path
5
  from typing import List, Tuple, Optional, Dict
6
  from langchain_community.vectorstores import FAISS
7
  from langchain.schema import Document
8
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
9
  try:
10
+ from backend.embeddings import get_embeddings_wrapper
11
  from backend.document_processor import DocumentProcessor
12
  from backend.chat_history import ChatHistory
13
  except ModuleNotFoundError:
14
+ from embeddings import get_embeddings_wrapper
15
  from document_processor import DocumentProcessor
16
  from chat_history import ChatHistory
17
  from openai import OpenAI
 
38
  self.json_path = json_path
39
  self.vectorstore = None
40
 
41
+ # Initialize embeddings (supports OpenAI or HuggingFace based on EMBEDDINGS_PROVIDER env var)
42
+ provider = os.getenv("EMBEDDINGS_PROVIDER", "openai").lower()
43
+ if provider in ["huggingface", "hf", "nebius"]:
44
+ # For HuggingFace, use HF_TOKEN
45
+ embeddings_api_key = os.getenv("HF_TOKEN")
46
+ if not embeddings_api_key:
47
+ raise ValueError("HF_TOKEN is required for HuggingFace embeddings. Set HF_TOKEN environment variable.")
48
+ else:
49
+ # For OpenAI, use OPENAI_API_KEY
50
+ embeddings_api_key = openai_api_key or os.getenv("OPENAI_API_KEY")
51
+ if not embeddings_api_key:
52
+ raise ValueError("OpenAI API key is required. Set OPENAI_API_KEY environment variable.")
53
 
54
+ self.embeddings = get_embeddings_wrapper(api_key=embeddings_api_key)
55
 
56
+ # Initialize document processor (always uses OpenAI for LLM processing)
57
+ openai_api_key_for_processor = openai_api_key or os.getenv("OPENAI_API_KEY")
58
+ if not openai_api_key_for_processor:
59
+ raise ValueError("OpenAI API key is required for document processing. Set OPENAI_API_KEY environment variable.")
60
+ self.processor = DocumentProcessor(api_key=openai_api_key_for_processor)
61
 
62
  # Initialize LLM client for answering questions
63
+ os.environ.setdefault("OPENAI_API_KEY", openai_api_key_for_processor)
64
  http_client = NoProxyHTTPClient(timeout=60.0)
65
  self.llm_client = OpenAI(http_client=http_client)
66
  self.llm_model = os.getenv("OPENAI_LLM_MODEL", "gpt-4o-mini")
 
68
  # Chat history manager
69
  self.chat_history = ChatHistory(max_history=int(os.getenv("CHAT_HISTORY_TURNS", "10")))
70
 
71
+ # Cache for JSON file contents and document texts
72
+ self._json_cache = None
73
+ self._json_cache_path = None
74
+ self._text_cache: Dict[str, str] = {} # Cache for document texts by filename
75
+ # Cache for per-document chunk vectorstores: {filename: {"vectorstore": FAISS, "chunks": List[Document]}}
76
+ self._chunk_cache: Dict[str, Dict[str, object]] = {}
77
+
78
  # Try to load existing vectorstore
79
  self._load_vectorstore()
80
 
 
87
  embeddings=self.embeddings,
88
  allow_dangerous_deserialization=True
89
  )
90
+ # Ensure embedding function is properly set
91
+ # FAISS may use either embedding_function attribute or call embeddings directly
92
+ # Set embedding_function to the embed_query method for compatibility
93
+ if not hasattr(self.vectorstore, 'embedding_function') or self.vectorstore.embedding_function is None:
94
+ self.vectorstore.embedding_function = self.embeddings.embed_query
95
+ elif not callable(self.vectorstore.embedding_function):
96
+ self.vectorstore.embedding_function = self.embeddings.embed_query
97
+
98
+ # Also ensure the embeddings object itself is accessible and callable
99
+ # This handles cases where FAISS tries to call the embeddings object directly
100
+ if hasattr(self.vectorstore, 'embeddings'):
101
+ self.vectorstore.embeddings = self.embeddings
102
+
103
+ # Verify embedding function is working
104
+ if not callable(self.vectorstore.embedding_function):
105
+ raise ValueError("Embedding function is not callable after initialization")
106
  print(f"Loaded existing vectorstore from {self.vectorstore_path}")
107
  except Exception as e:
108
  print(f"Could not load existing vectorstore: {e}")
 
205
 
206
  return len(new_processed_docs)
207
 
208
+ @staticmethod
209
+ def _parse_llm_response(raw_response: str) -> str:
210
+ """
211
+ Parse LLM response to extract answer.
212
+
213
+ Args:
214
+ raw_response: The raw response from LLM
215
+
216
+ Returns:
217
+ The answer text
218
+ """
219
+ # Try to parse as JSON first
220
+ try:
221
+ # Look for JSON in the response (might be wrapped in markdown code blocks)
222
+ response_text = raw_response.strip()
223
+
224
+ # Remove markdown code blocks if present
225
+ if response_text.startswith("```json"):
226
+ response_text = response_text[7:] # Remove ```json
227
+ elif response_text.startswith("```"):
228
+ response_text = response_text[3:] # Remove ```
229
+
230
+ if response_text.endswith("```"):
231
+ response_text = response_text[:-3] # Remove closing ```
232
+
233
+ response_text = response_text.strip()
234
+
235
+ # Try to parse as JSON
236
+ parsed = json.loads(response_text)
237
+
238
+ answer = parsed.get("answer", raw_response)
239
+ return answer
240
+
241
+ except (json.JSONDecodeError, ValueError) as e:
242
+ # If JSON parsing fails, return the raw response
243
+ return raw_response
244
+
245
+ def _load_json_cached(self) -> List[Dict[str, str]]:
246
+ """Load JSON file with caching to avoid repeated file I/O"""
247
+ json_path = Path(self.json_path)
248
+
249
+ # Check if cache is valid (file hasn't changed)
250
+ if self._json_cache is not None and self._json_cache_path == str(json_path):
251
+ if json_path.exists():
252
+ # Check if file modification time changed
253
+ current_mtime = json_path.stat().st_mtime
254
+ if hasattr(self, '_json_cache_mtime') and self._json_cache_mtime == current_mtime:
255
+ return self._json_cache
256
+
257
+ # Load from file
258
+ if not json_path.exists():
259
+ return []
260
+
261
+ try:
262
+ with open(json_path, "r", encoding="utf-8") as f:
263
+ docs = json.load(f)
264
+ # Cache the results
265
+ self._json_cache = docs if isinstance(docs, list) else []
266
+ self._json_cache_path = str(json_path)
267
+ self._json_cache_mtime = json_path.stat().st_mtime
268
+ return self._json_cache
269
+ except Exception as e:
270
+ return []
271
+
272
+ def _get_text_by_filename_cached(self, filename: str) -> Optional[str]:
273
+ """Get full text for a document by filename using cache"""
274
+ # Check text cache first
275
+ if filename in self._text_cache:
276
+ return self._text_cache[filename]
277
+
278
+ # Load from JSON cache
279
+ docs = self._load_json_cached()
280
+ for doc in docs:
281
+ if doc.get("filename") == filename:
282
+ text = doc.get("text", "")
283
+ # Cache the text
284
+ self._text_cache[filename] = text
285
+ return text
286
+
287
+ return None
288
+
289
+ def _get_or_build_chunk_vectorstore(
290
+ self,
291
+ filename: str,
292
+ full_text: str,
293
+ chunk_size: int = 2000,
294
+ chunk_overlap: int = 300
295
+ ) -> Tuple[FAISS, List[Document]]:
296
+ """
297
+ Build or retrieve an in-memory FAISS vectorstore of semantic chunks for a single document.
298
+
299
+ Args:
300
+ filename: Document filename used as key in cache/metadata
301
+ full_text: Full document text to chunk
302
+ chunk_size: Approximate character length for each chunk
303
+ chunk_overlap: Overlap between consecutive chunks (characters)
304
+
305
+ Returns:
306
+ Tuple of (FAISS vectorstore over chunks, list of chunk Documents)
307
+ """
308
+ # Return from cache if available
309
+ if filename in self._chunk_cache:
310
+ entry = self._chunk_cache[filename]
311
+ return entry["vectorstore"], entry["chunks"] # type: ignore[return-value]
312
+
313
+ # Create text splitter tuned for Arabic legal text
314
+ text_splitter = RecursiveCharacterTextSplitter(
315
+ chunk_size=chunk_size,
316
+ chunk_overlap=chunk_overlap,
317
+ separators=[
318
+ "\n\n",
319
+ "\n",
320
+ "المادة ",
321
+ "مادة ",
322
+ ". ",
323
+ " ",
324
+ ""
325
+ ],
326
+ )
327
+
328
+ chunks = text_splitter.split_text(full_text)
329
+ chunk_docs: List[Document] = []
330
+ for idx, chunk in enumerate(chunks):
331
+ chunk_docs.append(
332
+ Document(
333
+ page_content=chunk,
334
+ metadata={
335
+ "filename": filename,
336
+ "chunk_index": idx,
337
+ },
338
+ )
339
+ )
340
+
341
+ if not chunk_docs:
342
+ # Fallback: single chunk with entire text
343
+ chunk_docs = [
344
+ Document(
345
+ page_content=full_text,
346
+ metadata={
347
+ "filename": filename,
348
+ "chunk_index": 0,
349
+ },
350
+ )
351
+ ]
352
+
353
+ chunk_vectorstore = FAISS.from_documents(chunk_docs, embedding=self.embeddings)
354
+ self._chunk_cache[filename] = {
355
+ "vectorstore": chunk_vectorstore,
356
+ "chunks": chunk_docs,
357
+ }
358
+ return chunk_vectorstore, chunk_docs
359
+
360
+ def _classify_question(self, question: str, use_history: bool = True, model_provider: str = "openai") -> Tuple[str, Optional[str], Optional[List[str]], Optional[List[str]]]:
361
+ """
362
+ Classify question into one of three categories: law-new, law-followup, or general.
363
+
364
+ Args:
365
+ question: The user's question
366
+ use_history: Whether to use chat history
367
+ model_provider: Model provider to use
368
+
369
+ Returns:
370
+ Tuple of (label, answer, sources, chunks) where:
371
+ - label: "law-new", "law-followup", or "general"
372
+ - For "general": answer contains the answer string, sources=[], chunks=None
373
+ - For "law-new" or "law-followup": answer=None, sources=None, chunks=None (RAG will handle answering)
374
+ """
375
+ # Get previous turn context for distinguishing law-new from law-followup
376
+ previous_context = ""
377
+ if use_history:
378
+ last_turn = self.chat_history.get_last_turn()
379
+ if last_turn and len(last_turn) >= 2:
380
+ prev_user = last_turn[0].get("content", "") if last_turn[0].get("role") == "user" else ""
381
+ prev_assistant = last_turn[1].get("content", "") if last_turn[1].get("role") == "assistant" else ""
382
+ if prev_user and prev_assistant:
383
+ previous_context = f"\n\nPrevious conversation:\nUser: {prev_user}\nAssistant: {prev_assistant}"
384
+
385
+ classification_prompt = f"""Classify the following question as one of: "law-new", "law-followup", or "general".
386
+
387
+ A "law-new" question is:
388
+ - A law-related question that starts a new topic/thread
389
+ - Not primarily dependent on the immediately previous answer
390
+ - About legal documents, regulations, laws, articles (المادة), legal cases, procedures, terms, definitions
391
+ - Anything related to legal matters in documents, but as a new inquiry
392
+
393
+ A "law-followup" question is:
394
+ - A law-related question that is a follow-up, inference, or clarification based on the previous assistant response
395
+ - Refers to or builds upon the previous answer (e.g., "what about...", "can you explain more about...", "based on that...", "how about...", "what if...")
396
+ - Asks for clarification, elaboration, or related information about what was just discussed
397
+ - Continues the conversation thread about the same legal topic
398
+ - Uses pronouns or references that relate to the previous response
399
+
400
+ A "general" question is:
401
+ - Greetings (السلام عليكم, مرحبا, etc.)
402
+ - Casual conversation
403
+ - Questions not related to legal documents or law
404
+
405
+ {previous_context}
406
+
407
+ Current Question: {question}
408
+
409
+ If the question is "general", provide a helpful answer in Arabic.
410
+ If the question is "law-new", respond with only "law-new".
411
+ If the question is "law-followup", respond with only "law-followup".
412
+ """
413
+
414
+ try:
415
+ # Initialize client based on model_provider
416
+ if model_provider.lower() in ["qwen", "huggingface"]:
417
+ hf_token = os.getenv("HF_TOKEN")
418
+ if not hf_token:
419
+ # Fallback to OpenAI if HF_TOKEN not available
420
+ llm_client = self.llm_client
421
+ llm_model = self.llm_model
422
+ else:
423
+ http_client = NoProxyHTTPClient(timeout=60.0)
424
+ llm_client = OpenAI(
425
+ base_url="https://router.huggingface.co/v1",
426
+ api_key=hf_token,
427
+ http_client=http_client
428
+ )
429
+ llm_model = os.getenv("QWEN_MODEL", "Qwen/Qwen3-32B:nscale")
430
+ else:
431
+ llm_client = self.llm_client
432
+ llm_model = self.llm_model
433
+
434
+ # Build messages with chat history if enabled
435
+ history_messages = []
436
+ if use_history:
437
+ history_messages = self.chat_history.get_recent_history(n_turns=2)
438
+
439
+ system_prompt = """You are a helpful assistant. Classify questions into one of three categories and answer general questions in Arabic.
440
+ If the question is a greeting or general question, provide a friendly, helpful answer in Arabic.
441
+ If the question is law-related and starts a new topic, respond with only "law-new".
442
+ If the question is law-related and is a follow-up to the previous response, respond with only "law-followup".
443
+ Respond with ONLY one of: "law-new", "law-followup", or provide an answer if it's general."""
444
+
445
+ messages = [{"role": "system", "content": system_prompt}]
446
+
447
+ # Add chat history
448
+ if history_messages:
449
+ for msg in history_messages[:-1] if len(history_messages) > 0 and history_messages[-1].get("content") == question else history_messages:
450
+ messages.append(msg)
451
+
452
+ messages.append({"role": "user", "content": classification_prompt})
453
+
454
+ response = llm_client.chat.completions.create(
455
+ model=llm_model,
456
+ messages=messages,
457
+ temperature=0.3
458
+ )
459
+
460
+ raw_response = response.choices[0].message.content.strip()
461
+
462
+ # Filter thinking process from Qwen responses
463
+ if model_provider.lower() in ["qwen", "huggingface"]:
464
+ raw_response = self._filter_thinking_process(raw_response)
465
+
466
+ # Check classification result
467
+ response_lower = raw_response.lower().strip()
468
+ is_law_new = "law-new" in response_lower and len(response_lower) < 20
469
+ is_law_followup = "law-followup" in response_lower and len(response_lower) < 20
470
+
471
+ if is_law_new:
472
+ print(f"[Classification] Question classified as: law-new")
473
+ return ("law-new", None, None, None) # Continue with RAG flow
474
+ elif is_law_followup:
475
+ print(f"[Classification] Question classified as: law-followup")
476
+ return ("law-followup", None, None, None) # Continue with RAG flow, will reuse chunks if available
477
+ else:
478
+ # General question - use the response as answer
479
+ answer = self._parse_llm_response(raw_response)
480
+
481
+ # Update chat history
482
+ self.chat_history.add_message("user", question)
483
+ self.chat_history.add_message("assistant", answer)
484
+
485
+ print(f"[Classification] Question classified as: general, answered directly")
486
+ return ("general", answer, [], None) # Return answer with empty sources and no chunks
487
+
488
+ except Exception as e:
489
+ # On error, default to law-new to use RAG flow
490
+ print(f"[Classification] Error classifying question, defaulting to law-new: {e}")
491
+ return ("law-new", None, None, None)
492
+
493
+ def answer_question(
494
+ self,
495
+ question: str,
496
+ use_history: bool = True,
497
+ model_provider: str = "openai",
498
+ context_mode: str = "full",
499
+ ) -> Tuple[str, List[str], Optional[List[str]]]:
500
  """
501
  Answer a question using RAG with multi-turn chat history
502
 
 
504
  question: The user's question
505
  use_history: Whether to use chat history
506
  model_provider: Model provider to use - "openai" (default) or "qwen"/"huggingface" for Qwen model
507
+ context_mode: Context construction mode - "full" (entire document) or "chunks" (top semantic chunks)
508
 
509
  Returns:
510
+ Tuple of (answer, list of source filenames, optional list of chunk texts for testing)
511
  """
512
+ start_time = time.perf_counter()
513
+
514
  if self.vectorstore is None:
515
  raise ValueError("No documents indexed. Please process documents first.")
516
 
517
+ # Step 0: Classify question into law-new, law-followup, or general
518
+ classification_start = time.perf_counter()
519
+ label, answer, sources, chunks = self._classify_question(question, use_history, model_provider)
520
+ classification_time = (time.perf_counter() - classification_start) * 1000
521
+ print(f"[Timing] Question classification: {classification_time:.2f}ms")
522
+
523
+ # If general question was handled, return the result immediately
524
+ if label == "general":
525
+ return answer, sources, chunks
526
+
527
+ # Step 1: Find most similar summary (law-related questions only)
528
+ # Check if there's a previous document to potentially reuse
529
+ search_start = time.perf_counter()
530
+ previous_document = None
531
+ if use_history:
532
+ previous_document = self.chat_history.get_last_document()
533
 
 
534
  # Build search query with last chat turn context if history is enabled
535
  search_query = question
536
  if use_history:
 
547
  # Combine with current question
548
  search_query = f"{last_turn_text}\nCurrent Q: {question}"
549
 
550
+ # Perform similarity search with scores for relevance checking
551
+ # Use k=3 to get multiple candidates for comparison
552
+ similar_docs_with_scores = self.vectorstore.similarity_search_with_score(search_query, k=3)
553
+ search_time = (time.perf_counter() - search_start) * 1000
554
+ print(f"[Timing] Similarity search: {search_time:.2f}ms")
555
 
556
+ if not similar_docs_with_scores:
557
+ return "I couldn't find any relevant information to answer your question.", [], None
558
 
559
+ # Extract best matching document and score
560
+ best_doc, best_score = similar_docs_with_scores[0]
561
+ best_filename = best_doc.metadata.get("filename", "")
562
 
563
+ # Step 2: Check if we should reuse previous document
564
+ matched_filename = best_filename
565
+ if previous_document and use_history:
566
+ # Check if previous document is in the search results
567
+ previous_doc_found = False
568
+ previous_doc_score = None
569
+
570
+ for doc, score in similar_docs_with_scores:
571
+ filename = doc.metadata.get("filename", "")
572
+ if filename == previous_document:
573
+ previous_doc_found = True
574
+ previous_doc_score = score
575
+ break
576
+
577
+ if previous_doc_found and previous_doc_score is not None:
578
+ # Check if previous document score is close to best score
579
+ # FAISS returns distance scores (lower is better), so we compare the difference
580
+ score_difference = abs(best_score - previous_doc_score)
581
+ # If difference is small (within 0.15), reuse previous document
582
+ # This threshold can be adjusted based on testing
583
+ relevance_threshold = 0.15
584
+
585
+ if score_difference <= relevance_threshold:
586
+ matched_filename = previous_document
587
+ print(f"[RAG] Reusing previous document: {matched_filename} (score diff: {score_difference:.4f})")
588
+ else:
589
+ print(f"[RAG] Previous document less relevant, using best match: {best_filename} (score diff: {score_difference:.4f})")
590
+ else:
591
+ print(f"[RAG] Previous document not in top results, using best match: {best_filename}")
592
 
593
+ # Get the matched document object
594
+ matched_doc = None
595
+ for doc, _ in similar_docs_with_scores:
596
+ if doc.metadata.get("filename", "") == matched_filename:
597
+ matched_doc = doc
598
+ break
599
 
600
+ # If matched document not found in results (shouldn't happen), use best match
601
+ if matched_doc is None:
602
+ matched_doc = best_doc
603
+ matched_filename = best_filename
604
 
605
+ # Print the filename and most similar summary
606
+ print(f"[RAG] Matched filename: {matched_filename}")
607
+
608
+
609
+ if not matched_filename:
610
+ return "Error: No filename found in matched document metadata.", [], None
611
+
612
+ # Step 3: Retrieve full text from JSON (with caching)
613
+ retrieval_start = time.perf_counter()
614
+ full_text = self._get_text_by_filename_cached(matched_filename)
615
+ retrieval_time = (time.perf_counter() - retrieval_start) * 1000
616
+ print(f"[Timing] Text retrieval from JSON: {retrieval_time:.2f}ms")
617
 
618
  if not full_text:
619
+ # Load JSON to get available filenames for error message
620
+ docs = self._load_json_cached()
621
+ available_filenames = [doc.get("filename", "unknown") for doc in docs] if isinstance(docs, list) else []
 
 
 
622
 
623
+ error_msg = f"Could not retrieve text for document: '{matched_filename}'. "
624
+ if available_filenames:
625
+ error_msg += f"Available filenames in JSON: {', '.join(available_filenames)}"
626
+ else:
627
+ error_msg += "JSON file is empty or invalid."
628
+ return error_msg, [matched_filename], None
629
+
 
 
 
 
 
 
 
 
 
 
630
 
631
+ # Step 4: Build context (full document or top semantic chunks), prompt, and chat history
632
+ prompt_start = time.perf_counter()
633
  history_messages = []
634
  if use_history:
635
  # Get last 3 messages (get 2 turns = 4 messages, then take last 3)
636
  history_messages = self.chat_history.get_recent_history(n_turns=2)
637
 
638
+ # Decide how to construct document context for the LLM
639
+ context_mode_normalized = (context_mode or "full").lower()
640
+ if context_mode_normalized not in ["full", "chunks"]:
641
+ context_mode_normalized = "full"
642
+
643
+ # Default: use full document text (truncated)
644
+ document_context_label = "Document Text"
645
+ selected_chunks: Optional[List[str]] = None # Store chunks for return to frontend
646
+ if context_mode_normalized == "full":
647
+ print(f"[RAG] full mode ...")
648
+ document_context = full_text[:16000] # Limit to avoid token limits
649
+ else:
650
+ print(f"[RAG] Chunk mode ...")
651
+ # Check if we should reuse previous chunks (only for law-followup AND same document)
652
+ previous_chunks = None
653
+ if label == "law-followup" and use_history:
654
+ previous_chunks = self.chat_history.get_last_chunks()
655
+ previous_doc = self.chat_history.get_last_document()
656
+ if previous_chunks and previous_doc == matched_filename:
657
+ print(f"[RAG] Reusing previous chunks for law-followup question ({len(previous_chunks)} chunks)")
658
+ selected_chunks = previous_chunks # Reuse previous chunks
659
+ document_context_label = "Selected Document Excerpts"
660
+ chunk_texts: List[str] = []
661
+ for idx, chunk_text in enumerate(previous_chunks, start=1):
662
+ chunk_texts.append(f"[مقطع {idx}]\n{chunk_text}")
663
+ document_context = "\n\n".join(chunk_texts)[:25000]
664
+ else:
665
+ previous_chunks = None # Can't reuse, do new search
666
+ print(f"[RAG] Cannot reuse chunks: law-followup but different document or no previous chunks")
667
+
668
+ # If not reusing previous chunks, do normal chunk search (for law-new or when reuse not possible)
669
+ if previous_chunks is None:
670
+ # Chunk mode: build or load per-document chunk vectorstore and retrieve top-k chunks
671
+ chunk_vs, _ = self._get_or_build_chunk_vectorstore(matched_filename, full_text)
672
+ # Use the current question as the chunk search query
673
+ # (we already used enriched search_query for document selection)
674
+ top_k = 4
675
+ try:
676
+ top_chunks = chunk_vs.similarity_search(question, k=top_k)
677
+ except Exception as e:
678
+ print(f"[RAG] Chunk similarity search failed for {matched_filename}, falling back to full text: {e}")
679
+ document_context = full_text[:25000]
680
+ context_mode_normalized = "full"
681
+ else:
682
+ if not top_chunks:
683
+ print(f"[RAG] No chunks returned for {matched_filename}, falling back to full text")
684
+ document_context = full_text[:8000]
685
+ context_mode_normalized = "full"
686
+ else:
687
+ document_context_label = "Selected Document Excerpts"
688
+ chunk_texts: List[str] = []
689
+ selected_chunks = [] # Store raw chunk texts for return
690
+ for idx, doc in enumerate(top_chunks, start=1):
691
+ chunk_text = doc.page_content
692
+ selected_chunks.append(chunk_text) # Store raw chunk text
693
+ chunk_texts.append(f"[مقطع {idx}]\n{chunk_text}")
694
+ document_context = "\n\n".join(chunk_texts)[:20000]
695
+
696
+ # Build prompts
697
+ mode_note = ""
698
+ if context_mode_normalized == "chunks":
699
+ mode_note = (
700
+ "\n\nNote: The provided document text consists of selected relevant excerpts (مقاطع) "
701
+ "from the same document, not the full law. Answer strictly based on these excerpts."
702
+ )
703
+
704
+ system_prompt = f"""You are a helpful legal document assistant. Answer questions based on the provided document text. {mode_note}
705
 
706
  MODE 1 - General Questions:
707
  - Understand the context and provide a clear, helpful answer
708
  - You may paraphrase or summarize information from the document
709
  - Explain concepts in your own words while staying true to the document's meaning
 
710
 
711
  MODE 2 - Legal Articles/Terms (المادة):
712
+ - When the user asks about specific legal articles (المادة), legal terms, or exact regulations, you MUST quote the EXACT text from the document (context) verbatim
714
  - Copy the complete text word-for-word, including all numbers, punctuation, and formatting
715
+ - Do NOT paraphrase, summarize, or generate new text for legal articles (المادة)
716
  - NEVER create or generate legal text - only use what exists in the document
717
 
718
+ IMPORTANT - Response Format:
719
+ - Do NOT include source citations in your answer (e.g., do NOT write "المصدر: نظام الاحوال الشخصية.pdf" or similar source references)
720
+ - Do NOT mention the document filename or source at the end of your answer
721
+ - Simply provide the answer directly without any source attribution
722
+ - Whenever you refer to the document (context or filename) in the response, refer to it by the filename WITHOUT the extension such as ".pdf" or ".doc"
723
+
724
+
725
  If the answer is not in the document, say so clearly. MUST Answer in Arabic."""
726
 
727
  # Check if question contains legal article/term keywords
 
729
 
730
  legal_instruction = ""
731
  if is_legal_term_question:
732
+ legal_instruction = "\n\nCRITICAL: The user is asking about a legal article or legal term. Carefully search the provided context to find the relevant article. Reference the article correctly as it has been stated in the context. Articles might be referenced by their content, position, or topic - for example, 'المادة الأولى' might refer to the first article in a section even if not explicitly numbered. Find and quote the relevant text accurately from the document, maintaining the exact wording as it appears. Do NOT create or generate legal text - only use what exists in the document."
733
  else:
734
  legal_instruction = "\n\nAnswer the question by understanding the context from the document. You may paraphrase or explain in your own words while staying true to the document's meaning."
735
 
736
+ user_prompt = f"""{document_context_label}:
737
+ {document_context}
738
 
739
  User Question: {question}
740
  {legal_instruction}
741
 
742
+ Please answer the question based on the document text above.
743
+ MUST Answer the Question in Arabic."""
744
 
745
  messages = [
746
  {"role": "system", "content": system_prompt}
 
753
  messages.append(msg)
754
 
755
  messages.append({"role": "user", "content": user_prompt})
756
+ prompt_time = (time.perf_counter() - prompt_start) * 1000
757
+ print(f"[Timing] Prompt construction: {prompt_time:.2f}ms")
758
 
759
  # Step 5: Get answer from LLM
760
+ llm_start = time.perf_counter()
761
  try:
762
  # Initialize client based on model_provider
763
  if model_provider.lower() in ["qwen", "huggingface"]:
 
778
  llm_client = self.llm_client
779
  llm_model = self.llm_model
780
 
781
+ # Get answer from LLM (non-streaming)
782
  response = llm_client.chat.completions.create(
783
  model=llm_model,
784
  messages=messages,
785
  temperature=0.3
786
  )
787
+ raw_response = response.choices[0].message.content
788
+ llm_time = (time.perf_counter() - llm_start) * 1000
789
+ print(f"[Timing] LLM API call: {llm_time:.2f}ms")
790
 
791
  # Filter thinking process from Qwen responses
792
  if model_provider.lower() in ["qwen", "huggingface"]:
793
+ raw_response = self._filter_thinking_process(raw_response)
794
+
795
+ # Step 6: Parse LLM response to extract answer
796
+ parse_start = time.perf_counter()
797
+ answer = self._parse_llm_response(raw_response)
798
+ parse_time = (time.perf_counter() - parse_start) * 1000
799
+ print(f"[Timing] Response parsing: {parse_time:.2f}ms")
800
 
801
+ # Step 7: Update chat history with document source and chunks
802
  self.chat_history.add_message("user", question)
803
+ self.chat_history.add_message("assistant", answer, source_document=matched_filename, chunks=selected_chunks)
804
+
805
+ total_time = (time.perf_counter() - start_time) * 1000
806
+ print(f"[Timing] Total inference time: {total_time:.2f}ms")
807
 
808
+ return answer, [matched_filename], selected_chunks
809
  except Exception as e:
810
+ total_time = (time.perf_counter() - start_time) * 1000
811
+ print(f"[Timing] Total inference time (error): {total_time:.2f}ms")
812
  error_msg = f"Error generating answer: {str(e)}"
813
  self.chat_history.add_message("user", question)
814
+ self.chat_history.add_message("assistant", error_msg, source_document=matched_filename, chunks=None)
815
+ return error_msg, [matched_filename], None
816
 
817
  def clear_chat_history(self):
818
  """Clear chat history"""
documents/شرح نظام الأحوال الشخصية.pdf DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:2322f04afd2f881ad7118847022039c633d30e3552b84947f0ef81d3702fe444
3
- size 2779178
 
 
 
 
files_upload.py DELETED
@@ -1,42 +0,0 @@
1
- from huggingface_hub import HfApi
2
- from pathlib import Path
3
- import os
4
- from dotenv import load_dotenv
5
-
6
- # Load environment variables from .env if present
7
- load_dotenv()
8
-
9
- # Get token from environment variable (more secure)
10
- token = os.getenv("HF_TOKEN")
11
- if not token:
12
- raise ValueError("HF_TOKEN environment variable not set. Set it with: export HF_TOKEN='your_token_here'")
13
-
14
- # Initialize API with token
15
- api = HfApi(token=token)
16
- repo_id = "AldawsariNLP/Saudi-Law-AI-Assistant"
17
-
18
- # Upload all PDFs from local documents folder
19
- local_docs = Path("documents")
20
- pdf_files = list(local_docs.glob("*.pdf"))
21
-
22
- if not pdf_files:
23
- print("No PDF files found in documents/ folder; skipping upload.")
24
- exit(0)
25
-
26
- print(f"Found {len(pdf_files)} PDF file(s) to upload")
27
- for pdf_file in pdf_files:
28
- print(f"Uploading {pdf_file.name}...")
29
- try:
30
- api.upload_file(
31
- path_or_fileobj=str(pdf_file),
32
- path_in_repo=f"documents/{pdf_file.name}",
33
- repo_id=repo_id,
34
- repo_type="space",
35
- token=token, # Also pass token here for safety
36
- )
37
- print(f"✓ Successfully uploaded {pdf_file.name}")
38
- except Exception as e:
39
- print(f"✗ Failed to upload {pdf_file.name}: {e}")
40
- raise
41
-
42
- print("Upload complete!")
 
frontend/src/App.js CHANGED
@@ -40,7 +40,12 @@ function App() {
40
  e.preventDefault();
41
  if (!input.trim() || loading) return;
42
 
43
- const userMessage = { role: 'user', content: input };
 
 
 
 
 
44
  setMessages(prev => [...prev, userMessage]);
45
  setInput('');
46
  setLoading(true);
@@ -51,15 +56,17 @@ function App() {
51
  });
52
 
53
  const assistantMessage = {
 
54
  role: 'assistant',
55
  content: response.data.answer,
56
- sources: response.data.sources
57
  };
58
  setMessages(prev => [...prev, assistantMessage]);
59
  } catch (error) {
60
  const errorMessage = {
 
61
  role: 'assistant',
62
- content: error.response?.data?.detail || 'عذراً، حدث خطأ. يرجى المحاولة مرة أخرى.',
63
  error: true
64
  };
65
  setMessages(prev => [...prev, errorMessage]);
@@ -98,10 +105,21 @@ function App() {
98
  setPreviewLoading(false);
99
  };
100
 
 
 
 
 
 
 
 
 
 
 
101
  const handleSourceClick = async (source) => {
102
  if (!source) return;
103
  const filename = source.split('/').pop() || source;
104
  const extension = filename.split('.').pop()?.toLowerCase();
 
105
  setPreviewFilename(filename);
106
  setPreviewError(null);
107
  setPreviewLoading(true);
@@ -111,19 +129,51 @@ function App() {
111
  }
112
  setPreviewUrl(null);
113
  if (extension !== 'pdf') {
114
- setPreviewError('المعاينة متاحة فقط لملفات PDF.');
 
 
115
  setPreviewLoading(false);
116
  return;
117
  }
118
  try {
119
  const url = `${DOCS_URL}/${encodeURIComponent(filename)}?mode=preview`;
120
- const response = await axios.get(url, { responseType: 'blob' });
 
 
 
 
 
 
 
 
 
 
121
  const blob = new Blob([response.data], { type: 'application/pdf' });
122
  const objectUrl = URL.createObjectURL(blob);
123
  previewUrlRef.current = objectUrl;
124
  setPreviewUrl(objectUrl);
 
125
  } catch (error) {
126
- setPreviewError(error.response?.data?.detail || 'تعذر تحميل المعاينة.');
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
127
  } finally {
128
  setPreviewLoading(false);
129
  }
@@ -196,10 +246,10 @@ function App() {
196
  }
197
  };
198
  return (
199
- <div key={idx} className={`message ${msg.role}`}>
200
  <div className="message-content">
201
  <div className="message-header">
202
- {msg.role === 'user' ? '👤 أنت' : '🤖 المساعد'}
203
  </div>
204
  <div className={`message-text ${msg.error ? 'error' : ''}`}>
205
  {renderContent()}
@@ -210,7 +260,7 @@ function App() {
210
  <ul>
211
  {msg.sources.map((source, i) => (
212
  <li key={i}>
213
- <span className="source-name">{source.split('/').pop()}</span>
214
  <div className="source-actions">
215
  <button
216
  type="button"
 
40
  e.preventDefault();
41
  if (!input.trim() || loading) return;
42
 
43
+ // Use unique IDs to prevent collision
44
+ const baseTime = Date.now();
45
+ const userMessageId = baseTime;
46
+ const assistantMessageId = baseTime + 1; // Ensure different ID
47
+
48
+ const userMessage = { id: userMessageId, role: 'user', content: input };
49
  setMessages(prev => [...prev, userMessage]);
50
  setInput('');
51
  setLoading(true);
 
56
  });
57
 
58
  const assistantMessage = {
59
+ id: assistantMessageId,
60
  role: 'assistant',
61
  content: response.data.answer,
62
+ sources: response.data.sources || []
63
  };
64
  setMessages(prev => [...prev, assistantMessage]);
65
  } catch (error) {
66
  const errorMessage = {
67
+ id: assistantMessageId,
68
  role: 'assistant',
69
+ content: error.response?.data?.detail || error.message || 'عذراً، حدث خطأ. يرجى المحاولة مرة أخرى.',
70
  error: true
71
  };
72
  setMessages(prev => [...prev, errorMessage]);
 
105
  setPreviewLoading(false);
106
  };
107
 
108
+ const getDisplaySourceName = (source) => {
109
+ if (!source) return '';
110
+ const fullName = source.split('/').pop() || source;
111
+ const lastDot = fullName.lastIndexOf('.');
112
+ if (lastDot > 0) {
113
+ return fullName.substring(0, lastDot);
114
+ }
115
+ return fullName;
116
+ };
117
+
118
  const handleSourceClick = async (source) => {
119
  if (!source) return;
120
  const filename = source.split('/').pop() || source;
121
  const extension = filename.split('.').pop()?.toLowerCase();
122
+ console.log('[Preview] Requesting preview for:', filename);
123
  setPreviewFilename(filename);
124
  setPreviewError(null);
125
  setPreviewLoading(true);
 
129
  }
130
  setPreviewUrl(null);
131
  if (extension !== 'pdf') {
132
+ const errorMsg = 'المعاينة متاحة فقط لملفات PDF.';
133
+ console.error('[Preview] Error:', errorMsg);
134
+ setPreviewError(errorMsg);
135
  setPreviewLoading(false);
136
  return;
137
  }
138
  try {
139
  const url = `${DOCS_URL}/${encodeURIComponent(filename)}?mode=preview`;
140
+ console.log('[Preview] Requesting URL:', url);
141
+ const response = await axios.get(url, {
142
+ responseType: 'blob',
143
+ timeout: 30000 // 30 second timeout
144
+ });
145
+ console.log('[Preview] Response received, status:', response.status, 'size:', response.data.size);
146
+
147
+ if (!response.data || response.data.size === 0) {
148
+ throw new Error('Received empty file');
149
+ }
150
+
151
  const blob = new Blob([response.data], { type: 'application/pdf' });
152
  const objectUrl = URL.createObjectURL(blob);
153
  previewUrlRef.current = objectUrl;
154
  setPreviewUrl(objectUrl);
155
+ console.log('[Preview] Successfully created object URL');
156
  } catch (error) {
157
+ console.error('[Preview] Error details:', {
158
+ message: error.message,
159
+ response: error.response?.data,
160
+ status: error.response?.status,
161
+ statusText: error.response?.statusText,
162
+ url: error.config?.url
163
+ });
164
+
165
+ let errorMsg = 'تعذر تحميل المعاينة.';
166
+ if (error.response?.data?.detail) {
167
+ errorMsg = `خطأ: ${error.response.data.detail}`;
168
+ } else if (error.response?.status === 404) {
169
+ errorMsg = 'الملف غير موجود.';
170
+ } else if (error.response?.status === 403) {
171
+ errorMsg = 'غير مسموح بالوصول إلى هذا الملف.';
172
+ } else if (error.message) {
173
+ errorMsg = `خطأ: ${error.message}`;
174
+ }
175
+
176
+ setPreviewError(errorMsg);
177
  } finally {
178
  setPreviewLoading(false);
179
  }
 
246
  }
247
  };
248
  return (
249
+ <div key={msg.id || idx} className={`message ${msg.role}`}>
250
  <div className="message-content">
251
  <div className="message-header">
252
+ {msg.role === 'user' ? '👤 أنت' : '🤖 المساعد القانوني'}
253
  </div>
254
  <div className={`message-text ${msg.error ? 'error' : ''}`}>
255
  {renderContent()}
 
260
  <ul>
261
  {msg.sources.map((source, i) => (
262
  <li key={i}>
263
+ <span className="source-name">{getDisplaySourceName(source)}</span>
264
  <div className="source-actions">
265
  <button
266
  type="button"
processed_documents.json CHANGED
The diff for this file is too large to render. See raw diff
 
test_nebius_embeddings.py ADDED
@@ -0,0 +1,292 @@
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for Nebius Embeddings API via HuggingFace Router
4
+ Tests direct API calls to verify authentication and functionality
5
+ """
6
+
7
+ import os
8
+ import sys
9
+ import requests
10
+ from pathlib import Path
11
+ from dotenv import load_dotenv
12
+
13
+ try:
14
+ from huggingface_hub import InferenceClient
15
+ HF_HUB_AVAILABLE = True
16
+ except ImportError:
17
+ HF_HUB_AVAILABLE = False
18
+ print("WARNING: huggingface_hub not available. InferenceClient test will be skipped.")
19
+
20
+ # Load .env from project root
21
+ project_root = Path(__file__).resolve().parent
22
+ load_dotenv(project_root / ".env")
23
+
24
+ API_URL = "https://router.huggingface.co/nebius/v1/embeddings"
25
+ MODEL = os.getenv("HF_EMBEDDING_MODEL", "Qwen/Qwen3-Embedding-8B")
26
+
27
+ def get_headers():
28
+ """Get authorization headers"""
29
+ hf_token = os.getenv("HF_TOKEN")
30
+ if not hf_token:
31
+ print("ERROR: HF_TOKEN environment variable is not set!")
32
+ print("Please set HF_TOKEN in your .env file or environment variables.")
33
+ sys.exit(1)
34
+
35
+ return {
36
+ "Authorization": f"Bearer {hf_token}",
37
+ "Content-Type": "application/json"
38
+ }
39
+
40
+ def query(payload):
41
+ """Make API request to Nebius embeddings endpoint"""
42
+ headers = get_headers()
43
+ try:
44
+ response = requests.post(API_URL, headers=headers, json=payload, timeout=60.0)
45
+ return response
46
+ except requests.exceptions.RequestException as e:
47
+ print(f"ERROR: Request failed: {e}")
48
+ return None
49
+
50
+ def test_single_text():
51
+ """Test embedding a single text"""
52
+ print("\n" + "="*60)
53
+ print("TEST 1: Single Text Embedding")
54
+ print("="*60)
55
+
56
+ test_text = "ما هي المادة المتعلقة بالنفقة في نظام الأحوال الشخصية؟"
57
+ print(f"Input text: {test_text}")
58
+ print(f"Model: {MODEL}")
59
+
60
+ payload = {
61
+ "model": MODEL,
62
+ "input": test_text
63
+ }
64
+
65
+ response = query(payload)
66
+ if response is None:
67
+ return False
68
+
69
+ print(f"\nStatus Code: {response.status_code}")
70
+
71
+ if response.status_code == 200:
72
+ data = response.json()
73
+ print(f"Response keys: {list(data.keys())}")
74
+
75
+ if "data" in data and len(data["data"]) > 0:
76
+ embedding = data["data"][0]["embedding"]
77
+ print(f"Embedding dimensions: {len(embedding)}")
78
+ print(f"First 10 values: {embedding[:10]}")
79
+ print(f"Last 10 values: {embedding[-10:]}")
80
+ print("✓ Single text embedding successful!")
81
+ return True
82
+ else:
83
+ print(f"Unexpected response format: {data}")
84
+ return False
85
+ else:
86
+ print(f"ERROR: Request failed with status {response.status_code}")
87
+ print(f"Response: {response.text}")
88
+ if response.status_code == 401:
89
+ print("\nAuthentication failed. Please check:")
90
+ print("1. HF_TOKEN is correct and valid")
91
+ print("2. Token has proper permissions for Nebius provider")
92
+ print("3. Token is not expired")
93
+ return False
94
+
95
+ def test_batch_texts():
96
+ """Test embedding multiple texts"""
97
+ print("\n" + "="*60)
98
+ print("TEST 2: Batch Text Embedding")
99
+ print("="*60)
100
+
101
+ test_texts = [
102
+ "ما هي المادة المتعلقة بالنفقة؟",
103
+ "ما هي شروط الزواج؟",
104
+ "كيف يتم الطلاق؟"
105
+ ]
106
+ print(f"Input texts ({len(test_texts)}):")
107
+ for i, text in enumerate(test_texts, 1):
108
+ print(f" {i}. {text}")
109
+ print(f"Model: {MODEL}")
110
+
111
+ payload = {
112
+ "model": MODEL,
113
+ "input": test_texts
114
+ }
115
+
116
+ response = query(payload)
117
+ if response is None:
118
+ return False
119
+
120
+ print(f"\nStatus Code: {response.status_code}")
121
+
122
+ if response.status_code == 200:
123
+ data = response.json()
124
+ print(f"Response keys: {list(data.keys())}")
125
+
126
+ if "data" in data:
127
+ print(f"Number of embeddings returned: {len(data['data'])}")
128
+ for i, item in enumerate(data["data"]):
129
+ embedding = item["embedding"]
130
+ print(f" Embedding {i+1}: {len(embedding)} dimensions")
131
+ print("✓ Batch text embedding successful!")
132
+ return True
133
+ else:
134
+ print(f"Unexpected response format: {data}")
135
+ return False
136
+ else:
137
+ print(f"ERROR: Request failed with status {response.status_code}")
138
+ print(f"Response: {response.text}")
139
+ return False
140
+
141
+def test_huggingface_hub_client():
+    """Test using HuggingFace Hub InferenceClient (same approach as HuggingFaceEmbeddingsWrapper)"""
+    print("\n" + "="*60)
+    print("TEST 3: HuggingFace Hub InferenceClient")
+    print("="*60)
+
+    if not HF_HUB_AVAILABLE:
+        print("SKIPPED: huggingface_hub package not installed")
+        return None
+
+    hf_token = os.getenv("HF_TOKEN")
+    if not hf_token:
+        print("ERROR: HF_TOKEN not set")
+        return False
+
+    test_text = "ما هي المادة المتعلقة بالنفقة في نظام الأحوال الشخصية؟"
+    print(f"Input text: {test_text}")
+    print(f"Model: {MODEL}")
+    print("Provider: nebius")
+
+    try:
+        # Initialize client (same as HuggingFaceEmbeddingsWrapper)
+        client = InferenceClient(
+            provider="nebius",
+            api_key=hf_token
+        )
+        print("✓ InferenceClient initialized successfully")
+
+        # Test feature_extraction (same as HuggingFaceEmbeddingsWrapper)
+        print("Calling client.feature_extraction()...")
+        result = client.feature_extraction(
+            test_text,
+            model=MODEL
+        )
+
+        # InferenceClient returns a numpy.ndarray; convert it to a plain list
+        import numpy as np
+
+        if isinstance(result, np.ndarray):
+            # tolist() handles both 2D (batch) and 1D (single) arrays
+            result = result.tolist()
+
+        if isinstance(result, list):
+            # Handle nested list (batch) or flat list (single)
+            if len(result) > 0 and isinstance(result[0], list):
+                # Batch result
+                print("✓ Feature extraction successful! (batch format)")
+                print(f"Number of embeddings: {len(result)}")
+                for i, emb in enumerate(result):
+                    print(f" Embedding {i+1}: {len(emb)} dimensions")
+            else:
+                # Single result
+                print("✓ Feature extraction successful!")
+                print(f"Embedding dimensions: {len(result)}")
+                print(f"First 10 values: {result[:10]}")
+                print(f"Last 10 values: {result[-10:]}")
+
+            # Test batch processing
+            print("\nTesting batch processing...")
+            test_texts = [
+                "ما هي المادة المتعلقة بالنفقة؟",
+                "ما هي شروط الزواج؟"
+            ]
+            results = []
+            for text in test_texts:
+                embedding = client.feature_extraction(text, model=MODEL)
+                # Convert numpy array to list if needed
+                if isinstance(embedding, np.ndarray):
+                    if embedding.ndim == 2:
+                        embedding = embedding.tolist()[0]  # Extract first row if 2D
+                    else:
+                        embedding = embedding.tolist()
+                results.append(embedding)
+            print(f"✓ Batch processing successful! Processed {len(results)} texts")
+            print(f" Embedding 1: {len(results[0])} dimensions")
+            print(f" Embedding 2: {len(results[1])} dimensions")
+
+            return True
+        else:
+            print(f"Unexpected result format: {type(result)}")
+            print(f"Result: {result}")
+            return False
+
+    except Exception as e:
+        print("ERROR: InferenceClient test failed")
+        print(f"Error type: {type(e).__name__}")
+        print(f"Error message: {str(e)}")
+
+        # Provide helpful error messages
+        if "401" in str(e) or "Unauthorized" in str(e):
+            print("\nAuthentication failed. Please check:")
+            print("1. HF_TOKEN is correct and valid")
+            print("2. Token has proper permissions for Nebius provider")
+            print("3. Token is not expired")
+        elif "404" in str(e) or "Not Found" in str(e):
+            print("\nModel or endpoint not found. Please check:")
+            print(f"1. Model '{MODEL}' is available on Nebius")
+            print("2. Provider 'nebius' is correctly configured")
+
+        return False
+
+def main():
+    """Run all tests"""
+    print("Nebius Embeddings API Test")
+    print("="*60)
+    print(f"API URL: {API_URL}")
+    print(f"Model: {MODEL}")
+    print(f"HF_TOKEN: {'*' * 20 if os.getenv('HF_TOKEN') else 'NOT SET'}")
+
+    # Check if token is set
+    if not os.getenv("HF_TOKEN"):
+        print("\nERROR: HF_TOKEN not found!")
+        print("Please set it in your .env file:")
+        print(" HF_TOKEN=your_token_here")
+        sys.exit(1)
+
+    # Run tests
+    results = []
+    results.append(("Single Text (Direct API)", test_single_text()))
+    results.append(("Batch Texts (Direct API)", test_batch_texts()))
+
+    # Test HuggingFace Hub InferenceClient if available
+    if HF_HUB_AVAILABLE:
+        hf_result = test_huggingface_hub_client()
+        if hf_result is not None:
+            results.append(("HuggingFace Hub InferenceClient", hf_result))
+
+    # Summary
+    print("\n" + "="*60)
+    print("TEST SUMMARY")
+    print("="*60)
+    for test_name, success in results:
+        status = "✓ PASSED" if success else "✗ FAILED"
+        print(f"{test_name}: {status}")
+
+    all_passed = all(result[1] for result in results)
+    if all_passed:
+        print("\n✓ All tests passed! API is working correctly.")
+        sys.exit(0)
+    else:
+        print("\n✗ Some tests failed. Check the errors above.")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
+
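A minimal way to run this test script locally, assuming the file added by this commit is saved as `test_nebius_embeddings.py` in the project root (the filename is hypothetical; it is not visible in this hunk) and that `HF_TOKEN` is set in `.env` or the shell environment:

```bash
# Hypothetical filename; adjust to the path actually added in this commit
uv run python test_nebius_embeddings.py
```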