Spaces: AldawsariNLP / Saudi-Law-AI-Assistant

AldawsariNLP committed 12 days ago

Commit

43ad92f

1 Parent(s): 5525601

pushing last changes - dockerignore ...2

Files changed (3)

GITHUB_SETUP.md +0 -145
QUICKSTART.md +0 -324
README.md +0 -133

GITHUB_SETUP.md DELETED

@@ -1,145 +0,0 @@
-# GitHub Setup Guide
-This guide will help you set up GitHub synchronization for your project.
-## Prerequisites
-### 1. Install Git
-If Git is not already installed on your system, install it:
-**Windows:**
-1. Download Git from: https://git-scm.com/download/win
-2. Run the installer and follow the setup wizard
-3. Choose "Git from the command line and also from 3rd-party software" when prompted
-4. Restart your terminal/PowerShell after installation
-**Verify Installation:**
-```powershell
-git --version
-```
-### 2. Configure Git (First Time Only)
-After installing Git, configure your name and email:
-```powershell
-git config --global user.name "Your Name"
-git config --global user.email "your.email@example.com"
-```
-## Setting Up GitHub Repository
-### Step 1: Create Repository on GitHub
-1. Go to https://github.com and sign in
-2. Click the "+" icon in the top right, then "New repository"
-3. Fill in:
-   - **Repository name**: `law-document-rag` (or your preferred name)
-   - **Description**: "Law Document RAG Chat Application"
-   - **Visibility**: Public or Private
-   - **DO NOT** initialize with README, .gitignore, or license (we already have these)
-4. Click "Create repository"
-### Step 2: Initialize Git Repository
-Open PowerShell in your project directory and run:
-```powershell
-cd "C:\Users\Dr. Mohammed Alrobia\Desktop\Python_Projects\law_project1"
-git init
-```
-### Step 3: Stage and Commit Files
-```powershell
-# Stage all files
-git add .
-# Create initial commit
-git commit -m "Initial commit: Cleaned up project for HuggingFace Spaces deployment"
-```
-### Step 4: Add GitHub Remote
-Replace `YOUR_USERNAME` and `YOUR_REPO_NAME` with your actual GitHub username and repository name:
-```powershell
-git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
-```
-### Step 5: Push to GitHub
-```powershell
-# Push to main branch (or master if your default is master)
-git branch -M main
-git push -u origin main
-```
-If you want to keep `master` as your branch name, skip `git branch -M main` and push with:
-```powershell
-git push -u origin master
-```
-## Future Workflow
-### Pushing Changes
-After making changes to your project:
-```powershell
-# Stage changes
-git add .
-# Commit with descriptive message
-git commit -m "Description of your changes"
-# Push to GitHub
-git push
-```
-### Pulling Changes
-To get the latest changes from GitHub:
-```powershell
-git pull
-```
-### Checking Status
-To see what files have changed:
-```powershell
-git status
-```
-## Important Notes
-- **Never commit the `.env` file** - It contains your API keys and is already in `.gitignore`
-- **`vectorstore/` and `processed_documents.json`** are included in the repository (as requested)
-- **`uv.lock`** is included for reproducible builds
-- If you get authentication errors, you may need to set up a Personal Access Token:
-  - Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
-  - Generate a new token with `repo` permissions
-  - Use the token as your password when pushing
-## Troubleshooting
-### Authentication Issues
-If you get authentication errors, you can use a Personal Access Token:
-1. Create a token at: https://github.com/settings/tokens
-2. When prompted for password, use the token instead
-### Large Files
-If you encounter issues with large files (like PDFs in documents/), you may need Git LFS:
-```powershell
-git lfs install
-git lfs track "*.pdf"
-git add .gitattributes
-git add .
-git commit -m "Add Git LFS for PDF files"
-```
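
Note on the deleted GITHUB_SETUP.md: its "Important Notes" and "Large Files" sections warn against committing `.env` and against pushing oversized files. A minimal Python sketch of a local check along those lines might look like the following. The function name `pre_commit_report` and the 50 MiB default threshold are illustrative assumptions, not part of this repository, and the check only reports that a `.env` file is present; it does not inspect `.gitignore`.

```python
from pathlib import Path

# Hypothetical threshold for illustration; adjust to your host's actual limits.
DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024


def pre_commit_report(root, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (has_env_file, large_files) for the project at `root`.

    large_files is a sorted list of (relative_path, size_in_bytes) for every
    regular file larger than limit_bytes, skipping anything under .git/.
    """
    root = Path(root)
    has_env_file = (root / ".env").is_file()
    large_files = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Ignore Git's own object store and anything that is not a regular file.
        if ".git" in rel.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit_bytes:
            large_files.append((rel.as_posix(), size))
    return has_env_file, sorted(large_files)
```

Calling `pre_commit_report(".")` from the project root before `git add .` would list candidates for Git LFS tracking and remind you that a `.env` file exists and must stay ignored.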

QUICKSTART.md DELETED

@@ -1,324 +0,0 @@
-# Quick Start Guide
-Complete guide for local development and deployment to Hugging Face Spaces.
-## Prerequisites
-- Python 3.10 or 3.11 (required for faiss-cpu compatibility)
-- uv (fast Python package manager) - [Install uv](https://github.com/astral-sh/uv)
-- Node.js 16+ and npm
-- OpenAI API key
-- Git installed (for deployment)
-- Hugging Face account (for deployment) - [Sign up](https://huggingface.co)
----
-## Part 1: Local Development
-### 1. Install uv (if not already installed)
-**macOS/Linux:**
-```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
-```
-**Windows:**
-```powershell
-powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-```
-### 2. Install Node.js (REQUIRED - if not already installed)
-**Check if Node.js is installed:**
-```bash
-node --version
-npm --version
-```
-**If you get a "node is not recognized" error:**
-1. **Download Node.js**:
-   - Visit: https://nodejs.org/
-   - Click the **green "LTS" button** (Long Term Support)
-   - Download and run the installer
-2. **During Installation**:
-   - Make sure "Add to PATH" is checked (usually automatic)
-   - Complete the installation
-3. **CRITICAL: Restart Your Terminal**:
-   - **Close your terminal completely**
-   - **Open a new terminal window**
-   - This is required for PATH changes to take effect
-4. **Verify Installation**:
-   ```bash
-   node --version
-   npm --version
-   ```
-   Both should show version numbers.
-### 3. Install Dependencies
-**Backend (using uv):**
-```bash
-uv sync
-```
-**Frontend:**
-```bash
-cd frontend
-npm install
-cd ..
-```
-### 4. Configure API Key
-Create `.env` in the project root:
-```
-OPENAI_API_KEY=sk-your-actual-api-key-here
-```
-### 5. Add Documents
-Copy your PDF/TXT/DOC/DOCX files into the `documents/` folder. The application will automatically process them when you start the backend.
-### 6. Run the Application
-**Terminal 1 - Backend:**
-```bash
-# Using uv run (recommended)
-uv run python backend/main.py
-# Or activate the virtual environment
-# macOS/Linux: source .venv/bin/activate && python backend/main.py
-# Windows: .venv\Scripts\activate && python backend\main.py
-```
-The API will run on `http://localhost:8000`
-**Terminal 2 - Frontend:**
-```bash
-cd frontend
-npm start
-```
-The app will open at `http://localhost:3000`
-### 7. Use the Application
-1. Open http://localhost:3000 in your browser
-2. The system will automatically detect and process documents from the `documents/` folder
-3. Ask questions about your documents!
-### Example Questions
-- "What are the key provisions in the contract?"
-- "What does the law say about [topic]?"
-- "Summarize the main points of the document"
----
-## Part 2: Deployment to Hugging Face Spaces
-### 1. Create a New Space
-1. Go to https://huggingface.co/spaces
-2. Click "Create new Space"
-3. Fill in the details:
-   - **Space name**: `saudi-law-ai-assistant` (or your preferred name)
-   - **SDK**: Select **Docker**
-   - **Visibility**: Public or Private
-4. Click "Create Space"
-### 2. Prepare Your Code
-1. **Build the React frontend**:
-   ```bash
-   cd frontend
-   npm install
-   npm run build
-   cd ..
-   ```
-2. **Ensure all files are ready**:
-   - `app.py` - Main entry point
-   - `pyproject.toml` and `uv.lock` - Python dependencies
-   - `Dockerfile` - Docker configuration
-   - `backend/` - Backend code
-   - `frontend/build/` - Built React app (always run `npm run build` before pushing)
-   - `processed_documents.json` - Optional bundled data so the Space can answer immediately (make sure it is **not** ignored in `.dockerignore`)
-   - `vectorstore/` - Optional pre-built vectorstore folder (if it exists in your repo, it will be included in the Docker image)
-   - `documents/` - PDF sources that power preview/download. Because Hugging Face blocks large binaries in standard git pushes, you have two options:
-     - Use [HF Xet storage](https://huggingface.co/docs/hub/xet/using-xet-storage#git) for the `documents/` folder so it can live in the repo.
-     - Or keep the folder locally, and after every push upload the PDFs through the Space UI (**Files and versions → Upload files**) into `documents/`.
-### 3. Set Up Environment Variables
-1. In your Hugging Face Space, go to **Settings**
-2. Scroll to **Repository secrets**
-3. Add secrets:
-   - **Name**: `OPENAI_API_KEY`
-   - **Value**: Your OpenAI API key
-   - (Optional) **Name**: `HF_TOKEN` (if you need to upload files programmatically)
-### 4. Set Up Xet Storage (Recommended for PDFs)
-If you want to store PDFs in the repository:
-1. **Enable Xet storage** on your Space:
-   - Go to Space Settings → Large file storage
-   - Enable "Hugging Face Xet" (or request access at https://huggingface.co/join/xet)
-2. **Install git-xet locally**:
-   ```bash
-   # macOS/Linux
-   curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
-   # Or via Homebrew
-   brew tap huggingface/tap
-   brew install git-xet
-   git xet install
-   ```
-3. **Configure git to use Xet**:
-   ```bash
-   git lfs install
-   git lfs track "documents/*.pdf"
-   git add .gitattributes documents/*.pdf
-   git commit -m "Track PDFs with Xet"
-   ```
-### 5. Push to Hugging Face
-1. **Initialize git** (if not already done):
-   ```bash
-   git init
-   ```
-2. **Add Hugging Face remote**:
-   ```bash
-   git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
-   ```
-   Replace `YOUR_USERNAME` and `YOUR_SPACE_NAME` with your actual values.
-3. **Add and commit files**:
-   ```bash
-   git add .
-   git commit -m "Initial deployment"
-   ```
-4. **Push to Hugging Face**:
-   ```bash
-   git push hf main
-   ```
-### 6. Wait for Build
-- Hugging Face will automatically build your Docker image
-- This may take 5-10 minutes
-- You can monitor the build logs in the Space's "Logs" tab
-### 7. Access Your Application
-Once the build completes, your application will be available at:
-```
-https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space
-```
-### 8. Upload Documents / Processed Data (if not using Xet)
-If you didn't use Xet storage for PDFs:
-- After the Space builds, open the **Files and versions** tab and click **Upload files** to add your `documents/*.pdf`
-- If you have a prebuilt `processed_documents.json`, upload it as well so the backend can build the vectorstore immediately
-- The startup logs will print whether `processed_documents.json` and `documents/` were detected inside the container
-### 9. Redeploy Checklist
-When updating your Space:
-1. `cd frontend && npm install && npm run build && cd ..`
-2. `git add .`
-3. `git commit -m "Update application"`
-4. `git push hf main` (or `git push hf main --force` if needed)
-5. Watch the Space build logs and confirm the new startup logs show the presence of `processed_documents.json`/`documents`
----
-## Important Notes
-1. **API Endpoints**: The frontend is configured to use the `/api` prefix for backend calls. This is handled by the `app.py` file.
-2. **Documents Folder**: The `documents/` folder is automatically created if it doesn't exist. To bundle PDFs, either:
-   - Enable HF Xet storage for `documents/` (recommended)
-   - Or upload the files via the Space UI after each push
-3. **Processed Data**: `processed_documents.json` can be bundled with the repo. The backend tries to bootstrap from this file at startup, so make sure it reflects the content you expect the Space to serve.
-4. **Vectorstore**: The `vectorstore/` folder is included in the Docker image if it exists in your repo. If it doesn't exist, it will be created at runtime from `processed_documents.json`.
-5. **Port**: Hugging Face Spaces uses port 7860 by default, which is configured in `app.py`.
-6. **Dependencies**: This project uses `uv` for Python package management. Dependencies are defined in `pyproject.toml` and `uv.lock`.
----
-## Troubleshooting
-### Local Development
-**"OpenAI API key is required"**
-- Make sure you created `.env` in the project root with your API key
-**"No documents found"**
-- Check that files are in the `documents/` folder
-- Supported formats: PDF, TXT, DOCX, DOC
-**Frontend can't connect to backend**
-- Ensure backend is running on port 8000
-- Check that CORS is enabled (it is by default)
-**"npm is not recognized" or "node is not recognized"**
-- Node.js is not installed or not in your PATH
-- Install Node.js from https://nodejs.org/
-- Restart your terminal after installation
-- Verify installation: `node --version` and `npm --version`
-### Hugging Face Spaces Deployment
-**Build Fails**
-- Check the build logs in the Space's "Logs" tab
-- Ensure all dependencies are in `pyproject.toml`
-- Verify the Dockerfile is correct
-- Make sure `frontend/build/` exists (run `npm run build`)
-**"RAG system not initialized" (on Spaces)**
-- Ensure `processed_documents.json` is present in the repo **and** not excluded by `.dockerignore`
-- Upload your source PDFs (or processed data) in the Space UI, then restart the Space
-- Check startup logs for initialization messages
-**API Errors**
-- Check that `OPENAI_API_KEY` is set correctly in Space secrets
-- Verify the API key is valid and has credits
-- Check the Space logs for detailed error messages
-**Frontend Not Loading**
-- Ensure `npm run build` was run successfully before pushing
-- Check that `frontend/build/` directory exists and contains `index.html`
-- Verify the build completed without errors
-**Document Preview Not Working**
-- Ensure PDFs are uploaded to the `documents/` folder in the Space
-- Check that filenames match exactly (including encoding)
-- Verify documents are accessible via the Space's file browser
-**Push Rejected - Binary Files**
-- Enable Xet storage for your Space (see Step 4 above)
-- Or exclude PDFs from git and upload via Space UI
----
-## Next Steps
-- See [README.md](README.md) for full documentation and API details
-- Check the Space logs for detailed startup and error information
-- Monitor your OpenAI API usage to avoid unexpected charges
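
Note on the deleted QUICKSTART.md: the "Prepare Your Code" list and the "RAG system not initialized" troubleshooting entry together describe a pre-push checklist. Here is one hedged way to automate it in Python. The name `deployment_problems` is hypothetical, the required paths are copied from the guide's list, and the `.dockerignore` handling is a deliberate simplification that compares exact lines only, without glob patterns or `!` negation.

```python
from pathlib import Path

# Paths the guide lists as needed for the Docker build.
REQUIRED_PATHS = [
    "app.py",
    "pyproject.toml",
    "uv.lock",
    "Dockerfile",
    "backend",
    "frontend/build/index.html",
]

# Optional bundled data that must not be excluded from the image if present.
BUNDLED_PATHS = ["processed_documents.json", "vectorstore"]


def deployment_problems(root):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(root)
    problems = [f"missing: {p}" for p in REQUIRED_PATHS if not (root / p).exists()]

    ignore_file = root / ".dockerignore"
    if ignore_file.is_file():
        # Simplification: exact-line comparison only, no glob or negation handling.
        ignored = {
            line.strip().rstrip("/")
            for line in ignore_file.read_text(encoding="utf-8").splitlines()
            if line.strip() and not line.strip().startswith("#")
        }
        for p in BUNDLED_PATHS:
            if (root / p).exists() and p in ignored:
                problems.append(f"excluded by .dockerignore: {p}")
    return problems
```

Running `deployment_problems(".")` after `npm run build` and before `git push hf main` gives a quick signal; an empty list means none of the listed problems were found, not that the Space build is guaranteed to succeed.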

README.md DELETED

@@ -1,133 +0,0 @@
----
-title: Saudi Law AI Assistant
-emoji: ⚖️
-colorFrom: blue
-colorTo: purple
-sdk: docker
-pinned: false
----
-# Law Document RAG Chat Application
-A web application that allows users to ask questions about indexed legal documents using Retrieval Augmented Generation (RAG) techniques.
-## Features
-- 🤖 **RAG-powered Q&A**: Ask questions about your legal documents and get answers extracted directly from the context
-- 📚 **Document Indexing**: Automatically index PDF, TXT, DOCX, and DOC files from a folder
-- 🎨 **Modern React Frontend**: Beautiful, responsive chat interface
-- ⚡ **FastAPI Backend**: High-performance API with LangChain and FAISS
-- 🔍 **Exact Context Extraction**: Answers are extracted directly from documents, not generated
-- 🚀 **Hugging Face Spaces Ready**: Configured for easy deployment
-## Tech Stack
-- **Frontend**: React 18
-- **Backend**: FastAPI
-- **RAG**: LangChain + FAISS + OpenAI Embeddings
-- **Vector Database**: FAISS
-- **LLM**: OpenAI API (for embeddings)
-- **Python**: 3.10 or 3.11 (required for faiss-cpu compatibility)
-## Project Structure
-```
-KSAlaw-document-agent/
-├── backend/
-│   ├── main.py              # FastAPI application
-│   ├── rag_system.py        # RAG implementation
-│   ├── document_processor.py # Document processing logic
-│   ├── embeddings.py        # OpenAI embeddings wrapper
-│   └── chat_history.py      # Chat history management
-├── frontend/
-│   ├── src/
-│   │   ├── App.js           # Main React component
-│   │   ├── App.css          # Styles
-│   │   ├── index.js         # React entry point
-│   │   └── index.css        # Global styles
-│   ├── build/               # Built React app (for deployment)
-│   ├── public/
-│   │   └── index.html       # HTML template
-│   └── package.json         # Node dependencies
-├── documents/               # Place your PDF documents here
-├── vectorstore/             # FAISS vectorstore (auto-generated)
-├── app.py                   # Hugging Face Spaces entry point
-├── Dockerfile               # Docker configuration
-├── pyproject.toml           # Python dependencies (uv)
-├── uv.lock                  # Locked dependencies
-├── processed_documents.json # Processed document summaries
-├── QUICKSTART.md            # Complete setup and deployment guide
-└── README.md                # This file
-```
-## Quick Start
-For complete setup and deployment instructions, see **[QUICKSTART.md](QUICKSTART.md)**.
-### Quick Overview
-**Local Development:**
-1. Install dependencies: `uv sync` and `cd frontend && npm install`
-2. Create `.env` with your `OPENAI_API_KEY`
-3. Add documents to `documents/` folder
-4. Run backend: `uv run python backend/main.py`
-5. Run frontend: `cd frontend && npm start`
-**Deployment to Hugging Face Spaces:**
-1. Build frontend: `cd frontend && npm run build`
-2. Set up Xet storage (recommended) or prepare to upload PDFs via UI
-3. Push to Hugging Face: `git push hf main`
-4. Set `OPENAI_API_KEY` in Space secrets
-See [QUICKSTART.md](QUICKSTART.md) for detailed step-by-step instructions for both local development and deployment.
-## API Endpoints
-- `GET /api/` - Health check
-- `GET /api/health` - Health status
-- `POST /api/index` - Index documents from a folder
-  ```json
-  {
-    "folder_path": "documents"
-  }
-  ```
-- `POST /api/ask` - Ask a question
-  ```json
-  {
-    "question": "What is the law about X?"
-  }
-  ```
-## Environment Variables
-- `OPENAI_API_KEY`: Your OpenAI API key (required)
-## Notes
-- The system extracts exact text from documents, not generated responses
-- Supported document formats: PDF, TXT, DOCX, DOC
-- The vectorstore is saved locally and persists between sessions
-- Documents are automatically processed on startup (no manual indexing needed)
-- For Hugging Face Spaces, the frontend automatically uses `/api` as the API URL
-- This project uses `uv` for Python package management - dependencies are defined in `pyproject.toml` and `uv.lock`
-- The `.env` file should be in the project root (not in the backend folder)
-- PDFs can be stored using Hugging Face Xet storage or uploaded via the Space UI
-## Troubleshooting
-For detailed troubleshooting, see the [Troubleshooting section in QUICKSTART.md](QUICKSTART.md#troubleshooting).
-### Common Issues
-- **OpenAI API Key Error**: Make sure `OPENAI_API_KEY` is set in your `.env` file (local) or Space secrets (deployment)
-- **No documents found**: Ensure documents are in the `documents/` folder with supported extensions (PDF, TXT, DOCX, DOC)
-- **Frontend can't connect**: Check that the backend is running on port 8000
-- **Build fails on Spaces**: Ensure `frontend/build/` exists (run `npm run build`), check Dockerfile, verify dependencies in `pyproject.toml`
-- **RAG system not initialized**: Check Space logs, ensure `processed_documents.json` exists and is not ignored by `.dockerignore`
-## License
-MIT
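
Note on the deleted README.md: the "API Endpoints" section documents `POST /api/ask` with a JSON body containing a `question` field. A minimal client sketch using only the Python standard library could look like this. The function names `build_ask_request` and `ask` are hypothetical, and the base URL is an assumption you supply yourself (for example the local backend address or your Space URL). The README does not document the response schema, so the sketch returns the parsed JSON without assuming any fields.

```python
import json
import urllib.request


def build_ask_request(base_url, question):
    """Build a POST request for the /api/ask endpoint with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, question, timeout=60):
    """Send the question and return the decoded JSON response unchanged.

    The response schema is not documented in the README, so no fields are assumed.
    """
    request = build_ask_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect without a running backend; `ask(base_url, "What is the law about X?")` performs the actual call.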