Commit 43ad92f
Parent(s): 5525601
pushing last changes - dockerignore ...2
- GITHUB_SETUP.md +0 -145
- QUICKSTART.md +0 -324
- README.md +0 -133
GITHUB_SETUP.md
DELETED
|
@@ -1,145 +0,0 @@
|
|
| 1 |
-
# GitHub Setup Guide
|
| 2 |
-
|
| 3 |
-
This guide will help you set up GitHub synchronization for your project.
|
| 4 |
-
|
| 5 |
-
## Prerequisites
|
| 6 |
-
|
| 7 |
-
### 1. Install Git
|
| 8 |
-
|
| 9 |
-
If Git is not yet installed on your system, install it:
|
| 10 |
-
|
| 11 |
-
**Windows:**
|
| 12 |
-
1. Download Git from: https://git-scm.com/download/win
|
| 13 |
-
2. Run the installer and follow the setup wizard
|
| 14 |
-
3. Choose "Git from the command line and also from 3rd-party software" when prompted
|
| 15 |
-
4. Restart your terminal/PowerShell after installation
|
| 16 |
-
|
| 17 |
-
**Verify Installation:**
|
| 18 |
-
```powershell
|
| 19 |
-
git --version
|
| 20 |
-
```
|
| 21 |
-
|
| 22 |
-
### 2. Configure Git (First Time Only)
|
| 23 |
-
|
| 24 |
-
After installing Git, configure your name and email:
|
| 25 |
-
|
| 26 |
-
```powershell
|
| 27 |
-
git config --global user.name "Your Name"
|
| 28 |
-
git config --global user.email "you@example.com"
|
| 29 |
-
```
|
| 30 |
-
|
| 31 |
-
## Setting Up GitHub Repository
|
| 32 |
-
|
| 33 |
-
### Step 1: Create Repository on GitHub
|
| 34 |
-
|
| 35 |
-
1. Go to https://github.com and sign in
|
| 36 |
-
2. Click the "+" icon in the top right, then "New repository"
|
| 37 |
-
3. Fill in:
|
| 38 |
-
- **Repository name**: `law-document-rag` (or your preferred name)
|
| 39 |
-
- **Description**: "Law Document RAG Chat Application"
|
| 40 |
-
- **Visibility**: Public or Private
|
| 41 |
-
- **DO NOT** initialize with README, .gitignore, or license (we already have these)
|
| 42 |
-
4. Click "Create repository"
|
| 43 |
-
|
| 44 |
-
### Step 2: Initialize Git Repository
|
| 45 |
-
|
| 46 |
-
Open PowerShell in your project directory and run:
|
| 47 |
-
|
| 48 |
-
```powershell
|
| 49 |
-
cd "C:\Users\Dr. Mohammed Alrobia\Desktop\Python_Projects\law_project1"
|
| 50 |
-
git init
|
| 51 |
-
```
|
| 52 |
-
|
| 53 |
-
### Step 3: Stage and Commit Files
|
| 54 |
-
|
| 55 |
-
```powershell
|
| 56 |
-
# Stage all files
|
| 57 |
-
git add .
|
| 58 |
-
|
| 59 |
-
# Create initial commit
|
| 60 |
-
git commit -m "Initial commit: Cleaned up project for HuggingFace Spaces deployment"
|
| 61 |
-
```
|
| 62 |
-
|
| 63 |
-
### Step 4: Add GitHub Remote
|
| 64 |
-
|
| 65 |
-
Replace `YOUR_USERNAME` and `YOUR_REPO_NAME` with your actual GitHub username and repository name:
|
| 66 |
-
|
| 67 |
-
```powershell
|
| 68 |
-
git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
|
| 69 |
-
```
|
| 70 |
-
|
| 71 |
-
### Step 5: Push to GitHub
|
| 72 |
-
|
| 73 |
-
```powershell
|
| 74 |
-
# Push to main branch (or master if your default is master)
|
| 75 |
-
git branch -M main
|
| 76 |
-
git push -u origin main
|
| 77 |
-
```
|
| 78 |
-
|
| 79 |
-
If you skipped the rename and your branch is still called `master`:
|
| 80 |
-
```powershell
|
| 81 |
-
git push -u origin master
|
| 82 |
-
```
|
| 83 |
-
|
| 84 |
-
## Future Workflow
|
| 85 |
-
|
| 86 |
-
### Pushing Changes
|
| 87 |
-
|
| 88 |
-
After making changes to your project:
|
| 89 |
-
|
| 90 |
-
```powershell
|
| 91 |
-
# Stage changes
|
| 92 |
-
git add .
|
| 93 |
-
|
| 94 |
-
# Commit with descriptive message
|
| 95 |
-
git commit -m "Description of your changes"
|
| 96 |
-
|
| 97 |
-
# Push to GitHub
|
| 98 |
-
git push
|
| 99 |
-
```
|
| 100 |
-
|
| 101 |
-
### Pulling Changes
|
| 102 |
-
|
| 103 |
-
To get the latest changes from GitHub:
|
| 104 |
-
|
| 105 |
-
```powershell
|
| 106 |
-
git pull
|
| 107 |
-
```
|
| 108 |
-
|
| 109 |
-
### Checking Status
|
| 110 |
-
|
| 111 |
-
To see what files have changed:
|
| 112 |
-
|
| 113 |
-
```powershell
|
| 114 |
-
git status
|
| 115 |
-
```
|
| 116 |
-
|
| 117 |
-
## Important Notes
|
| 118 |
-
|
| 119 |
-
- **Never commit the `.env` file** - It contains your API keys and is already in `.gitignore`
|
| 120 |
-
- **`vectorstore/` and `processed_documents.json`** are included in the repository (as requested)
|
| 121 |
-
- **`uv.lock`** is included for reproducible builds
|
| 122 |
-
- If you get authentication errors, you may need to set up a Personal Access Token:
|
| 123 |
-
- Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
|
| 124 |
-
- Generate a new token with `repo` permissions
|
| 125 |
-
- Use the token as your password when pushing
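The first note above (keeping `.env` out of the repository) can be checked before the initial `git add .`. The following is a minimal sketch, not part of the project: the helper name `env_is_ignored` is hypothetical, and it only looks for a literal `.env` or `/.env` line in `.gitignore`. It does not implement Git's full pattern rules (wildcards, negation with `!`), so `git check-ignore -v .env` remains the authoritative check.

```python
from pathlib import Path


def env_is_ignored(gitignore_text: str) -> bool:
    """Return True if a literal `.env` rule appears in the .gitignore text.

    Simplified on purpose: comments and blank lines are skipped, and only
    the exact patterns `.env` and `/.env` are recognised.
    """
    patterns = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return bool(patterns & {".env", "/.env"})


if __name__ == "__main__":
    gitignore = Path(".gitignore")
    text = gitignore.read_text(encoding="utf-8") if gitignore.is_file() else ""
    if env_is_ignored(text):
        print(".env is listed in .gitignore")
    else:
        print("WARNING: .env is not listed in .gitignore; do not run 'git add .' yet")
```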
|
| 126 |
-
|
| 127 |
-
## Troubleshooting
|
| 128 |
-
|
| 129 |
-
### Authentication Issues
|
| 130 |
-
|
| 131 |
-
If you get authentication errors, you can use a Personal Access Token:
|
| 132 |
-
1. Create a token at: https://github.com/settings/tokens
|
| 133 |
-
2. When prompted for password, use the token instead
|
| 134 |
-
|
| 135 |
-
### Large Files
|
| 136 |
-
|
| 137 |
-
If you encounter issues with large files (like PDFs in documents/), you may need Git LFS:
|
| 138 |
-
```powershell
|
| 139 |
-
git lfs install
|
| 140 |
-
git lfs track "*.pdf"
|
| 141 |
-
git add .gitattributes
|
| 142 |
-
git add .
|
| 143 |
-
git commit -m "Add Git LFS for PDF files"
|
| 144 |
-
```
|
| 145 |
-
|
QUICKSTART.md
DELETED
|
@@ -1,324 +0,0 @@
|
|
| 1 |
-
# Quick Start Guide
|
| 2 |
-
|
| 3 |
-
Complete guide for local development and deployment to Hugging Face Spaces.
|
| 4 |
-
|
| 5 |
-
## Prerequisites
|
| 6 |
-
|
| 7 |
-
- Python 3.10 or 3.11 (required for faiss-cpu compatibility)
|
| 8 |
-
- uv (fast Python package manager) - [Install uv](https://github.com/astral-sh/uv)
|
| 9 |
-
- Node.js 16+ and npm
|
| 10 |
-
- OpenAI API key
|
| 11 |
-
- Git installed (for deployment)
|
| 12 |
-
- Hugging Face account (for deployment) - [Sign up](https://huggingface.co)
|
| 13 |
-
|
| 14 |
-
---
|
| 15 |
-
|
| 16 |
-
## Part 1: Local Development
|
| 17 |
-
|
| 18 |
-
### 1. Install uv (if not already installed)
|
| 19 |
-
|
| 20 |
-
**macOS/Linux:**
|
| 21 |
-
```bash
|
| 22 |
-
curl -LsSf https://astral.sh/uv/install.sh | sh
|
| 23 |
-
```
|
| 24 |
-
|
| 25 |
-
**Windows:**
|
| 26 |
-
```powershell
|
| 27 |
-
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
|
| 28 |
-
```
|
| 29 |
-
|
| 30 |
-
### 2. Install Node.js (REQUIRED - if not already installed)
|
| 31 |
-
|
| 32 |
-
**Check if Node.js is installed:**
|
| 33 |
-
```bash
|
| 34 |
-
node --version
|
| 35 |
-
npm --version
|
| 36 |
-
```
|
| 37 |
-
|
| 38 |
-
**If you get a "node is not recognized" error:**
|
| 39 |
-
|
| 40 |
-
1. **Download Node.js**:
|
| 41 |
-
- Visit: https://nodejs.org/
|
| 42 |
-
- Click the **green "LTS" button** (Long Term Support)
|
| 43 |
-
- Download and run the installer
|
| 44 |
-
|
| 45 |
-
2. **During Installation**:
|
| 46 |
-
- Make sure "Add to PATH" is checked (usually automatic)
|
| 47 |
-
- Complete the installation
|
| 48 |
-
|
| 49 |
-
3. **CRITICAL: Restart Your Terminal**:
|
| 50 |
-
- **Close your terminal completely**
|
| 51 |
-
- **Open a new terminal window**
|
| 52 |
-
- This is required for PATH changes to take effect
|
| 53 |
-
|
| 54 |
-
4. **Verify Installation**:
|
| 55 |
-
```bash
|
| 56 |
-
node --version
|
| 57 |
-
npm --version
|
| 58 |
-
```
|
| 59 |
-
Both should show version numbers.
|
| 60 |
-
|
| 61 |
-
### 3. Install Dependencies
|
| 62 |
-
|
| 63 |
-
**Backend (using uv):**
|
| 64 |
-
```bash
|
| 65 |
-
uv sync
|
| 66 |
-
```
|
| 67 |
-
|
| 68 |
-
**Frontend:**
|
| 69 |
-
```bash
|
| 70 |
-
cd frontend
|
| 71 |
-
npm install
|
| 72 |
-
cd ..
|
| 73 |
-
```
|
| 74 |
-
|
| 75 |
-
### 4. Configure API Key
|
| 76 |
-
|
| 77 |
-
Create `.env` in the project root:
|
| 78 |
-
```
|
| 79 |
-
OPENAI_API_KEY=sk-your-actual-api-key-here
|
| 80 |
-
```
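How the backend reads this file is not shown in the guide. As a rough illustration only, a standard-library loader could look like the sketch below. The function names `parse_env_text` and `load_api_key` are hypothetical, and the real project may well use a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    path = Path(env_path)
    if path.is_file():
        key = parse_env_text(path.read_text(encoding="utf-8")).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OpenAI API key is required: set OPENAI_API_KEY in .env")
    return key
```

Checking the process environment first means the same code path works on Hugging Face Spaces, where the key arrives as a secret rather than a file.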
|
| 81 |
-
|
| 82 |
-
### 5. Add Documents
|
| 83 |
-
|
| 84 |
-
Copy your PDF/TXT/DOC/DOCX files into the `documents/` folder. The application will automatically process them when you start the backend.
|
| 85 |
-
|
| 86 |
-
### 6. Run the Application
|
| 87 |
-
|
| 88 |
-
**Terminal 1 - Backend:**
|
| 89 |
-
```bash
|
| 90 |
-
# Using uv run (recommended)
|
| 91 |
-
uv run python backend/main.py
|
| 92 |
-
|
| 93 |
-
# Or activate the virtual environment
|
| 94 |
-
# macOS/Linux: source .venv/bin/activate && python backend/main.py
|
| 95 |
-
# Windows: .venv\Scripts\activate && python backend\main.py
|
| 96 |
-
```
|
| 97 |
-
The API will run on `http://localhost:8000`
|
| 98 |
-
|
| 99 |
-
**Terminal 2 - Frontend:**
|
| 100 |
-
```bash
|
| 101 |
-
cd frontend
|
| 102 |
-
npm start
|
| 103 |
-
```
|
| 104 |
-
The app will open at `http://localhost:3000`
|
| 105 |
-
|
| 106 |
-
### 7. Use the Application
|
| 107 |
-
|
| 108 |
-
1. Open http://localhost:3000 in your browser
|
| 109 |
-
2. The system will automatically detect and process documents from the `documents/` folder
|
| 110 |
-
3. Ask questions about your documents!
|
| 111 |
-
|
| 112 |
-
### Example Questions
|
| 113 |
-
|
| 114 |
-
- "What are the key provisions in the contract?"
|
| 115 |
-
- "What does the law say about [topic]?"
|
| 116 |
-
- "Summarize the main points of the document"
|
| 117 |
-
|
| 118 |
-
---
|
| 119 |
-
|
| 120 |
-
## Part 2: Deployment to Hugging Face Spaces
|
| 121 |
-
|
| 122 |
-
### 1. Create a New Space
|
| 123 |
-
|
| 124 |
-
1. Go to https://huggingface.co/spaces
|
| 125 |
-
2. Click "Create new Space"
|
| 126 |
-
3. Fill in the details:
|
| 127 |
-
- **Space name**: `saudi-law-ai-assistant` (or your preferred name)
|
| 128 |
-
- **SDK**: Select **Docker**
|
| 129 |
-
- **Visibility**: Public or Private
|
| 130 |
-
4. Click "Create Space"
|
| 131 |
-
|
| 132 |
-
### 2. Prepare Your Code
|
| 133 |
-
|
| 134 |
-
1. **Build the React frontend**:
|
| 135 |
-
```bash
|
| 136 |
-
cd frontend
|
| 137 |
-
npm install
|
| 138 |
-
npm run build
|
| 139 |
-
cd ..
|
| 140 |
-
```
|
| 141 |
-
|
| 142 |
-
2. **Ensure all files are ready**:
|
| 143 |
-
- `app.py` - Main entry point
|
| 144 |
-
- `pyproject.toml` and `uv.lock` - Python dependencies
|
| 145 |
-
- `Dockerfile` - Docker configuration
|
| 146 |
-
- `backend/` - Backend code
|
| 147 |
-
- `frontend/build/` - Built React app (always run `npm run build` before pushing)
|
| 148 |
-
- `processed_documents.json` - Optional bundled data so the Space can answer immediately (make sure it is **not** ignored in `.dockerignore`)
|
| 149 |
-
- `vectorstore/` - Optional pre-built vectorstore folder (if it exists in your repo, it will be included in the Docker image)
|
| 150 |
-
- `documents/` — PDF sources that power preview/download. Because Hugging Face blocks large binaries in standard git pushes, you have two options:
|
| 151 |
-
- Use [HF Xet storage](https://huggingface.co/docs/hub/xet/using-xet-storage#git) for the `documents/` folder so it can live in the repo.
|
| 152 |
-
- Or keep the folder locally, and after every push upload the PDFs through the Space UI (**Files and versions → Upload files**) into `documents/`.
|
| 153 |
-
|
| 154 |
-
### 3. Set Up Environment Variables
|
| 155 |
-
|
| 156 |
-
1. In your Hugging Face Space, go to **Settings**
|
| 157 |
-
2. Scroll to **Repository secrets**
|
| 158 |
-
3. Add secrets:
|
| 159 |
-
- **Name**: `OPENAI_API_KEY`
|
| 160 |
-
- **Value**: Your OpenAI API key
|
| 161 |
-
- (Optional) **Name**: `HF_TOKEN` (if you need to upload files programmatically)
|
| 162 |
-
|
| 163 |
-
### 4. Set Up Xet Storage (Recommended for PDFs)
|
| 164 |
-
|
| 165 |
-
If you want to store PDFs in the repository:
|
| 166 |
-
|
| 167 |
-
1. **Enable Xet storage** on your Space:
|
| 168 |
-
- Go to Space Settings → Large file storage
|
| 169 |
-
- Enable "Hugging Face Xet" (or request access at https://huggingface.co/join/xet)
|
| 170 |
-
|
| 171 |
-
2. **Install git-xet locally**:
|
| 172 |
-
```bash
|
| 173 |
-
# macOS/Linux
|
| 174 |
-
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
|
| 175 |
-
|
| 176 |
-
# Or via Homebrew
|
| 177 |
-
brew tap huggingface/tap
|
| 178 |
-
brew install git-xet
|
| 179 |
-
git xet install
|
| 180 |
-
```
|
| 181 |
-
|
| 182 |
-
3. **Configure git to use Xet**:
|
| 183 |
-
```bash
|
| 184 |
-
git lfs install
|
| 185 |
-
git lfs track "documents/*.pdf"
|
| 186 |
-
git add .gitattributes documents/*.pdf
|
| 187 |
-
git commit -m "Track PDFs with Xet"
|
| 188 |
-
```
|
| 189 |
-
|
| 190 |
-
### 5. Push to Hugging Face
|
| 191 |
-
|
| 192 |
-
1. **Initialize git** (if not already done):
|
| 193 |
-
```bash
|
| 194 |
-
git init
|
| 195 |
-
```
|
| 196 |
-
|
| 197 |
-
2. **Add Hugging Face remote**:
|
| 198 |
-
```bash
|
| 199 |
-
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
|
| 200 |
-
```
|
| 201 |
-
Replace `YOUR_USERNAME` and `YOUR_SPACE_NAME` with your actual values.
|
| 202 |
-
|
| 203 |
-
3. **Add and commit files**:
|
| 204 |
-
```bash
|
| 205 |
-
git add .
|
| 206 |
-
git commit -m "Initial deployment"
|
| 207 |
-
```
|
| 208 |
-
|
| 209 |
-
4. **Push to Hugging Face**:
|
| 210 |
-
```bash
|
| 211 |
-
git push hf main
|
| 212 |
-
```
|
| 213 |
-
|
| 214 |
-
### 6. Wait for Build
|
| 215 |
-
|
| 216 |
-
- Hugging Face will automatically build your Docker image
|
| 217 |
-
- This may take 5-10 minutes
|
| 218 |
-
- You can monitor the build logs in the Space's "Logs" tab
|
| 219 |
-
|
| 220 |
-
### 7. Access Your Application
|
| 221 |
-
|
| 222 |
-
Once the build completes, your application will be available at:
|
| 223 |
-
```
|
| 224 |
-
https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space
|
| 225 |
-
```
|
| 226 |
-
|
| 227 |
-
### 8. Upload Documents / Processed Data (if not using Xet)
|
| 228 |
-
|
| 229 |
-
If you didn't use Xet storage for PDFs:
|
| 230 |
-
|
| 231 |
-
- After the Space builds, open the **Files and versions** tab and click **Upload files** to add your `documents/*.pdf`
|
| 232 |
-
- If you have a prebuilt `processed_documents.json`, upload it as well so the backend can build the vectorstore immediately
|
| 233 |
-
- The startup logs will print whether `processed_documents.json` and `documents/` were detected inside the container
|
| 234 |
-
|
| 235 |
-
### 9. Redeploy Checklist
|
| 236 |
-
|
| 237 |
-
When updating your Space:
|
| 238 |
-
|
| 239 |
-
1. `cd frontend && npm install && npm run build && cd ..`
|
| 240 |
-
2. `git add .`
|
| 241 |
-
3. `git commit -m "Update application"`
|
| 242 |
-
4. `git push hf main` (or `git push hf main --force` if needed)
|
| 243 |
-
5. Watch the Space build logs and confirm the new startup logs show the presence of `processed_documents.json`/`documents`
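The first four checklist steps can be scripted. This is a sketch under assumptions, not a project file: the names `REDEPLOY_STEPS` and `run_redeploy` are invented here, the `hf` remote and `main` branch are taken from the checklist, and `--force` is deliberately left out. On Windows, `npm` may need to be invoked as `npm.cmd`.

```python
import subprocess

# Each step is (argument list, working directory), in checklist order.
REDEPLOY_STEPS = [
    (["npm", "install"], "frontend"),
    (["npm", "run", "build"], "frontend"),
    (["git", "add", "."], "."),
    (["git", "commit", "-m", "Update application"], "."),
    (["git", "push", "hf", "main"], "."),
]


def run_redeploy(steps=REDEPLOY_STEPS, runner=subprocess.run):
    """Run every step in order and stop at the first failing command."""
    for argv, cwd in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_redeploy()
```

Note that `git commit` exits with a non-zero status when there is nothing to commit, so the script stops before pushing in that case. Step 5, watching the build logs, still has to be done in the Space UI.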
|
| 244 |
-
|
| 245 |
-
---
|
| 246 |
-
|
| 247 |
-
## Important Notes
|
| 248 |
-
|
| 249 |
-
1. **API Endpoints**: The frontend is configured to use `/api` prefix for backend calls. This is handled by the `app.py` file.
|
| 250 |
-
|
| 251 |
-
2. **Documents Folder**: The `documents/` folder is automatically created if it doesn't exist. To bundle PDFs, either:
|
| 252 |
-
- Enable HF Xet storage for `documents/` (recommended)
|
| 253 |
-
- Or upload the files via the Space UI after each push
|
| 254 |
-
|
| 255 |
-
3. **Processed Data**: `processed_documents.json` can be bundled with the repo. The backend tries to bootstrap from this file at startup, so make sure it reflects the content you expect the Space to serve.
|
| 256 |
-
|
| 257 |
-
4. **Vectorstore**: The `vectorstore/` folder is included in the Docker image if it exists in your repo. If it doesn't exist, it will be created at runtime from `processed_documents.json`.
|
| 258 |
-
|
| 259 |
-
5. **Port**: Hugging Face Spaces uses port 7860 by default, which is configured in `app.py`.
|
| 260 |
-
|
| 261 |
-
6. **Dependencies**: This project uses `uv` for Python package management. Dependencies are defined in `pyproject.toml` and `uv.lock`.
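Notes 2 to 4 describe a fallback order for startup data. The sketch below shows one way that order could be expressed. It is an assumption about the behaviour described above, not the actual code in `backend/`, and the function name `choose_bootstrap_source` is hypothetical.

```python
from pathlib import Path


def choose_bootstrap_source(root: str = ".") -> str:
    """Pick the startup data source, preferring the cheapest option first.

    Returns "vectorstore", "processed_documents", "documents", or "none".
    """
    base = Path(root)

    # A pre-built vectorstore folder can be loaded without re-embedding.
    vectorstore = base / "vectorstore"
    if vectorstore.is_dir() and any(vectorstore.iterdir()):
        return "vectorstore"

    # Otherwise the vectorstore is rebuilt from the bundled JSON.
    if (base / "processed_documents.json").is_file():
        return "processed_documents"

    # Last resort: process the raw files in documents/.
    documents = base / "documents"
    if documents.is_dir() and any(documents.iterdir()):
        return "documents"

    return "none"


if __name__ == "__main__":
    print(f"Bootstrap source: {choose_bootstrap_source()}")
```

A result of `"none"` would correspond to the "RAG system not initialized" symptom covered under Troubleshooting.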
|
| 262 |
-
|
| 263 |
-
---
|
| 264 |
-
|
| 265 |
-
## Troubleshooting
|
| 266 |
-
|
| 267 |
-
### Local Development
|
| 268 |
-
|
| 269 |
-
**"OpenAI API key is required"**
|
| 270 |
-
- Make sure you created `.env` in the project root with your API key
|
| 271 |
-
|
| 272 |
-
**"No documents found"**
|
| 273 |
-
- Check that files are in the `documents/` folder
|
| 274 |
-
- Supported formats: PDF, TXT, DOCX, DOC
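To see which files the backend would be expected to pick up, a quick listing like the one below can help. It is a standalone sketch: the helper name `find_supported_documents` is made up for illustration, the extension set is copied from the list above, and the real document processor may apply additional rules.

```python
from pathlib import Path

# Extensions taken from the "Supported formats" list in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".doc"}


def find_supported_documents(folder: str = "documents") -> list[Path]:
    """Return supported files directly inside the folder, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.iterdir()
        # Compare the suffix case-insensitively so "FILE.PDF" also counts.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    found = find_supported_documents()
    if not found:
        print("No supported documents found in documents/")
    for path in found:
        print(path.name)
```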
|
| 275 |
-
|
| 276 |
-
**Frontend can't connect to backend**
|
| 277 |
-
- Ensure backend is running on port 8000
|
| 278 |
-
- Check that CORS is enabled (it is by default)
|
| 279 |
-
|
| 280 |
-
**"npm is not recognized" or "node is not recognized"**
|
| 281 |
-
- Node.js is not installed or not in your PATH
|
| 282 |
-
- Install Node.js from https://nodejs.org/
|
| 283 |
-
- Restart your terminal after installation
|
| 284 |
-
- Verify installation: `node --version` and `npm --version`
|
| 285 |
-
|
| 286 |
-
### Hugging Face Spaces Deployment
|
| 287 |
-
|
| 288 |
-
**Build Fails**
|
| 289 |
-
- Check the build logs in the Space's "Logs" tab
|
| 290 |
-
- Ensure all dependencies are in `pyproject.toml`
|
| 291 |
-
- Verify the Dockerfile is correct
|
| 292 |
-
- Make sure `frontend/build/` exists (run `npm run build`)
|
| 293 |
-
|
| 294 |
-
**"RAG system not initialized" (on Spaces)**
|
| 295 |
-
- Ensure `processed_documents.json` is present in the repo **and** not excluded by `.dockerignore`
|
| 296 |
-
- Upload your source PDFs (or processed data) in the Space UI, then restart the Space
|
| 297 |
-
- Check startup logs for initialization messages
|
| 298 |
-
|
| 299 |
-
**API Errors**
|
| 300 |
-
- Check that `OPENAI_API_KEY` is set correctly in Space secrets
|
| 301 |
-
- Verify the API key is valid and has credits
|
| 302 |
-
- Check the Space logs for detailed error messages
|
| 303 |
-
|
| 304 |
-
**Frontend Not Loading**
|
| 305 |
-
- Ensure `npm run build` was run successfully before pushing
|
| 306 |
-
- Check that `frontend/build/` directory exists and contains `index.html`
|
| 307 |
-
- Verify the build completed without errors
|
| 308 |
-
|
| 309 |
-
**Document Preview Not Working**
|
| 310 |
-
- Ensure PDFs are uploaded to the `documents/` folder in the Space
|
| 311 |
-
- Check that filenames match exactly (including encoding)
|
| 312 |
-
- Verify documents are accessible via the Space's file browser
|
| 313 |
-
|
| 314 |
-
**Push Rejected - Binary Files**
|
| 315 |
-
- Enable Xet storage for your Space (see Step 4 above)
|
| 316 |
-
- Or exclude PDFs from git and upload via Space UI
|
| 317 |
-
|
| 318 |
-
---
|
| 319 |
-
|
| 320 |
-
## Next Steps
|
| 321 |
-
|
| 322 |
-
- See [README.md](README.md) for full documentation and API details
|
| 323 |
-
- Check the Space logs for detailed startup and error information
|
| 324 |
-
- Monitor your OpenAI API usage to avoid unexpected charges
|
README.md
DELETED
|
@@ -1,133 +0,0 @@
|
|
| 1 |
-
---
|
| 2 |
-
title: Saudi Law AI Assistant
|
| 3 |
-
emoji: ⚖️
|
| 4 |
-
colorFrom: blue
|
| 5 |
-
colorTo: purple
|
| 6 |
-
sdk: docker
|
| 7 |
-
pinned: false
|
| 8 |
-
---
|
| 9 |
-
|
| 10 |
-
# Law Document RAG Chat Application
|
| 11 |
-
...
|
| 12 |
-
|
| 13 |
-
# Law Document RAG Chat Application
|
| 14 |
-
|
| 15 |
-
A web application that allows users to ask questions about indexed legal documents using Retrieval Augmented Generation (RAG) techniques.
|
| 16 |
-
|
| 17 |
-
## Features
|
| 18 |
-
|
| 19 |
-
- 🤖 **RAG-powered Q&A**: Ask questions about your legal documents and get answers extracted directly from the context
|
| 20 |
-
- 📚 **Document Indexing**: Automatically index PDF, TXT, DOCX, and DOC files from a folder
|
| 21 |
-
- 🎨 **Modern React Frontend**: Beautiful, responsive chat interface
|
| 22 |
-
- ⚡ **FastAPI Backend**: High-performance API with LangChain and FAISS
|
| 23 |
-
- 🔍 **Exact Context Extraction**: Answers are extracted directly from documents, not generated
|
| 24 |
-
- 🚀 **Hugging Face Spaces Ready**: Configured for easy deployment
|
| 25 |
-
|
| 26 |
-
## Tech Stack
|
| 27 |
-
|
| 28 |
-
- **Frontend**: React 18
|
| 29 |
-
- **Backend**: FastAPI
|
| 30 |
-
- **RAG**: LangChain + FAISS + OpenAI Embeddings
|
| 31 |
-
- **Vector Database**: FAISS
|
| 32 |
-
- **LLM**: OpenAI API (for embeddings)
|
| 33 |
-
- **Python**: 3.10 or 3.11 (required for faiss-cpu compatibility)
|
| 34 |
-
|
| 35 |
-
## Project Structure
|
| 36 |
-
|
| 37 |
-
```
|
| 38 |
-
KSAlaw-document-agent/
|
| 39 |
-
├── backend/
|
| 40 |
-
│ ├── main.py # FastAPI application
|
| 41 |
-
│ ├── rag_system.py # RAG implementation
|
| 42 |
-
│ ├── document_processor.py # Document processing logic
|
| 43 |
-
│ ├── embeddings.py # OpenAI embeddings wrapper
|
| 44 |
-
│ └── chat_history.py # Chat history management
|
| 45 |
-
├── frontend/
|
| 46 |
-
│ ├── src/
|
| 47 |
-
│ │ ├── App.js # Main React component
|
| 48 |
-
│ │ ├── App.css # Styles
|
| 49 |
-
│ │ ├── index.js # React entry point
|
| 50 |
-
│ │ └── index.css # Global styles
|
| 51 |
-
│ ├── build/ # Built React app (for deployment)
|
| 52 |
-
│ ├── public/
|
| 53 |
-
│ │ └── index.html # HTML template
|
| 54 |
-
│ └── package.json # Node dependencies
|
| 55 |
-
├── documents/ # Place your PDF documents here
|
| 56 |
-
├── vectorstore/ # FAISS vectorstore (auto-generated)
|
| 57 |
-
├── app.py # Hugging Face Spaces entry point
|
| 58 |
-
├── Dockerfile # Docker configuration
|
| 59 |
-
├── pyproject.toml # Python dependencies (uv)
|
| 60 |
-
├── uv.lock # Locked dependencies
|
| 61 |
-
├── processed_documents.json # Processed document summaries
|
| 62 |
-
├── QUICKSTART.md # Complete setup and deployment guide
|
| 63 |
-
└── README.md # This file
|
| 64 |
-
```
|
| 65 |
-
|
| 66 |
-
## Quick Start
|
| 67 |
-
|
| 68 |
-
For complete setup and deployment instructions, see **[QUICKSTART.md](QUICKSTART.md)**.
|
| 69 |
-
|
| 70 |
-
### Quick Overview
|
| 71 |
-
|
| 72 |
-
**Local Development:**
|
| 73 |
-
1. Install dependencies: `uv sync` and `cd frontend && npm install`
|
| 74 |
-
2. Create `.env` with your `OPENAI_API_KEY`
|
| 75 |
-
3. Add documents to `documents/` folder
|
| 76 |
-
4. Run backend: `uv run python backend/main.py`
|
| 77 |
-
5. Run frontend: `cd frontend && npm start`
|
| 78 |
-
|
| 79 |
-
**Deployment to Hugging Face Spaces:**
|
| 80 |
-
1. Build frontend: `cd frontend && npm run build`
|
| 81 |
-
2. Set up Xet storage (recommended) or prepare to upload PDFs via UI
|
| 82 |
-
3. Push to Hugging Face: `git push hf main`
|
| 83 |
-
4. Set `OPENAI_API_KEY` in Space secrets
|
| 84 |
-
|
| 85 |
-
See [QUICKSTART.md](QUICKSTART.md) for detailed step-by-step instructions for both local development and deployment.
|
| 86 |
-
|
| 87 |
-
## API Endpoints
|
| 88 |
-
|
| 89 |
-
- `GET /api/` - Health check
|
| 90 |
-
- `GET /api/health` - Health status
|
| 91 |
-
- `POST /api/index` - Index documents from a folder
|
| 92 |
-
```json
|
| 93 |
-
{
|
| 94 |
-
"folder_path": "documents"
|
| 95 |
-
}
|
| 96 |
-
```
|
| 97 |
-
- `POST /api/ask` - Ask a question
|
| 98 |
-
```json
|
| 99 |
-
{
|
| 100 |
-
"question": "What is the law about X?"
|
| 101 |
-
}
|
| 102 |
-
```
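As an illustration of calling the `ask` endpoint from a script, here is a standard-library sketch. Several things are assumptions: the helper names `build_ask_request` and `ask` are hypothetical, the base URL assumes the local backend on port 8000 also serves the `/api` prefix (this README does not confirm that), and the response is returned as parsed JSON because its fields are not documented here.

```python
import json
import urllib.request


def build_ask_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for /api/ask with the documented JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running backend; prints whatever JSON it returns.
    print(ask("What is the law about X?"))
```

For a deployed Space, `base_url` would be the Space URL instead of `localhost`.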
|
| 103 |
-
|
| 104 |
-
## Environment Variables
|
| 105 |
-
|
| 106 |
-
- `OPENAI_API_KEY`: Your OpenAI API key (required)
|
| 107 |
-
|
| 108 |
-
## Notes
|
| 109 |
-
|
| 110 |
-
- The system extracts exact text from documents, not generated responses
|
| 111 |
-
- Supported document formats: PDF, TXT, DOCX, DOC
|
| 112 |
-
- The vectorstore is saved locally and persists between sessions
|
| 113 |
-
- Documents are automatically processed on startup (no manual indexing needed)
|
| 114 |
-
- For Hugging Face Spaces, the frontend automatically uses `/api` as the API URL
|
| 115 |
-
- This project uses `uv` for Python package management - dependencies are defined in `pyproject.toml` and `uv.lock`
|
| 116 |
-
- The `.env` file should be in the project root (not in the backend folder)
|
| 117 |
-
- PDFs can be stored using Hugging Face Xet storage or uploaded via the Space UI
|
| 118 |
-
|
| 119 |
-
## Troubleshooting
|
| 120 |
-
|
| 121 |
-
For detailed troubleshooting, see the [Troubleshooting section in QUICKSTART.md](QUICKSTART.md#troubleshooting).
|
| 122 |
-
|
| 123 |
-
### Common Issues
|
| 124 |
-
|
| 125 |
-
- **OpenAI API Key Error**: Make sure `OPENAI_API_KEY` is set in your `.env` file (local) or Space secrets (deployment)
|
| 126 |
-
- **No documents found**: Ensure documents are in the `documents/` folder with supported extensions (PDF, TXT, DOCX, DOC)
|
| 127 |
-
- **Frontend can't connect**: Check that the backend is running on port 8000
|
| 128 |
-
- **Build fails on Spaces**: Ensure `frontend/build/` exists (run `npm run build`), check Dockerfile, verify dependencies in `pyproject.toml`
|
| 129 |
-
- **RAG system not initialized**: Check Space logs, ensure `processed_documents.json` exists and is not ignored by `.dockerignore`
|
| 130 |
-
|
| 131 |
-
## License
|
| 132 |
-
|
| 133 |
-
MIT
|