# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a **production-ready GAIA benchmark AI agent** that achieves 85% accuracy using a multi-agent architecture. The codebase has been **refactored** into a modular, maintainable package, with the legacy monolithic solver kept alongside it. It specializes in complex question answering across multimedia, research, file processing, chess analysis, and mathematical reasoning.

## Development Commands

### Setup and Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Test API key configuration
python test_api_keys.py

# Verify core functionality
python -c "from main import GAIASolver; print('✅ Core GAIASolver available')"
```

### Running the System

```bash
# Run legacy monolithic solver
python main.py

# Run refactored modular solver (recommended)
python main_refactored.py

# Run Gradio web interface
python app.py
```

### Testing Commands

```bash
# Comprehensive async testing
python async_complete_test.py

# Test question classification
python test_improved_classification.py
python final_classification_test.py

# Test YouTube functionality
python direct_youtube_test.py
python simple_youtube_test.py
python test_youtube_question.py

# Test individual components
python -c "from gaia_tools import GAIA_TOOLS; print(f'Available tools: {len(GAIA_TOOLS)}')"
python -c "from question_classifier import QuestionClassifier; c = QuestionClassifier(); print('✅ Classifier ready')"
```

## Architecture Overview

### Dual Architecture Design

This project maintains both a **legacy monolithic** and a **refactored modular** architecture:

**Legacy Architecture (main.py):**
- Monolithic 1285-line solver with all functionality integrated
- Comprehensive tool collection in gaia_tools.py (4887 lines)
- Single-file approach for rapid development and deployment

**Refactored Architecture (gaia/ package):**

```
gaia/
├── core/                      # Main solver logic
│   ├── solver.py              # GAIASolver main class
│   ├── answer_extractor.py    # Specialized answer extraction classes
│   └── question_processor.py  # Question classification and processing
├── tools/                     # Tool implementations
│   ├── base.py                # Abstract tool interface and registry
│   ├── registry.py            # Tool discovery and management
│   └── [specialized tool modules]
├── models/                    # Model providers and management
│   ├── manager.py             # ModelManager with fallback chains
│   └── providers.py           # LiteLLM, Gemini, Kluster providers
├── config/                    # Configuration management
│   └── settings.py            # Config, ModelConfig classes
└── utils/                     # Utilities and helpers
    ├── exceptions.py          # Custom exception hierarchy
    └── logging.py             # Logging configuration
```

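
The refactored `tools/` package pairs an abstract tool interface (`base.py`) with a registry (`registry.py`). The sketch below shows one common way to build that pattern in plain Python. The class names `Tool`, `ToolRegistry`, and the example `WordCountTool` are hypothetical and are not taken from the repository.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Tool(ABC):
    """Hypothetical abstract tool interface; the real base.py may differ."""

    name: str = ""
    description: str = ""

    @abstractmethod
    def run(self, *args, **kwargs) -> str:
        """Execute the tool and return its raw output as text."""


class ToolRegistry:
    """Maps tool names to tool classes so the solver can look them up by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Type[Tool]] = {}

    def register(self, cls: Type[Tool]) -> Type[Tool]:
        # Usable as a class decorator: @registry.register
        if not cls.name:
            raise ValueError(f"{cls.__name__} must define a non-empty 'name'")
        if cls.name in self._tools:
            raise ValueError(f"duplicate tool name: {cls.name}")
        self._tools[cls.name] = cls
        return cls

    def create(self, name: str) -> Tool:
        # Instantiate a new tool object on every request.
        return self._tools[name]()

    def names(self) -> List[str]:
        return sorted(self._tools)


registry = ToolRegistry()


@registry.register
class WordCountTool(Tool):
    name = "word_count"
    description = "Count whitespace-separated words in a string."

    def run(self, text: str) -> str:
        return str(len(text.split()))


print(registry.names())                            # ['word_count']
print(registry.create("word_count").run("a b c"))  # 3
```

Registering through a decorator keeps discovery declarative: a new tool module only has to define its class, and the solver looks tools up by name.
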
### Core Components

- **GAIASolver (main.py):** Legacy monolithic solver (1285 lines) with all processing logic in a single file
- **GAIASolver (gaia/core/solver.py):** Refactored main orchestrator using dependency injection
- **QuestionClassifier:** LLM-based routing with pattern-based fallbacks
- **GAIA_TOOLS:** 42 specialized tools, including enhanced Wikipedia research, chess analysis, Excel processing, and multimedia analysis
- **ModelManager:** Handles model initialization, the fallback chain (Kluster.ai → Gemini → Qwen), and lifecycle management

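
The classifier is described as LLM-first with pattern-based fallbacks. Below is a minimal sketch of that control flow, assuming the LLM is any callable that returns a category label. The category names and regexes are illustrative assumptions, not the ones used in `question_classifier.py`.

```python
import re
from typing import Callable, Optional

# Hypothetical categories and keyword patterns, checked in order.
PATTERNS = [
    ("multimedia", re.compile(r"youtube\.com|youtu\.be|\bvideo\b|\baudio\b|\.mp3\b", re.I)),
    ("chess", re.compile(r"\bchess\b|\bFEN\b|checkmate", re.I)),
    ("file_processing", re.compile(r"\.(xlsx|xls|csv|py|txt)\b|attached file", re.I)),
    ("mathematical", re.compile(r"\bcalculate\b|\bsum\b|\bequation\b", re.I)),
]
VALID_TYPES = {name for name, _ in PATTERNS} | {"research"}


def classify_by_pattern(question: str) -> str:
    """Return the first category whose pattern matches; default to research."""
    for category, pattern in PATTERNS:
        if pattern.search(question):
            return category
    return "research"


def classify(question: str, llm: Optional[Callable[[str], str]] = None) -> str:
    """Try the LLM first; fall back to patterns on an error or an unknown label."""
    if llm is not None:
        try:
            label = llm(question).strip().lower()
            if label in VALID_TYPES:
                return label
        except Exception:
            pass  # any provider failure drops through to the pattern fallback
    return classify_by_pattern(question)


def broken_llm(_: str) -> str:
    raise RuntimeError("model unavailable")  # simulate an LLM outage


print(classify("What is the checkmate move in this position?", llm=broken_llm))  # chess
print(classify("Who won the 1998 award?", llm=lambda q: "Research"))             # research
```

Validating the LLM's label against a fixed set means that a malformed model reply degrades to the deterministic pattern route instead of sending the question to a nonexistent handler.
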
### Question Type Specialization

**Research Questions (92% accuracy):**
- Enhanced Wikipedia tools with date-specific searches and Featured Articles integration
- Multi-step research coordination with cross-validation
- Anti-hallucination safeguards to prevent fabrication

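
Date-specific Wikipedia lookups can be keyed on page titles. As one illustration (an assumption about the approach, not a description of `wikipedia_featured_articles_by_date.py`), English Wikipedia archives each day's featured-article blurb under `Wikipedia:Today's featured article/<Month> <day>, <year>`, and the MediaWiki Action API's `action=parse` module can return that page's wikitext. This sketch only builds the title and the request URL. It makes no network call, and the function names are hypothetical.

```python
import datetime as dt
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
# Spelled out so the result does not depend on the process locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def featured_article_title(day: dt.date) -> str:
    """Title of the archived Today's Featured Article page for a given date."""
    return f"Wikipedia:Today's featured article/{MONTHS[day.month - 1]} {day.day}, {day.year}"


def parse_request_url(title: str) -> str:
    """MediaWiki Action API URL that returns a page's wikitext as JSON."""
    params = {"action": "parse", "page": title, "prop": "wikitext", "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


title = featured_article_title(dt.date(2016, 11, 20))
print(title)  # Wikipedia:Today's featured article/November 20, 2016
print(parse_request_url(title))
```

Fetching the URL (for example with `urllib.request.urlopen`) requires network access, and Wikimedia asks API clients to send a descriptive User-Agent header.
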
**Chess Questions (100% accuracy):**
- Universal FEN correction system handling any vision error pattern
- Multi-tool consensus system for maximum accuracy
- Perfect algebraic notation extraction

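
The FEN-correction and multi-tool consensus features could rest on two building blocks: a structural check of a FEN string's piece-placement field and a majority vote over candidate moves. The sketch below is a hypothetical illustration. It does not check move legality or reproduce `universal_fen_correction.py`, and it compares moves as plain strings, so `Qxf7#` and `Qxf7` count as different answers.

```python
from collections import Counter
from typing import List, Optional

PIECES = set("prnbqkPRNBQK")


def is_valid_placement(fen: str) -> bool:
    """Structural check of the piece-placement field: 8 ranks of 8 squares each."""
    fields = fen.split()
    if not fields:
        return False
    ranks = fields[0].split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        squares, prev_was_digit = 0, False
        for ch in rank:
            if ch in "12345678":
                if prev_was_digit:
                    return False  # FEN merges empty runs, so "44" is malformed
                squares += int(ch)
                prev_was_digit = True
            elif ch in PIECES:
                squares += 1
                prev_was_digit = False
            else:
                return False
        if squares != 8:
            return False
    # Exactly one king per side.
    return fields[0].count("K") == 1 and fields[0].count("k") == 1


def consensus_move(candidates: List[str], min_votes: int = 2) -> Optional[str]:
    """Majority vote over moves proposed by different tools; None if no consensus."""
    votes = Counter(m.strip() for m in candidates if m and m.strip())
    if not votes:
        return None
    move, count = votes.most_common(1)[0]
    return move if count >= min_votes else None


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(is_valid_placement(START))                   # True
print(consensus_move(["Qxf7#", "Qxf7#", "Nf3"]))   # Qxf7#
```

A structural check like this can catch common board-recognition slips, such as a dropped or duplicated square, before a position is passed to an engine.
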
**YouTube/Multimedia Questions:**
- Enhanced URL detection with multiple regex patterns
- Forced classification override for YouTube content
- Specialized prompts with explicit tool usage instructions

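
Below is a minimal sketch of YouTube URL detection with several regex patterns, plus a forced override of the classifier's label. The patterns, the 11-character video-ID assumption, and the `"multimedia"` label are illustrative. The project's actual patterns are not shown in this file.

```python
import re
from typing import List

# Illustrative patterns; video IDs are assumed to be 11 characters drawn from
# letters, digits, "_" and "-", not followed by another ID character.
_ID = r"([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])"
YOUTUBE_PATTERNS = [
    re.compile(r"https?://(?:www\.|m\.)?youtube\.com/watch\?(?:[^\s#]*&)?v=" + _ID),
    re.compile(r"https?://youtu\.be/" + _ID),
    re.compile(r"https?://(?:www\.)?youtube\.com/(?:shorts|embed)/" + _ID),
]


def find_youtube_ids(text: str) -> List[str]:
    """Return unique video IDs, ordered by pattern and then by position."""
    ids: List[str] = []
    for pattern in YOUTUBE_PATTERNS:
        for match in pattern.finditer(text):
            if match.group(1) not in ids:
                ids.append(match.group(1))
    return ids


def route(question: str, classifier_label: str) -> str:
    """Force the multimedia route whenever a YouTube link is present."""
    return "multimedia" if find_youtube_ids(question) else classifier_label


print(route("What is said at https://www.youtube.com/watch?v=abcDEF12345 ?", "research"))  # multimedia
```

Running the override after classification means that a YouTube link always reaches the video tools, even when the general classifier misreads the question.
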
**File Processing (100% accuracy):**
- Format-specific tools for Excel (.xlsx/.xls), Python (.py), and text files
- Deterministic Python execution in a sandboxed environment
- Financial calculation specialization with proper currency formatting

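
The sketch below shows two pieces: running a Python file in a separate interpreter with a timeout, and formatting currency to two decimal places with `decimal`. The function names are hypothetical. A child process started with `-I` (isolated mode) and a timeout keeps the script from interfering with the solver, but it is **not** a security sandbox. Untrusted code would need OS-level isolation.

```python
import subprocess
import sys
import tempfile
from decimal import ROUND_HALF_UP, Decimal
from pathlib import Path


def run_python_file(path: str, timeout: float = 30.0) -> str:
    """Run a script in a fresh interpreter and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-I", path],  # -I: ignore PYTHON* env vars and user site-packages
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


def format_usd(amount) -> str:
    """Round half-up to cents and format with thousands separators, e.g. $1,234.50."""
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value):,.2f}"


# Usage: write a small script to a temporary file and execute it.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "calc.py"
    script.write_text("print(sum([1.5, 2.25, 3.25]))")
    print(run_python_file(str(script)))  # 7.0

print(format_usd(1234.5))  # $1,234.50
```

Converting through `str(amount)` avoids binary float artifacts, so `2.675` rounds to `$2.68` rather than `$2.67`.
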
## Environment Configuration

### API Keys (set in .env)
- `GEMINI_API_KEY` - Primary model (Gemini 2.0 Flash)
- `HUGGINGFACE_TOKEN` - Fallback model and classification
- `KLUSTER_API_KEY` - Optional premium model access

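
The sketch below validates the keys at startup. It reads only `os.environ`, so it assumes `.env` has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`, if the project uses that package). The required/optional split follows the list above, and the function name is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Mirrors the key list above: two required keys, one optional.
REQUIRED_KEYS = ("GEMINI_API_KEY", "HUGGINGFACE_TOKEN")
OPTIONAL_KEYS = ("KLUSTER_API_KEY",)


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Return configured keys; raise if a required key is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required API keys: " + ", ".join(missing))
    keys: Dict[str, Optional[str]] = {name: env[name] for name in REQUIRED_KEYS}
    for name in OPTIONAL_KEYS:
        keys[name] = env.get(name) or None  # None: the optional provider is skipped
    return keys


keys = load_api_keys({"GEMINI_API_KEY": "g-123", "HUGGINGFACE_TOKEN": "hf-456"})
print(keys["KLUSTER_API_KEY"])  # None
```

Checking keys up front surfaces a misconfigured `.env` immediately, instead of as a failure midway through a benchmark run.
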
### Model Fallback Chain

1. **Kluster.ai** (Qwen3-235B, Gemma3-27B) - Premium option
2. **Gemini 2.0 Flash** - Primary production model
3. **Qwen 2.5-72B** - Reliable fallback via Hugging Face

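
The fallback chain amounts to trying providers in order and returning the first usable response. Below is a minimal sketch, assuming each provider is a plain `prompt -> text` callable. The real `ModelManager` and its LiteLLM/Gemini/Kluster providers have their own interfaces, and the stub providers here are hypothetical. A provider whose API key is not configured would simply be left out of the list.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("gaia.fallback_sketch")

# (provider name, function mapping a prompt to generated text)
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errors or returns nothing."""


def generate_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (name, text) from the first non-empty answer."""
    errors: List[str] = []
    for name, generate in providers:
        try:
            text = generate(prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            logger.warning("provider %s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty response")
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")


def kluster_stub(prompt: str) -> str:
    raise ConnectionError("quota exceeded")  # simulate an unavailable provider


chain = [("kluster", kluster_stub), ("gemini", lambda p: "Paris"), ("qwen", lambda p: "Paris")]
print(generate_with_fallback("What is the capital of France?", chain))  # ('gemini', 'Paris')
```
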
## Key Design Patterns

### Anti-Hallucination Architecture
- **Tool result prioritization**: Always uses exact tool outputs over internal reasoning
- **Cross-validation**: Multiple verification methods for critical information
- **Source attribution**: Clear tracking and validation of information sources
- **Validation rules**: Type-specific answer extraction and verification

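
Tool-result prioritization, cross-validation, and source attribution can be expressed together by carrying the source with each candidate answer and preferring agreeing tool outputs over the model's own answer. This is a hypothetical sketch, not the project's answer extractor. The `Candidate` type, the `tool:` prefix convention, and the two-tool agreement threshold are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    value: str
    source: str  # hypothetical convention: "tool:<name>" or "model"

    @property
    def from_tool(self) -> bool:
        return self.source.startswith("tool:")


def normalize(value: str) -> str:
    """Case- and whitespace-insensitive form used only for comparison."""
    return " ".join(value.strip().lower().split())


def select_answer(candidates: List[Candidate], min_agreeing_tools: int = 2) -> Optional[Candidate]:
    """Prefer tool outputs; require agreement when several tools answered."""
    tool_cands = [c for c in candidates if c.from_tool]
    if tool_cands:
        counts = Counter(normalize(c.value) for c in tool_cands)
        best, n = counts.most_common(1)[0]
        if n >= min_agreeing_tools or len(tool_cands) == 1:
            # Return the first matching tool candidate so attribution is kept.
            return next(c for c in tool_cands if normalize(c.value) == best)
        return None  # tools disagree: flag for review instead of guessing
    # No tool output at all: fall back to the model's own answer.
    model_cands = [c for c in candidates if not c.from_tool]
    return model_cands[0] if model_cands else None


chosen = select_answer([
    Candidate("Lyon", "model"),
    Candidate("Paris", "tool:wikipedia"),
    Candidate("paris ", "tool:web_search"),
])
print(chosen)  # Candidate(value='Paris', source='tool:wikipedia')
```
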
### Performance Optimizations
- **Fresh agent creation** for each question to avoid token accumulation
- **Concurrent processing** support with async operations
- **15-minute web cache** for improved response times
- **Exponential backoff** for API rate limiting

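
The sketch below implements three of these optimizations with the standard library: a 15-minute time-to-live cache, exponential backoff with jitter for retried calls, and concurrent solving that builds a fresh agent for each question. All names are hypothetical, including the agent's `answer()` coroutine, and the cache lives in memory only.

```python
import asyncio
import random
import time
from typing import Any, Awaitable, Callable, Dict, List, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl seconds (900 s = 15 min)."""

    def __init__(self, ttl: float = 900.0) -> None:
        self.ttl = ttl
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (time.monotonic(), value)


async def with_backoff(call: Callable[[], Awaitable[Any]], retries: int = 5,
                       base_delay: float = 1.0, max_delay: float = 30.0) -> Any:
    """Retry an async call, sleeping min(max_delay, base_delay * 2**attempt) plus up to 50% jitter."""
    for attempt in range(retries):
        try:
            return await call()
        except Exception:  # a real client would retry only rate-limit errors (e.g. HTTP 429)
            if attempt == retries - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def solve_all(questions: List[str], make_agent: Callable[[], Any],
                    max_concurrency: int = 4) -> List[Any]:
    """Answer questions concurrently, creating a new agent for every question."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def solve_one(question: str) -> Any:
        async with semaphore:
            agent = make_agent()  # fresh agent: no conversation history carried over
            return await with_backoff(lambda: agent.answer(question))

    return await asyncio.gather(*(solve_one(q) for q in questions))


class EchoAgent:
    """Hypothetical stand-in for the real agent."""

    async def answer(self, question: str) -> str:
        return question.upper()


print(asyncio.run(solve_all(["a", "b"], EchoAgent)))  # ['A', 'B']
```

The semaphore bounds how many questions hit the model providers at once, and the backoff absorbs the occasional rate-limit rejection. `asyncio.gather` returns results in input order.
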
## File Organization

### Core Files
- `main.py` - Legacy monolithic solver (1285 lines)
- `main_refactored.py` - Entry point for refactored architecture
- `gaia_tools.py` - 42 specialized tools with robust error handling (4887 lines)
- `question_classifier.py` - LLM + pattern-based classification system
- `app.py` - Production Gradio interface with comprehensive error handling

### Supporting Files
- `async_complete_test.py` - Comprehensive async testing infrastructure
- `enhanced_wikipedia_tools.py` - Advanced Wikipedia research capabilities
- `universal_fen_correction.py` - Chess-specific FEN notation correction
- `wikipedia_featured_articles_by_date.py` - Date-specific Wikipedia searches

## Local Configuration Notes

- The Hugging Face token is available from the secrets in `.env`.