ianshank Claude committed on
Commit
40ee6b4
·
0 Parent(s):

feat: add personality output and bug fixes


- Added PersonalityResponseGenerator for conversational advisor responses
- Updated app.py with personality-infused output section
- Added sentence-transformers dependency
- Includes all trained model files and bug fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. .gitignore +27 -0
  2. DEPLOYMENT_GUIDE.md +306 -0
  3. README.md +225 -0
  4. app.py +553 -0
  5. app_mock.py +590 -0
  6. demo_src/__init__.py +1 -0
  7. demo_src/agents_demo.py +234 -0
  8. demo_src/llm_mock.py +182 -0
  9. demo_src/mcts_demo.py +436 -0
  10. demo_src/wandb_tracker.py +349 -0
  11. models/bert_lora/final_model/README.md +206 -0
  12. models/bert_lora/final_model/adapter_config.json +40 -0
  13. models/bert_lora/final_model/adapter_model.safetensors +0 -0
  14. models/bert_lora/generated_dataset.json +0 -0
  15. models/bert_lora/training_results.json +48 -0
  16. models/rnn_meta_controller.history.json +128 -0
  17. models/rnn_meta_controller.pt +0 -0
  18. requirements.txt +28 -0
  19. src/__init__.py +0 -0
  20. src/adapters/__init__.py +7 -0
  21. src/adapters/llm/__init__.py +257 -0
  22. src/adapters/llm/anthropic_client.py +521 -0
  23. src/adapters/llm/base.py +305 -0
  24. src/adapters/llm/exceptions.py +204 -0
  25. src/adapters/llm/lmstudio_client.py +346 -0
  26. src/adapters/llm/openai_client.py +458 -0
  27. src/agents/__init__.py +0 -0
  28. src/agents/hrm_agent.py +454 -0
  29. src/agents/meta_controller/__init__.py +45 -0
  30. src/agents/meta_controller/base.py +219 -0
  31. src/agents/meta_controller/bert_controller.py +428 -0
  32. src/agents/meta_controller/config_loader.py +304 -0
  33. src/agents/meta_controller/rnn_controller.py +345 -0
  34. src/agents/meta_controller/utils.py +201 -0
  35. src/agents/trm_agent.py +395 -0
  36. src/api/__init__.py +35 -0
  37. src/api/auth.py +439 -0
  38. src/api/exceptions.py +299 -0
  39. src/api/inference_server.py +380 -0
  40. src/api/rest_server.py +441 -0
  41. src/config/__init__.py +0 -0
  42. src/config/meta_controller.yaml +22 -0
  43. src/config/settings.py +431 -0
  44. src/data/__init__.py +29 -0
  45. src/data/dataset_loader.py +551 -0
  46. src/data/preprocessing.py +406 -0
  47. src/data/tactical_augmentation.py +484 -0
  48. src/data/train_test_split.py +505 -0
  49. src/framework/__init__.py +1 -0
  50. src/framework/agents/__init__.py +22 -0
.gitignore ADDED
@@ -0,0 +1,27 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+
7
+ # Virtual environment
8
+ venv/
9
+ env/
10
+ .env
11
+
12
+ # IDE
13
+ .vscode/
14
+ .idea/
15
+ *.swp
16
+ *.swo
17
+
18
+ # OS
19
+ .DS_Store
20
+ Thumbs.db
21
+
22
+ # Gradio
23
+ flagged/
24
+ gradio_cached_examples/
25
+
26
+ # Logs
27
+ *.log
DEPLOYMENT_GUIDE.md ADDED
@@ -0,0 +1,306 @@
1
+ # Hugging Face Spaces Deployment Guide
2
+
3
+ This guide walks you through deploying the LangGraph Multi-Agent MCTS demo to Hugging Face Spaces.
4
+
5
+ ## Prerequisites
6
+
7
+ - [Hugging Face Account](https://huggingface.co/join)
8
+ - Git installed locally
9
+ - Python 3.10+ (for local testing)
10
+
11
+ ## Step 1: Create a New Space
12
+
13
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
14
+ 2. Click **"Create new Space"**
15
+ 3. Fill in the form:
16
+ - **Owner**: Your username or organization
17
+ - **Space name**: `langgraph-mcts-demo` (or your choice)
18
+ - **License**: MIT
19
+ - **SDK**: Gradio
20
+ - **Hardware**: CPU Basic (Free tier - sufficient for demo)
21
+ - **Visibility**: Public (or Private)
22
+ 4. Click **"Create Space"**
23
+
24
+ ## Step 2: Clone and Deploy
25
+
26
+ ### Option A: Git-based Deployment (Recommended)
27
+
28
+ ```bash
29
+ # 1. Clone your new empty Space
30
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/langgraph-mcts-demo
31
+ cd langgraph-mcts-demo
32
+
33
+ # 2. Copy demo files from this directory
34
+ cp -r /path/to/huggingface_space/* .
35
+ cp -r /path/to/huggingface_space/.gitignore .
36
+
37
+ # 3. Verify structure
38
+ ls -la
39
+ # Should show:
40
+ # - app.py
41
+ # - requirements.txt
42
+ # - README.md
43
+ # - .gitignore
44
+ # - demo_src/
45
+ # - __init__.py
46
+ # - agents_demo.py
47
+ # - llm_mock.py
48
+ # - mcts_demo.py
49
+
50
+ # 4. Commit and push
51
+ git add -A
52
+ git commit -m "Initial deployment of LangGraph Multi-Agent MCTS demo"
53
+ git push
54
+
55
+ # 5. Space will automatically build and deploy (takes 2-5 minutes)
56
+ ```
57
+
58
+ ### Option B: Direct Upload via Web UI
59
+
60
+ 1. Navigate to your Space on Hugging Face
61
+ 2. Click **"Files"** tab
62
+ 3. Click **"Add file"** → **"Upload files"**
63
+ 4. Upload all files maintaining the directory structure:
64
+ - `app.py`
65
+ - `requirements.txt`
66
+ - `README.md`
67
+ - `.gitignore`
68
+ - `demo_src/__init__.py`
69
+ - `demo_src/agents_demo.py`
70
+ - `demo_src/llm_mock.py`
71
+ - `demo_src/mcts_demo.py`
72
+ 5. Commit changes
73
+
74
+ ## Step 3: Monitor Deployment
75
+
76
+ 1. Go to your Space URL: `https://huggingface.co/spaces/YOUR_USERNAME/langgraph-mcts-demo`
77
+ 2. Click **"Logs"** tab to monitor build progress
78
+ 3. Wait for "Running on" message
79
+ 4. Your demo is now live!
80
+
81
+ ## Step 4: Test the Demo
82
+
83
+ 1. Enter a query or select an example
84
+ 2. Enable/disable different agents
85
+ 3. Adjust MCTS parameters
86
+ 4. Click "Process Query"
87
+ 5. Review results and consensus scores
88
+
89
+ ## Optional: Enable Real LLM Responses
90
+
91
+ To use the Hugging Face Inference API instead of mock responses:
92
+
93
+ ### 1. Update requirements.txt
94
+
95
+ ```txt
96
+ gradio>=4.0.0,<5.0.0
97
+ numpy>=1.24.0,<2.0.0
98
+ huggingface_hub>=0.20.0
99
+ ```
100
+
101
+ ### 2. Add Secret Token
102
+
103
+ 1. Go to Space Settings → **Repository secrets**
104
+ 2. Add new secret:
105
+ - Name: `HF_TOKEN`
106
+ - Value: Your Hugging Face token (from [Settings → Access Tokens](https://huggingface.co/settings/tokens))
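+
+ Space secrets are exposed to the running app as environment variables. A minimal sketch of picking the token up at startup and handing it to a `huggingface_hub` client (the demo's own `HuggingFaceClient` wrapper in `demo_src/llm_mock.py` may read it differently):
+
+ ```python
+ import os
+ from huggingface_hub import InferenceClient
+
+ # None if the secret is not configured; the app can then fall back to mock responses.
+ hf_token = os.environ.get("HF_TOKEN")
+ client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2", token=hf_token)
+ ```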
107
+
108
+ ### 3. Update app.py Initialization
109
+
110
+ Change line ~290 in `app.py`:
111
+
112
+ ```python
113
+ # From:
114
+ framework = MultiAgentFrameworkDemo(use_hf_inference=False)
115
+
116
+ # To:
117
+ import os
118
+ framework = MultiAgentFrameworkDemo(
119
+ use_hf_inference=True,
120
+ hf_model="mistralai/Mistral-7B-Instruct-v0.2"
121
+ )
122
+ ```
123
+
124
+ ### 4. Commit and Push
125
+
126
+ ```bash
127
+ git add -A
128
+ git commit -m "Enable Hugging Face Inference API"
129
+ git push
130
+ ```
131
+
132
+ ## Optional: Enable Weights & Biases Tracking
133
+
134
+ Track experiments and visualize metrics with W&B integration.
135
+
136
+ ### 1. Get W&B API Key
137
+
138
+ 1. Sign up at [wandb.ai](https://wandb.ai)
139
+ 2. Go to Settings → API Keys
140
+ 3. Copy your API key
141
+
142
+ ### 2. Add W&B Secret to Space
143
+
144
+ 1. Go to Space Settings → **Repository secrets**
145
+ 2. Add new secret:
146
+ - Name: `WANDB_API_KEY`
147
+ - Value: Your W&B API key
148
+
149
+ ### 3. Use W&B in the Demo
150
+
151
+ 1. Expand "Weights & Biases Tracking" accordion in the UI
152
+ 2. Check "Enable W&B Tracking"
153
+ 3. Optionally set:
154
+ - **Project Name**: Your W&B project (default: `langgraph-mcts-demo`)
155
+ - **Run Name**: Custom name for this run (auto-generated if empty)
156
+ 4. Process your query
157
+ 5. View the W&B run URL in the results
158
+
159
+ ### 4. What Gets Logged
160
+
161
+ - **Agent Metrics**: Confidence scores, execution times, response lengths
162
+ - **MCTS Metrics**: Best value, visits, tree depth, exploration paths
163
+ - **Consensus Metrics**: Agreement scores, agent combinations
164
+ - **Performance**: Total processing time
165
+ - **Artifacts**: Full JSON results as artifacts
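+
+ Under the hood this is the standard `wandb` API. A minimal sketch of the kind of calls the demo's `WandBTracker` (in `demo_src/wandb_tracker.py`) wraps; the metric names below are illustrative, not the tracker's exact keys:
+
+ ```python
+ import wandb
+
+ # One run per processed query; the config captures the UI settings.
+ run = wandb.init(
+     project="langgraph-mcts-demo",
+     name="example-run",
+     config={"use_hrm": True, "use_trm": True, "use_mcts": False},
+ )
+
+ # Metrics are logged as flat key/value pairs.
+ wandb.log({"hrm/confidence": 0.85, "hrm/execution_time_ms": 120.0})
+ wandb.log({"consensus/score": 0.72, "performance/total_time_ms": 310.5})
+
+ wandb.finish()
+ ```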
166
+
167
+ ### 5. View Your Dashboard
168
+
169
+ After runs, visit your W&B project dashboard to:
170
+ - Compare different agent configurations
171
+ - Visualize consensus patterns
172
+ - Analyze MCTS exploration strategies
173
+ - Track performance over time
174
+
175
+ ## Customization Options
176
+
177
+ ### Change Gradio Theme
178
+
179
+ In `app.py`, modify:
180
+
181
+ ```python
182
+ with gr.Blocks(
183
+ theme=gr.themes.Soft(), # Try: Default(), Monochrome(), Glass()
184
+ ...
185
+ ) as demo:
186
+ ```
187
+
188
+ ### Add Custom Examples
189
+
190
+ Update `EXAMPLE_QUERIES` list in `app.py`:
191
+
192
+ ```python
193
+ EXAMPLE_QUERIES = [
194
+ "Your custom query 1",
195
+ "Your custom query 2",
196
+ ...
197
+ ]
198
+ ```
199
+
200
+ ### Adjust MCTS Parameters
201
+
202
+ Modify sliders in `app.py`:
203
+
204
+ ```python
205
+ mcts_iterations = gr.Slider(
206
+ minimum=10,
207
+ maximum=200, # Increase for more thorough search
208
+ value=50, # Change default
209
+ ...
210
+ )
211
+ ```
212
+
213
+ ### Add More Agent Types
214
+
215
+ 1. Create new agent in `demo_src/agents_demo.py`
216
+ 2. Add to `MultiAgentFrameworkDemo` in `app.py`
217
+ 3. Add UI controls in Gradio interface
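+
+ As a starting point, here is a minimal sketch of a new demo agent, assuming the same interface the existing demo agents expose (an async `process()` returning `response`, `confidence`, and `steps`, which is what `MultiAgentFrameworkDemo` consumes). The class and file names are hypothetical:
+
+ ```python
+ # demo_src/planner_demo.py (hypothetical)
+ class PlannerAgent:
+     """Example agent that sketches a step-by-step plan for the query."""
+
+     def __init__(self, llm_client):
+         self.llm_client = llm_client
+
+     async def process(self, query: str) -> dict:
+         steps = ["Restate the goal", "List constraints", "Draft a plan"]
+         response = f"[Planner] Proposed plan for: {query[:80]}..."
+         return {"response": response, "confidence": 0.75, "steps": steps}
+ ```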
218
+
219
+ ## Troubleshooting
220
+
221
+ ### Build Fails
222
+
223
+ - Check **Logs** tab for error details
224
+ - Verify `requirements.txt` has compatible versions
225
+ - Ensure all imports in `app.py` are satisfied
226
+
227
+ ### Slow Performance
228
+
229
+ - Reduce default MCTS iterations
230
+ - Use mock LLM (no API calls)
231
+ - Simplify tree visualization
232
+
233
+ ### Memory Issues (Free Tier)
234
+
235
+ - Limit max MCTS iterations to 100
236
+ - Reduce tree depth in `demo_src/mcts_demo.py`
237
+ - Simplify response generation
238
+
239
+ ### Missing Files
240
+
241
+ Ensure directory structure:
242
+ ```
243
+ your-space/
244
+ ├── app.py
245
+ ├── requirements.txt
246
+ ├── README.md
247
+ ├── .gitignore
248
+ └── demo_src/
249
+ ├── __init__.py
250
+ ├── agents_demo.py
251
+ ├── llm_mock.py
252
+ ├── mcts_demo.py
253
+ └── wandb_tracker.py
254
+ ```
255
+
256
+ ## Upgrading Hardware
257
+
258
+ For better performance:
259
+
260
+ 1. Go to Space Settings
261
+ 2. Under **Hardware**, select:
262
+ - **CPU Upgrade** ($0.03/hr) - Faster processing
263
+ - **T4 Small** ($0.60/hr) - GPU for neural models
264
+ 3. Save changes
265
+
266
+ ## Sharing Your Space
267
+
268
+ ### Embed in Website
269
+
270
+ ```html
271
+ <iframe
272
+ src="https://YOUR_USERNAME-langgraph-mcts-demo.hf.space"
273
+ frameborder="0"
274
+ width="100%"
275
+ height="600"
276
+ ></iframe>
277
+ ```
278
+
279
+ ### Direct Link
280
+
281
+ Share: `https://huggingface.co/spaces/YOUR_USERNAME/langgraph-mcts-demo`
282
+
283
+ ### API Access
284
+
285
+ Gradio automatically provides an API endpoint:
286
+ ```
287
+ https://YOUR_USERNAME-langgraph-mcts-demo.hf.space/api/predict
288
+ ```
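+
+ A hedged sketch of calling that endpoint from Python with the `gradio_client` package (the exact argument list and `api_name` depend on how the app wires its inputs, so check the Space's "Use via API" panel for the authoritative signature):
+
+ ```python
+ # pip install gradio_client
+ from gradio_client import Client
+
+ # Connect to the deployed Space (replace with your own Space ID).
+ client = Client("YOUR_USERNAME/langgraph-mcts-demo")
+
+ # Illustrative call: query text plus controller choice, mirroring the demo UI.
+ result = client.predict(
+     "How to design a fault-tolerant message queue system?",
+     "RNN",
+     api_name="/predict",
+ )
+ print(result)
+ ```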
289
+
290
+ ## Next Steps
291
+
292
+ 1. **Collect Feedback**: Enable flagging for user feedback
293
+ 2. **Add Analytics**: Track usage patterns
294
+ 3. **Extend Agents**: Add domain-specific reasoning modules
295
+ 4. **Integrate RAG**: Connect to vector databases for real context
296
+ 5. **Add Visualization**: Enhanced tree and consensus displays
297
+
298
+ ## Support
299
+
300
+ - **Hugging Face Docs**: https://huggingface.co/docs/hub/spaces
301
+ - **Gradio Docs**: https://www.gradio.app/docs
302
+ - **Full Framework**: https://github.com/ianshank/langgraph_multi_agent_mcts
303
+
304
+ ---
305
+
306
+ **Happy Deploying!** 🚀
README.md ADDED
@@ -0,0 +1,225 @@
1
+ ---
2
+ title: LangGraph Multi-Agent MCTS Demo
3
+ emoji: 🌳
4
+ colorFrom: blue
5
+ colorTo: green
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ tags:
12
+ - multi-agent
13
+ - mcts
14
+ - reasoning
15
+ - langgraph
16
+ - ai-agents
17
+ - wandb
18
+ - experiment-tracking
19
+ short_description: Multi-agent reasoning framework with Monte Carlo Tree Search
20
+ ---
21
+
22
+ # LangGraph Multi-Agent MCTS Framework
23
+
24
+ **Production Demo with Trained Neural Models** - Experience real trained meta-controllers for intelligent agent routing
25
+
26
+ ## What This Demo Shows
27
+
28
+ This interactive demo showcases trained neural meta-controllers that dynamically route queries to specialized agents:
29
+
30
+ ### 🤖 Trained Meta-Controllers
31
+
32
+ 1. **RNN Meta-Controller**
33
+ - GRU-based recurrent neural network
34
+ - Learns sequential patterns in agent performance
35
+ - Fast inference (~2ms latency)
36
+ - Trained on 1000+ synthetic routing examples
37
+
38
+ 2. **BERT Meta-Controller with LoRA**
39
+ - Transformer-based text understanding
40
+ - Parameter-efficient fine-tuning with LoRA adapters
41
+ - Context-aware routing decisions
42
+ - Better generalization to unseen query patterns
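+
+ For orientation, attaching a LoRA adapter such as the one shipped in `models/bert_lora/final_model` typically looks like the sketch below; the base checkpoint name is illustrative, and the demo's `BERTMetaController.load_model` wraps the actual loading:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification
+ from peft import PeftModel
+
+ # Hypothetical base checkpoint; the demo uses a BERT-mini-sized encoder with 3 routing classes.
+ base = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-mini", num_labels=3)
+
+ # Load the LoRA adapter weights on top of the (frozen) base model.
+ model = PeftModel.from_pretrained(base, "models/bert_lora/final_model")
+ model.eval()
+ ```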
43
+
44
+ ### 🧠 Three Specialized Agents
45
+
46
+ 1. **HRM (Hierarchical Reasoning Module)**
47
+ - Best for: Complex decomposition, multi-level problems
48
+ - Technique: Hierarchical planning with adaptive computation
49
+
50
+ 2. **TRM (Tree Reasoning Module)**
51
+ - Best for: Iterative refinement, comparison tasks
52
+ - Technique: Recursive refinement with convergence detection
53
+
54
+ 3. **MCTS (Monte Carlo Tree Search)**
55
+ - Best for: Optimization, strategic planning
56
+ - Technique: UCB1 exploration with value backpropagation
57
+
58
+ ### 📊 Key Features
59
+
60
+ - **Real Trained Models**: Production-ready neural meta-controllers
61
+ - **Intelligent Routing**: Models learn optimal agent selection patterns
62
+ - **Routing Visualization**: See confidence scores and probability distributions
63
+ - **Feature Engineering**: Demonstrates query → features → routing pipeline
64
+ - **Performance Metrics**: Track execution time and routing accuracy
65
+
66
+ ## How to Use
67
+
68
+ 1. **Enter a Query**: Type your question or select an example
69
+ 2. **Select Controller**: Choose RNN (fast) or BERT (context-aware)
70
+ 3. **Process Query**: Click "🚀 Process Query"
71
+ 4. **Review Results**:
72
+ - See which agent the controller selected
73
+ - View routing confidence and probabilities
74
+ - Examine features used for decision-making
75
+ - Check agent execution details
76
+
77
+ ## Weights & Biases Integration
78
+
79
+ Track your experiments with **Weights & Biases** for:
80
+ - 📈 **Metrics Dashboard**: Visualize consensus scores, execution times, agent performance
81
+ - 🔄 **Run Comparison**: Compare different configurations side-by-side
82
+ - 📊 **Experiment History**: Track all your queries and results
83
+ - 🌳 **MCTS Visualization**: Log tree exploration patterns
84
+
85
+ ### Setting Up W&B
86
+
87
+ 1. **Get API Key**: Sign up at [wandb.ai](https://wandb.ai) and get your API key
88
+ 2. **Configure Space Secret** (if deploying your own):
89
+ - Go to Space Settings → Repository secrets
90
+ - Add: `WANDB_API_KEY` = your API key
91
+ 3. **Enable in UI**:
92
+ - Expand "Weights & Biases Tracking" accordion
93
+ - Check "Enable W&B Tracking"
94
+ - Set project name (optional)
95
+ - Set run name (optional, auto-generated if empty)
96
+ 4. **View Results**: After processing, click the W&B run URL to see your dashboard
97
+
98
+ ### Logged Metrics
99
+
100
+ - **Per Agent**: Confidence, execution time, response length, reasoning steps
101
+ - **MCTS**: Best value, visits, tree depth, top actions with UCB1 scores
102
+ - **Consensus**: Score, level (high/medium/low), number of agents
103
+ - **Performance**: Total processing time
104
+ - **Artifacts**: Full JSON results, tree visualizations
105
+
106
+ ## Example Queries
107
+
108
+ - "What are the key factors to consider when choosing between microservices and monolithic architecture?"
109
+ - "How can we optimize a Python application that processes 10GB of log files daily?"
110
+ - "Should we use SQL or NoSQL database for a social media application with 1M users?"
111
+ - "How to design a fault-tolerant message queue system?"
112
+
113
+ ## Technical Details
114
+
115
+ ### Architecture
116
+
117
+ ```
118
+ Query Input
+   │
+   ├─→ HRM Agent (Hierarchical Decomposition)
+   │    ├─ Component Analysis
+   │    └─ Structured Synthesis
+   │
+   ├─→ TRM Agent (Iterative Refinement)
+   │    ├─ Initial Response
+   │    ├─ Clarity Enhancement
+   │    └─ Validation Check
+   │
+   └─→ MCTS Engine (Strategic Search)
+        ├─ Selection (UCB1)
+        ├─ Expansion
+        ├─ Simulation
+        └─ Backpropagation
+
+   ↓
+ Consensus Scoring
+   ↓
+ Final Synthesized Response
140
+ ```
141
+
142
+ ### MCTS Algorithm
143
+
144
+ The Monte Carlo Tree Search implementation uses:
145
+
146
+ - **UCB1 Selection**: `Q(s,a) + C * sqrt(ln(N(s)) / N(s,a))`
147
+ - **Progressive Widening**: Controls branching factor
148
+ - **Domain-Aware Actions**: Contextual decision options
149
+ - **Value Backpropagation**: Updates entire path statistics
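+
+ For concreteness, the selection score above as a small Python helper (a sketch of the formula, not the framework's exact implementation):
+
+ ```python
+ import math
+
+ def ucb1(q_value: float, parent_visits: int, child_visits: int, c: float = 1.414) -> float:
+     """UCB1: exploit high-value children, explore rarely visited ones."""
+     if child_visits == 0:
+         return float("inf")  # unvisited children are tried first
+     return q_value + c * math.sqrt(math.log(parent_visits) / child_visits)
+ ```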
150
+
151
+ ### Consensus Calculation
152
+
153
+ ```
154
+ consensus = average_confidence * agreement_factor
155
+ agreement_factor = max(0, 1 - std_deviation * 2)
156
+ ```
157
+
158
+ High consensus (>70%) indicates the agents agree on an approach.
159
+ Low consensus (<40%) suggests uncertainty or conflicting strategies.
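+
+ The same rule as a small Python sketch (it mirrors the `_calculate_consensus` helper in `app_mock.py`):
+
+ ```python
+ import numpy as np
+
+ def consensus(confidences: list[float]) -> float:
+     """Combine per-agent confidences into a single agreement score."""
+     if not confidences:
+         return 0.0
+     if len(confidences) == 1:
+         return confidences[0]
+     avg, std = float(np.mean(confidences)), float(np.std(confidences))
+     agreement_factor = max(0.0, 1.0 - std * 2)  # low spread between agents => high agreement
+     return round(min(1.0, avg * agreement_factor), 3)
+
+ # Example: two agents broadly agree, the MCTS value dissents a little.
+ print(consensus([0.85, 0.80, 0.55]))
+ ```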
160
+
161
+ ## Demo Scope
162
+
163
+ This demonstration focuses on **meta-controller training and routing**:
164
+
165
+ - ✅ **Real Trained Models**: Production RNN and BERT controllers
166
+ - ✅ **Actual Model Loading**: PyTorch and HuggingFace Transformers
167
+ - ✅ **Feature Engineering**: Query analysis → feature vectors
168
+ - ✅ **Routing Visualization**: See controller decision-making
169
+ - ⚠️ **Simplified Agents**: Agent responses are mocked for demo purposes
170
+ - ⚠️ **No Live LLM Calls**: Agents don't call actual LLMs (to reduce latency/cost)
171
+
172
+ ## Full Production Framework
173
+
174
+ The complete repository includes all production features:
175
+
176
+ - ✅ **Neural Meta-Controllers**: RNN and BERT with LoRA (deployed here!)
177
+ - ✅ **Agent Implementations**: Full HRM, TRM, and MCTS with PyTorch
178
+ - ✅ **Training Pipeline**: Data generation, training, evaluation
179
+ - ✅ **LLM Integration**: OpenAI, Anthropic, LM Studio support
180
+ - ✅ **RAG Systems**: ChromaDB, FAISS, Pinecone vector stores
181
+ - ✅ **Observability**: OpenTelemetry tracing, Prometheus metrics
182
+ - ✅ **Storage**: S3 artifact storage, experiment tracking
183
+ - ✅ **CI/CD**: Automated testing, security scanning, deployment
184
+
185
+ **GitHub Repository**: [ianshank/langgraph_multi_agent_mcts](https://github.com/ianshank/langgraph_multi_agent_mcts)
186
+
187
+ ## Technical Stack
188
+
189
+ - **Python**: 3.11+
190
+ - **UI**: Gradio 4.x
191
+ - **ML Frameworks**: PyTorch 2.1+, Transformers, PEFT (LoRA)
192
+ - **Models**: GRU-based RNN, BERT-mini with LoRA adapters
193
+ - **Architecture**: Neural meta-controller + multi-agent system
194
+ - **Experiment Tracking**: Weights & Biases (optional)
195
+ - **Numerical**: NumPy
196
+
197
+ ## Research Applications
198
+
199
+ This framework demonstrates concepts applicable to:
200
+
201
+ - Complex decision-making systems
202
+ - AI-assisted software architecture decisions
203
+ - Multi-perspective problem analysis
204
+ - Strategic planning with uncertainty
205
+
206
+ ## Citation
207
+
208
+ If you use this framework in research, please cite:
209
+
210
+ ```bibtex
211
+ @software{langgraph_mcts_2024,
212
+ title={LangGraph Multi-Agent MCTS Framework},
213
+ author={Your Name},
214
+ year={2024},
215
+ url={https://github.com/ianshank/langgraph_multi_agent_mcts}
216
+ }
217
+ ```
218
+
219
+ ## License
220
+
221
+ MIT License - See repository for details.
222
+
223
+ ---
224
+
225
+ **Built with** LangGraph, Gradio, and Python | **Demo Version**: 1.0.0
app.py ADDED
@@ -0,0 +1,553 @@
1
+ """
2
+ LangGraph Multi-Agent MCTS Framework - Integrated Demo with Trained Models
3
+
4
+ Demonstrates the actual trained neural meta-controllers:
5
+ - RNN Meta-Controller for sequential pattern recognition
6
+ - BERT with LoRA adapters for text-based routing
7
+
8
+ This is a production demonstration using real trained models.
9
+ """
10
+
11
+ import asyncio
12
+ import sys
13
+ import time
14
+ from dataclasses import dataclass
15
+ from pathlib import Path
16
+
17
+ # Surface missing or broken critical dependencies early so the failure is visible in the Space logs
18
+ try:
19
+ import peft
20
+
21
+ print(f"[OK] PEFT library imported successfully (version: {peft.__version__})")
22
+ except ImportError as e:
23
+ print(f"CRITICAL ERROR: Could not import peft library: {e}")
24
+ # We don't exit here to allow the app to crash naturally later with full stack trace,
25
+ # but this print ensures it's visible in the logs immediately.
26
+
27
+ import gradio as gr
28
+ import torch
29
+
30
+ # Import the trained controllers
31
+ sys.path.insert(0, str(Path(__file__).parent))
32
+
33
+ from src.agents.meta_controller.base import MetaControllerFeatures
34
+ from src.agents.meta_controller.bert_controller import BERTMetaController
35
+ from src.agents.meta_controller.rnn_controller import RNNMetaController
36
+ from src.agents.meta_controller.feature_extractor import (
37
+ FeatureExtractor,
38
+ FeatureExtractorConfig,
39
+ )
40
+ from src.utils.personality_response import PersonalityResponseGenerator
41
+
42
+
43
+ @dataclass
44
+ class AgentResult:
45
+ """Result from a single agent."""
46
+
47
+ agent_name: str
48
+ response: str
49
+ confidence: float
50
+ reasoning_steps: list[str]
51
+ execution_time_ms: float
52
+
53
+
54
+ @dataclass
55
+ class ControllerDecision:
56
+ """Decision made by the meta-controller."""
57
+
58
+ selected_agent: str
59
+ confidence: float
60
+ routing_probabilities: dict[str, float]
61
+ features_used: dict
62
+
63
+
64
+ def create_features_from_query(
65
+ query: str,
66
+ iteration: int = 0,
67
+ last_agent: str = "none",
68
+ feature_extractor: FeatureExtractor | None = None,
69
+ ) -> MetaControllerFeatures:
70
+ """
71
+ Convert a text query into features for the meta-controller.
72
+
73
+ Uses semantic embeddings for robust feature extraction. Falls back to
74
+ heuristic-based extraction if embeddings are not available.
75
+
76
+ Args:
77
+ query: The input query text
78
+ iteration: Current iteration number
79
+ last_agent: Name of the last agent used
80
+ feature_extractor: Optional FeatureExtractor instance (created if None)
81
+
82
+ Returns:
83
+ MetaControllerFeatures instance
84
+ """
85
+ # Use provided feature extractor or create a new one
86
+ if feature_extractor is None:
87
+ try:
88
+ config = FeatureExtractorConfig.from_env()
89
+ feature_extractor = FeatureExtractor(config)
90
+ except Exception as e:
91
+ print(f"Warning: Failed to initialize FeatureExtractor: {e}")
92
+ print("Falling back to heuristic-based feature extraction")
93
+ # Will use heuristic fallback below
94
+
95
+ # Extract features using the feature extractor
96
+ try:
97
+ if feature_extractor is not None:
98
+ return feature_extractor.extract_features(query, iteration, last_agent)
99
+ except Exception as e:
100
+ print(f"Warning: Feature extraction failed: {e}")
101
+ print("Falling back to heuristic-based feature extraction")
102
+
103
+ # Fallback to original heuristic-based extraction
104
+ # (This code is kept as a safety net but should rarely be used)
105
+ query_length = len(query)
106
+
107
+ # Estimate complexity based on query characteristics
108
+ has_multiple_questions = "?" in query and query.count("?") > 1
109
+ has_comparison = any(word in query.lower() for word in ["vs", "versus", "compare", "difference", "better"])
110
+ has_optimization = any(word in query.lower() for word in ["optimize", "best", "improve", "maximize", "minimize"])
111
+ has_technical = any(word in query.lower() for word in ["algorithm", "code", "implement", "technical", "system"])
112
+
113
+ # Create mock confidence scores based on query characteristics
114
+ hrm_confidence = 0.5 + (0.3 if has_multiple_questions else 0) + (0.1 if has_technical else 0)
115
+ trm_confidence = 0.5 + (0.3 if has_comparison else 0) + (0.1 if query_length > 100 else 0)
116
+ mcts_confidence = 0.5 + (0.3 if has_optimization else 0) + (0.1 if has_technical else 0)
117
+
118
+ # Normalize
119
+ total = hrm_confidence + trm_confidence + mcts_confidence
120
+ if total == 0:
121
+ hrm_confidence = 1.0 / 3.0
122
+ trm_confidence = 1.0 / 3.0
123
+ mcts_confidence = 1.0 / 3.0
124
+ else:
125
+ hrm_confidence /= total
126
+ trm_confidence /= total
127
+ mcts_confidence /= total
128
+
129
+ # Calculate consensus score
130
+ max_confidence = max(hrm_confidence, trm_confidence, mcts_confidence)
131
+ if max_confidence == 0:
132
+ consensus_score = 0.0
133
+ else:
134
+ consensus_score = min(hrm_confidence, trm_confidence, mcts_confidence) / max_confidence
135
+
136
+ features = MetaControllerFeatures(
137
+ hrm_confidence=hrm_confidence,
138
+ trm_confidence=trm_confidence,
139
+ mcts_value=mcts_confidence,
140
+ consensus_score=consensus_score,
141
+ last_agent=last_agent,
142
+ iteration=iteration,
143
+ query_length=query_length,
144
+ has_rag_context=query_length > 50,
145
+ rag_relevance_score=0.7 if query_length > 50 else 0.0,
146
+ is_technical_query=has_technical,
147
+ )
148
+
149
+ return features
150
+
151
+
152
+ class IntegratedFramework:
153
+ """
154
+ Integrated multi-agent framework using trained meta-controllers.
155
+ """
156
+
157
+ def __init__(self):
158
+ """Initialize the framework with trained models."""
159
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
160
+ print(f"Using device: {self.device}")
161
+
162
+ # Initialize feature extractor with semantic embeddings
163
+ print("Initializing Feature Extractor...")
164
+ try:
165
+ config = FeatureExtractorConfig.from_env()
166
+ # Set device to match the framework device
167
+ config.device = self.device
168
+ self.feature_extractor = FeatureExtractor(config)
169
+ print(f"[OK] Feature Extractor initialized: {self.feature_extractor}")
170
+ except Exception as e:
171
+ print(f"[WARN] Failed to initialize Feature Extractor: {e}")
172
+ print("[WARN] Will fall back to heuristic-based feature extraction")
173
+ self.feature_extractor = None
174
+
175
+ # Load trained RNN Meta-Controller
176
+ print("Loading RNN Meta-Controller...")
177
+ self.rnn_controller = RNNMetaController(name="RNNController", seed=42, device=self.device)
178
+
179
+ # Load the trained weights
180
+ rnn_model_path = Path(__file__).parent / "models" / "rnn_meta_controller.pt"
181
+ if rnn_model_path.exists():
182
+ checkpoint = torch.load(rnn_model_path, map_location=self.device, weights_only=True)
183
+ self.rnn_controller.model.load_state_dict(checkpoint)
184
+ self.rnn_controller.model.eval()
185
+ print(f"[OK] Loaded RNN model from {rnn_model_path}")
186
+ else:
187
+ print(f"[WARN] RNN model not found at {rnn_model_path}, using untrained model")
188
+
189
+ # Load trained BERT Meta-Controller with LoRA
190
+ print("Loading BERT Meta-Controller with LoRA...")
191
+ self.bert_controller = BERTMetaController(name="BERTController", seed=42, device=self.device, use_lora=True)
192
+
193
+ bert_model_path = Path(__file__).parent / "models" / "bert_lora" / "final_model"
194
+ if bert_model_path.exists():
195
+ try:
196
+ self.bert_controller.load_model(str(bert_model_path))
197
+ print(f"[OK] Loaded BERT LoRA model from {bert_model_path}")
198
+ except Exception as e:
199
+ print(f"[WARN] Error loading BERT model: {e}")
200
+ print("Using untrained BERT model")
201
+ else:
202
+ print(f"[WARN] BERT model not found at {bert_model_path}, using untrained model")
203
+
204
+ # Agent routing map
205
+ self.agent_handlers = {
206
+ "hrm": self._handle_hrm,
207
+ "trm": self._handle_trm,
208
+ "mcts": self._handle_mcts,
209
+ }
210
+
211
+ print("Framework initialized successfully!")
212
+
213
+ async def process_query(
214
+ self,
215
+ query: str,
216
+ controller_type: str = "rnn",
217
+ ) -> tuple[AgentResult, ControllerDecision]:
218
+ """
219
+ Process a query using the trained meta-controller.
220
+
221
+ Args:
222
+ query: The input query
223
+ controller_type: Which controller to use ("rnn" or "bert")
224
+
225
+ Returns:
226
+ (agent_result, controller_decision) tuple
227
+ """
228
+ start_time = time.perf_counter()
229
+
230
+ # Step 1: Convert query to features using semantic embeddings
231
+ features = create_features_from_query(query, feature_extractor=self.feature_extractor)
232
+
233
+ # Step 2: Get controller decision
234
+ if controller_type == "rnn":
235
+ prediction = self.rnn_controller.predict(features)
236
+ else: # bert
237
+ prediction = self.bert_controller.predict(features)
238
+
239
+ selected_agent = prediction.agent
240
+ confidence = prediction.confidence
241
+
242
+ # Get routing probabilities (prediction.probabilities is already a dict)
243
+ routing_probs = prediction.probabilities
244
+
245
+ # Step 3: Route to selected agent
246
+ handler = self.agent_handlers.get(selected_agent, self._handle_hrm)
247
+ agent_result = await handler(query)
248
+
249
+ # Create controller decision summary
250
+ controller_decision = ControllerDecision(
251
+ selected_agent=selected_agent,
252
+ confidence=confidence,
253
+ routing_probabilities=routing_probs,
254
+ features_used={
255
+ "hrm_confidence": features.hrm_confidence,
256
+ "trm_confidence": features.trm_confidence,
257
+ "mcts_value": features.mcts_value,
258
+ "consensus_score": features.consensus_score,
259
+ "query_length": features.query_length,
260
+ "is_technical": features.is_technical_query,
261
+ },
262
+ )
263
+
264
+ total_time = (time.perf_counter() - start_time) * 1000
265
+ agent_result.execution_time_ms = round(total_time, 2)
266
+
267
+ return agent_result, controller_decision
268
+
269
+ async def _handle_hrm(self, query: str) -> AgentResult:
270
+ """Handle query with Hierarchical Reasoning Module."""
271
+ # Simulate HRM processing
272
+ await asyncio.sleep(0.1)
273
+
274
+ steps = [
275
+ "Decompose query into hierarchical subproblems",
276
+ "Apply high-level reasoning (H-Module)",
277
+ "Execute low-level refinement (L-Module)",
278
+ "Synthesize hierarchical solution",
279
+ ]
280
+
281
+ response = f"[HRM Analysis] Breaking down the problem hierarchically: {query[:100]}..."
282
+
283
+ return AgentResult(
284
+ agent_name="HRM (Hierarchical Reasoning)",
285
+ response=response,
286
+ confidence=0.85,
287
+ reasoning_steps=steps,
288
+ execution_time_ms=0.0,
289
+ )
290
+
291
+ async def _handle_trm(self, query: str) -> AgentResult:
292
+ """Handle query with Tree Reasoning Module."""
293
+ # Simulate TRM processing
294
+ await asyncio.sleep(0.1)
295
+
296
+ steps = [
297
+ "Initialize solution state",
298
+ "Recursive refinement iteration 1",
299
+ "Recursive refinement iteration 2",
300
+ "Convergence achieved - finalize",
301
+ ]
302
+
303
+ response = f"[TRM Analysis] Applying iterative refinement: {query[:100]}..."
304
+
305
+ return AgentResult(
306
+ agent_name="TRM (Iterative Refinement)",
307
+ response=response,
308
+ confidence=0.80,
309
+ reasoning_steps=steps,
310
+ execution_time_ms=0.0,
311
+ )
312
+
313
+ async def _handle_mcts(self, query: str) -> AgentResult:
314
+ """Handle query with MCTS."""
315
+ # Simulate MCTS processing
316
+ await asyncio.sleep(0.15)
317
+
318
+ steps = [
319
+ "Build search tree",
320
+ "Selection: UCB1 exploration",
321
+ "Expansion: Add promising nodes",
322
+ "Simulation: Rollout evaluation",
323
+ "Backpropagation: Update values",
324
+ ]
325
+
326
+ response = f"[MCTS Analysis] Strategic exploration via tree search: {query[:100]}..."
327
+
328
+ return AgentResult(
329
+ agent_name="MCTS (Monte Carlo Tree Search)",
330
+ response=response,
331
+ confidence=0.88,
332
+ reasoning_steps=steps,
333
+ execution_time_ms=0.0,
334
+ )
335
+
336
+
337
+ # Global framework instance
338
+ framework = None
339
+
340
+
341
+ def initialize_framework():
342
+ """Initialize or reinitialize the framework."""
343
+ global framework
344
+ try:
345
+ framework = IntegratedFramework()
346
+ return "[OK] Framework initialized with trained models!"
347
+ except Exception as e:
348
+ return f"[ERROR] Error initializing framework: {str(e)}"
349
+
350
+
351
+ def process_query_sync(
352
+ query: str,
353
+ controller_type: str,
354
+ ):
355
+ """Synchronous wrapper for async processing."""
356
+ global framework
357
+
358
+ if framework is None:
359
+ framework = IntegratedFramework()
360
+
361
+ if not query.strip():
362
+ return ("Please enter a query.", {}, "", {}, "", "")
363
+
364
+ # Run async function
365
+ agent_result, controller_decision = asyncio.run(
366
+ framework.process_query(query=query, controller_type=controller_type.lower())
367
+ )
368
+
369
+ # Format outputs
370
+ final_response = agent_result.response
371
+
372
+ # Generate personality-infused response
373
+ personality_gen = PersonalityResponseGenerator()
374
+ try:
375
+ personality_response = personality_gen.generate_response(
376
+ agent_response=final_response,
377
+ query=query
378
+ )
379
+ except Exception as e:
380
+ # Fallback to a simple wrapper if personality generation fails
381
+ personality_response = f"Here's what I found:\n\n{final_response}"
382
+ print(f"Warning: Personality generation failed: {e}")
383
+
384
+ # Controller decision visualization
385
+ routing_viz = "### 🧠 Meta-Controller Decision\n\n"
386
+ routing_viz += f"**Selected Agent:** `{controller_decision.selected_agent.upper()}`\n\n"
387
+ routing_viz += f"**Confidence:** {controller_decision.confidence:.1%}\n\n"
388
+ routing_viz += "**Routing Probabilities:**\n"
389
+ for agent, prob in controller_decision.routing_probabilities.items():
390
+ bar = "█" * int(prob * 50)
391
+ routing_viz += f"- **{agent.upper()}**: {prob:.1%} {bar}\n"
392
+
393
+ # Agent details
394
+ agent_details = {
395
+ "agent": agent_result.agent_name,
396
+ "confidence": f"{agent_result.confidence:.1%}",
397
+ "reasoning_steps": agent_result.reasoning_steps,
398
+ "execution_time_ms": agent_result.execution_time_ms,
399
+ }
400
+
401
+ # Features used
402
+ features_viz = "### 📊 Features Used for Routing\n\n"
403
+ for feature, value in controller_decision.features_used.items():
404
+ if isinstance(value, float):
405
+ features_viz += f"- **{feature}**: {value:.3f}\n"
406
+ elif isinstance(value, bool):
407
+ features_viz += f"- **{feature}**: {'Yes' if value else 'No'}\n"
408
+ else:
409
+ features_viz += f"- **{feature}**: {value}\n"
410
+
411
+ # Metrics
412
+ metrics = f"""
413
+ **Controller:** {controller_type}
414
+ **Execution Time:** {agent_result.execution_time_ms:.2f} ms
415
+ **Agent Confidence:** {agent_result.confidence:.1%}
416
+ """
417
+
418
+ return final_response, agent_details, routing_viz, features_viz, metrics, personality_response
419
+
420
+
421
+ # Example queries
422
+ EXAMPLE_QUERIES = [
423
+ "What are the key factors to consider when choosing between microservices and monolithic architecture?",
424
+ "How can we optimize a Python application that processes 10GB of log files daily?",
425
+ "Compare the performance characteristics of B-trees vs LSM-trees for write-heavy workloads",
426
+ "Design a distributed rate limiting system that handles 100k requests per second",
427
+ "Explain the difference between supervised and unsupervised learning with examples",
428
+ ]
429
+
430
+
431
+ # Gradio Interface
432
+ with gr.Blocks(
433
+ title="LangGraph Multi-Agent MCTS - Trained Models Demo",
434
+ theme=gr.themes.Soft(),
435
+ css="""
436
+ .agent-box { border: 1px solid #ddd; padding: 10px; border-radius: 5px; margin: 5px 0; }
437
+ .highlight { background-color: #e3f2fd; padding: 10px; border-radius: 5px; margin: 10px 0; }
438
+ """,
439
+ ) as demo:
440
+ gr.Markdown(
441
+ """
442
+ # 🎯 LangGraph Multi-Agent MCTS Framework
443
+ ## Production Demo with Trained Neural Meta-Controllers
444
+
445
+ This demo uses **REAL trained models**:
446
+ - 🧠 **RNN Meta-Controller**: GRU-based sequential pattern recognition
447
+ - 🤖 **BERT with LoRA**: Transformer-based text understanding for routing
448
+
449
+ The meta-controllers learn to route queries to the optimal agent:
450
+ - **HRM**: Hierarchical reasoning for complex decomposition
451
+ - **TRM**: Iterative refinement for progressive improvement
452
+ - **MCTS**: Strategic exploration for optimization problems
453
+
454
+ ---
455
+ """
456
+ )
457
+
458
+ with gr.Row():
459
+ with gr.Column(scale=2):
460
+ query_input = gr.Textbox(
461
+ label="Query", placeholder="Enter your question or reasoning task...", lines=4, max_lines=10
462
+ )
463
+
464
+ gr.Markdown("**Example Queries:**")
465
+ example_dropdown = gr.Dropdown(choices=EXAMPLE_QUERIES, label="Select an example", interactive=True)
466
+
467
+ def load_example(example):
468
+ return example
469
+
470
+ example_dropdown.change(load_example, example_dropdown, query_input)
471
+
472
+ with gr.Column(scale=1):
473
+ gr.Markdown("**Meta-Controller Selection**")
474
+ controller_type = gr.Radio(
475
+ choices=["RNN", "BERT"],
476
+ value="RNN",
477
+ label="Controller Type",
478
+ info="Choose which trained controller to use",
479
+ )
480
+
481
+ gr.Markdown(
482
+ """
483
+ **Controller Comparison:**
484
+ - **RNN**: Fast, captures sequential patterns
485
+ - **BERT**: More context-aware, text understanding
486
+ """
487
+ )
488
+
489
+ process_btn = gr.Button("🚀 Process Query", variant="primary", size="lg")
490
+
491
+ gr.Markdown("---")
492
+
493
+ with gr.Row():
494
+ with gr.Column():
495
+ gr.Markdown("### 🎯 Agent Response")
496
+ final_response_output = gr.Textbox(label="Response", lines=4, interactive=False)
497
+
498
+ gr.Markdown("### 🤝 Personality-Infused Response")
499
+ gr.Markdown("*A conversational, balanced advisor interpretation*")
500
+ personality_output = gr.Textbox(label="Balanced Advisor Response", lines=8, interactive=False)
501
+
502
+ gr.Markdown("### 📈 Performance Metrics")
503
+ metrics_output = gr.Markdown()
504
+
505
+ with gr.Column():
506
+ routing_viz = gr.Markdown(label="Controller Decision")
507
+ features_viz = gr.Markdown(label="Features")
508
+
509
+ with gr.Accordion("🔍 Detailed Agent Information", open=False):
510
+ agent_details_output = gr.JSON(label="Agent Execution Details")
511
+
512
+ # Wire up the processing
513
+ process_btn.click(
514
+ fn=process_query_sync,
515
+ inputs=[
516
+ query_input,
517
+ controller_type,
518
+ ],
519
+ outputs=[final_response_output, agent_details_output, routing_viz, features_viz, metrics_output, personality_output],
520
+ )
521
+
522
+ gr.Markdown(
523
+ """
524
+ ---
525
+
526
+ ### 📚 About This Demo
527
+
528
+ This is a **production demonstration** of trained neural meta-controllers for multi-agent routing.
529
+
530
+ **Models:**
531
+ - RNN Meta-Controller: 10-dimensional feature vector → 3-class routing (HRM/TRM/MCTS)
532
+ - BERT with LoRA: Text features → routing decision with adapters
533
+
534
+ **Training:**
535
+ - Synthetic dataset: 1000+ samples with balanced routing decisions
536
+ - Optimization: Adam optimizer, cross-entropy loss
537
+ - Validation: 80/20 train/val split with early stopping
538
+
539
+ **Repository:** [GitHub - langgraph_multi_agent_mcts](https://github.com/ianshank/langgraph_multi_agent_mcts)
540
+
541
+ ---
542
+ *Built with PyTorch, Transformers, PEFT, and Gradio*
543
+ """
544
+ )
545
+
546
+
547
+ if __name__ == "__main__":
548
+ # Initialize framework
549
+ print("Initializing framework with trained models...")
550
+ framework = IntegratedFramework()
551
+
552
+ # Launch the demo
553
+ demo.launch(server_name="0.0.0.0", share=False, show_error=True)
app_mock.py ADDED
@@ -0,0 +1,590 @@
1
+ """
2
+ LangGraph Multi-Agent MCTS Framework - Hugging Face Spaces Demo
3
+
4
+ A proof-of-concept demonstration of multi-agent reasoning with Monte Carlo Tree Search.
5
+ """
6
+
7
+ import asyncio
8
+ import time
9
+ from dataclasses import dataclass
10
+
11
+ import gradio as gr
12
+ import numpy as np
13
+
14
+ # Demo-specific simplified implementations
15
+ from demo_src.agents_demo import HRMAgent, TRMAgent
16
+ from demo_src.llm_mock import HuggingFaceClient, MockLLMClient
17
+ from demo_src.mcts_demo import MCTSDemo
18
+ from demo_src.wandb_tracker import WandBTracker, is_wandb_available
19
+
20
+
21
+ @dataclass
22
+ class AgentResult:
23
+ """Result from a single agent."""
24
+
25
+ agent_name: str
26
+ response: str
27
+ confidence: float
28
+ reasoning_steps: list[str]
29
+ execution_time_ms: float
30
+
31
+
32
+ @dataclass
33
+ class FrameworkResult:
34
+ """Combined result from all agents."""
35
+
36
+ query: str
37
+ hrm_result: AgentResult | None
38
+ trm_result: AgentResult | None
39
+ mcts_result: dict | None
40
+ consensus_score: float
41
+ final_response: str
42
+ total_time_ms: float
43
+ metadata: dict
44
+
45
+
46
+ class MultiAgentFrameworkDemo:
47
+ """Simplified multi-agent framework for Hugging Face Spaces demo."""
48
+
49
+ def __init__(self, use_hf_inference: bool = False, hf_model: str = ""):
50
+ """Initialize the demo framework.
51
+
52
+ Args:
53
+ use_hf_inference: Use Hugging Face Inference API instead of mock
54
+ hf_model: Hugging Face model ID for inference
55
+ """
56
+ self.use_hf_inference = use_hf_inference
57
+ self.hf_model = hf_model
58
+
59
+ # Initialize components
60
+ if use_hf_inference and hf_model:
61
+ self.llm_client = HuggingFaceClient(model_id=hf_model)
62
+ else:
63
+ self.llm_client = MockLLMClient()
64
+
65
+ self.hrm_agent = HRMAgent(self.llm_client)
66
+ self.trm_agent = TRMAgent(self.llm_client)
67
+ self.mcts = MCTSDemo()
68
+
69
+ async def process_query(
70
+ self,
71
+ query: str,
72
+ use_hrm: bool = True,
73
+ use_trm: bool = True,
74
+ use_mcts: bool = False,
75
+ mcts_iterations: int = 25,
76
+ exploration_weight: float = 1.414,
77
+ seed: int | None = None,
78
+ ) -> FrameworkResult:
79
+ """Process a query through the multi-agent framework.
80
+
81
+ Args:
82
+ query: The input query to process
83
+ use_hrm: Enable Hierarchical Reasoning Module
84
+ use_trm: Enable Tree Reasoning Module
85
+ use_mcts: Enable Monte Carlo Tree Search
86
+ mcts_iterations: Number of MCTS iterations
87
+ exploration_weight: UCB1 exploration parameter
88
+ seed: Random seed for reproducibility
89
+
90
+ Returns:
91
+ FrameworkResult with all agent outputs and consensus
92
+ """
93
+ start_time = time.perf_counter()
94
+
95
+ hrm_result = None
96
+ trm_result = None
97
+ mcts_result = None
98
+
99
+ # Run enabled agents
100
+ tasks = []
101
+ agent_names = []
102
+
103
+ if use_hrm:
104
+ tasks.append(self._run_hrm(query))
105
+ agent_names.append("hrm")
106
+
107
+ if use_trm:
108
+ tasks.append(self._run_trm(query))
109
+ agent_names.append("trm")
110
+
111
+ if use_mcts:
112
+ tasks.append(self._run_mcts(query, mcts_iterations, exploration_weight, seed))
113
+ agent_names.append("mcts")
114
+
115
+ # Execute agents concurrently
116
+ if tasks:
117
+ results = await asyncio.gather(*tasks, return_exceptions=True)
118
+
119
+ for name, result in zip(agent_names, results, strict=False):
120
+ if isinstance(result, Exception):
121
+ continue
122
+ if name == "hrm":
123
+ hrm_result = result
124
+ elif name == "trm":
125
+ trm_result = result
126
+ elif name == "mcts":
127
+ mcts_result = result
128
+
129
+ # Calculate consensus score
130
+ consensus_score = self._calculate_consensus(hrm_result, trm_result, mcts_result)
131
+
132
+ # Generate final synthesized response
133
+ final_response = self._synthesize_response(query, hrm_result, trm_result, mcts_result, consensus_score)
134
+
135
+ total_time = (time.perf_counter() - start_time) * 1000
136
+
137
+ return FrameworkResult(
138
+ query=query,
139
+ hrm_result=hrm_result,
140
+ trm_result=trm_result,
141
+ mcts_result=mcts_result,
142
+ consensus_score=consensus_score,
143
+ final_response=final_response,
144
+ total_time_ms=round(total_time, 2),
145
+ metadata={
146
+ "agents_used": agent_names,
147
+ "mcts_config": (
148
+ {"iterations": mcts_iterations, "exploration_weight": exploration_weight, "seed": seed}
149
+ if use_mcts
150
+ else None
151
+ ),
152
+ },
153
+ )
154
+
155
+ async def _run_hrm(self, query: str) -> AgentResult:
156
+ """Run Hierarchical Reasoning Module."""
157
+ start = time.perf_counter()
158
+ result = await self.hrm_agent.process(query)
159
+ elapsed = (time.perf_counter() - start) * 1000
160
+
161
+ return AgentResult(
162
+ agent_name="HRM (Hierarchical Reasoning)",
163
+ response=result["response"],
164
+ confidence=result["confidence"],
165
+ reasoning_steps=result["steps"],
166
+ execution_time_ms=round(elapsed, 2),
167
+ )
168
+
169
+ async def _run_trm(self, query: str) -> AgentResult:
170
+ """Run Tree Reasoning Module."""
171
+ start = time.perf_counter()
172
+ result = await self.trm_agent.process(query)
173
+ elapsed = (time.perf_counter() - start) * 1000
174
+
175
+ return AgentResult(
176
+ agent_name="TRM (Iterative Refinement)",
177
+ response=result["response"],
178
+ confidence=result["confidence"],
179
+ reasoning_steps=result["steps"],
180
+ execution_time_ms=round(elapsed, 2),
181
+ )
182
+
183
+ async def _run_mcts(self, query: str, iterations: int, exploration_weight: float, seed: int | None) -> dict:
184
+ """Run Monte Carlo Tree Search."""
185
+ start = time.perf_counter()
186
+
187
+ # MCTSDemo.search is now async and uses the production framework
188
+ result = await self.mcts.search(query=query, iterations=iterations, exploration_weight=exploration_weight, seed=seed)
189
+
190
+ elapsed = (time.perf_counter() - start) * 1000
191
+ result["execution_time_ms"] = round(elapsed, 2)
192
+
193
+ return result
194
+
195
+ def _calculate_consensus(
196
+ self, hrm_result: AgentResult | None, trm_result: AgentResult | None, mcts_result: dict | None
197
+ ) -> float:
198
+ """Calculate agreement score between agents."""
199
+ confidences = []
200
+
201
+ if hrm_result:
202
+ confidences.append(hrm_result.confidence)
203
+ if trm_result:
204
+ confidences.append(trm_result.confidence)
205
+ if mcts_result:
206
+ confidences.append(mcts_result.get("best_value", 0.5))
207
+
208
+ if not confidences:
209
+ return 0.0
210
+
211
+ # Consensus is based on confidence alignment and average
212
+ if len(confidences) == 1:
213
+ return confidences[0]
214
+
215
+ avg_confidence = np.mean(confidences)
216
+ std_confidence = np.std(confidences)
217
+
218
+ # Higher consensus when agents agree (low std) and are confident (high avg)
219
+ agreement_factor = max(0, 1 - std_confidence * 2)
220
+ consensus = avg_confidence * agreement_factor
221
+
222
+ return round(min(1.0, consensus), 3)
223
+
224
+ def _synthesize_response(
225
+ self,
226
+ query: str,
227
+ hrm_result: AgentResult | None,
228
+ trm_result: AgentResult | None,
229
+ mcts_result: dict | None,
230
+ consensus_score: float,
231
+ ) -> str:
232
+ """Synthesize final response from all agent outputs."""
233
+ parts = []
234
+
235
+ if hrm_result and hrm_result.confidence > 0.5:
236
+ parts.append(f"[HRM] {hrm_result.response}")
237
+
238
+ if trm_result and trm_result.confidence > 0.5:
239
+ parts.append(f"[TRM] {trm_result.response}")
240
+
241
+ if mcts_result and mcts_result.get("best_value", 0) > 0.5:
242
+ parts.append(f"[MCTS] Best path: {mcts_result.get('best_action', 'N/A')}")
243
+
244
+ if not parts:
245
+ truncated_query = f"{query[:80]}..." if len(query) > 80 else query
246
+ return f"Insufficient confidence to answer query: '{truncated_query}'."
247
+
248
+ synthesis = " | ".join(parts)
249
+
250
+ if consensus_score > 0.7:
251
+ return f"HIGH CONSENSUS ({consensus_score:.1%}): {synthesis}"
252
+ elif consensus_score > 0.4:
253
+ return f"MODERATE CONSENSUS ({consensus_score:.1%}): {synthesis}"
254
+ else:
255
+ return f"LOW CONSENSUS ({consensus_score:.1%}): {synthesis}"
256
+
257
+
258
+ # Global framework instance
259
+ framework = None
260
+ wandb_tracker = None
261
+
262
+
263
+ def initialize_framework(use_hf: bool, model_id: str):
264
+ """Initialize or reinitialize the framework."""
265
+ global framework
266
+ framework = MultiAgentFrameworkDemo(use_hf_inference=use_hf, hf_model=model_id)
267
+ return "Framework initialized successfully!"
268
+
269
+
270
+ def process_query_sync(
271
+ query: str,
272
+ use_hrm: bool,
273
+ use_trm: bool,
274
+ use_mcts: bool,
275
+ mcts_iterations: int,
276
+ exploration_weight: float,
277
+ seed: int,
278
+ enable_wandb: bool = False,
279
+ wandb_project: str = "langgraph-mcts-demo",
280
+ wandb_run_name: str = "",
281
+ ):
282
+ """Synchronous wrapper for async processing."""
283
+ global framework, wandb_tracker
284
+
285
+ if framework is None:
286
+ framework = MultiAgentFrameworkDemo()
287
+
288
+ if not query.strip():
289
+ return "Please enter a query.", {}, "", {}, ""
290
+
291
+ # Handle seed
292
+ seed_value = seed if seed > 0 else None
293
+
294
+ # Initialize W&B tracking if enabled
295
+ wandb_url = ""
296
+ if enable_wandb and is_wandb_available():
297
+ if wandb_tracker is None:
298
+ wandb_tracker = WandBTracker(project_name=wandb_project, enabled=True)
299
+
300
+ # Start a new run
301
+ run_name = wandb_run_name if wandb_run_name.strip() else None
302
+ config = {
303
+ "query": query[:200], # Truncate for config
304
+ "use_hrm": use_hrm,
305
+ "use_trm": use_trm,
306
+ "use_mcts": use_mcts,
307
+ "mcts_iterations": mcts_iterations,
308
+ "exploration_weight": exploration_weight,
309
+ "seed": seed_value,
310
+ }
311
+ wandb_tracker.init_run(run_name=run_name, config=config)
312
+
313
+ # Run async function
314
+ result = asyncio.run(
315
+ framework.process_query(
316
+ query=query,
317
+ use_hrm=use_hrm,
318
+ use_trm=use_trm,
319
+ use_mcts=use_mcts,
320
+ mcts_iterations=int(mcts_iterations),
321
+ exploration_weight=exploration_weight,
322
+ seed=seed_value,
323
+ )
324
+ )
325
+
326
+ # Format outputs
327
+ final_response = result.final_response
328
+
329
+ # Agent details
330
+ agent_details = {}
331
+ if result.hrm_result:
332
+ agent_details["HRM"] = {
333
+ "response": result.hrm_result.response,
334
+ "confidence": f"{result.hrm_result.confidence:.1%}",
335
+ "reasoning_steps": result.hrm_result.reasoning_steps,
336
+ "time_ms": result.hrm_result.execution_time_ms,
337
+ }
338
+
339
+ # Log to W&B
340
+ if enable_wandb and wandb_tracker:
341
+ wandb_tracker.log_agent_result(
342
+ "HRM",
343
+ result.hrm_result.response,
344
+ result.hrm_result.confidence,
345
+ result.hrm_result.execution_time_ms,
346
+ result.hrm_result.reasoning_steps,
347
+ )
348
+
349
+ if result.trm_result:
350
+ agent_details["TRM"] = {
351
+ "response": result.trm_result.response,
352
+ "confidence": f"{result.trm_result.confidence:.1%}",
353
+ "reasoning_steps": result.trm_result.reasoning_steps,
354
+ "time_ms": result.trm_result.execution_time_ms,
355
+ }
356
+
357
+ # Log to W&B
358
+ if enable_wandb and wandb_tracker:
359
+ wandb_tracker.log_agent_result(
360
+ "TRM",
361
+ result.trm_result.response,
362
+ result.trm_result.confidence,
363
+ result.trm_result.execution_time_ms,
364
+ result.trm_result.reasoning_steps,
365
+ )
366
+
367
+ if result.mcts_result:
368
+ agent_details["MCTS"] = result.mcts_result
369
+
370
+ # Log to W&B
371
+ if enable_wandb and wandb_tracker:
372
+ wandb_tracker.log_mcts_result(result.mcts_result)
373
+
374
+ # Log consensus and performance to W&B
375
+ if enable_wandb and wandb_tracker:
376
+ wandb_tracker.log_consensus(result.consensus_score, result.metadata["agents_used"], result.final_response)
377
+ wandb_tracker.log_performance(result.total_time_ms)
378
+ wandb_tracker.log_query_summary(query, use_hrm, use_trm, use_mcts, result.consensus_score, result.total_time_ms)
379
+
380
+ # Get run URL
381
+ wandb_url = wandb_tracker.get_run_url() or ""
382
+
383
+ # Finish the run
384
+ wandb_tracker.finish_run()
385
+
386
+ # Metrics
387
+ metrics = f"""
388
+ **Consensus Score:** {result.consensus_score:.1%}
389
+ **Total Processing Time:** {result.total_time_ms:.2f} ms
390
+ **Agents Used:** {", ".join(result.metadata["agents_used"])}
391
+ """
392
+
393
+ if wandb_url:
394
+ metrics += f"\n**W&B Run:** [{wandb_url}]({wandb_url})"
395
+
396
+ # Full JSON result
397
+ full_result = {
398
+ "query": result.query,
399
+ "final_response": result.final_response,
400
+ "consensus_score": result.consensus_score,
401
+ "total_time_ms": result.total_time_ms,
402
+ "metadata": result.metadata,
403
+ "agent_details": agent_details,
404
+ "wandb_url": wandb_url,
405
+ }
406
+
407
+ return final_response, agent_details, metrics, full_result, wandb_url
408
+
409
+
410
+ def visualize_mcts_tree(mcts_result: dict) -> str:
411
+ """Create ASCII visualization of MCTS tree."""
412
+ if not mcts_result or "tree_visualization" not in mcts_result:
413
+ return "No MCTS tree data available"
414
+
415
+ return mcts_result["tree_visualization"]
416
+
417
+
418
+ # Example queries for demonstration
419
+ EXAMPLE_QUERIES = [
420
+ "What are the key factors to consider when choosing between microservices and monolithic architecture?",
421
+ "How can we optimize a Python application that processes 10GB of log files daily?",
422
+ "What is the best approach to implement rate limiting in a distributed system?",
423
+ "Should we use SQL or NoSQL database for a social media application with 1M users?",
424
+ "How to design a fault-tolerant message queue system?",
425
+ ]
426
+
427
+
428
+ # Gradio Interface
429
+ with gr.Blocks(
430
+ title="LangGraph Multi-Agent MCTS Demo",
431
+ theme=gr.themes.Soft(),
432
+ css="""
433
+ .agent-box { border: 1px solid #ddd; padding: 10px; border-radius: 5px; margin: 5px 0; }
434
+ .consensus-high { color: #28a745; font-weight: bold; }
435
+ .consensus-medium { color: #ffc107; font-weight: bold; }
436
+ .consensus-low { color: #dc3545; font-weight: bold; }
437
+ """,
438
+ ) as demo:
439
+ gr.Markdown(
440
+ """
441
+ # LangGraph Multi-Agent MCTS Framework
442
+
443
+ **Proof-of-Concept Demo** - Multi-agent reasoning with Monte Carlo Tree Search
444
+
445
+ This demo showcases:
446
+ - **HRM**: Hierarchical Reasoning Module - breaks down complex queries
447
+ - **TRM**: Tree Reasoning Module - iterative refinement of responses
448
+ - **MCTS**: Monte Carlo Tree Search - strategic exploration of solution space
449
+ - **Consensus**: Agreement scoring between agents
450
+
451
+ ---
452
+ """
453
+ )
454
+
455
+ with gr.Row():
456
+ with gr.Column(scale=2):
457
+ query_input = gr.Textbox(
458
+ label="Query", placeholder="Enter your reasoning task or question...", lines=3, max_lines=10
459
+ )
460
+
461
+ gr.Markdown("**Example Queries:**")
462
+ example_dropdown = gr.Dropdown(choices=EXAMPLE_QUERIES, label="Select an example", interactive=True)
463
+
464
+ def load_example(example):
465
+ return example
466
+
467
+ example_dropdown.change(load_example, example_dropdown, query_input)
468
+
469
+ with gr.Column(scale=1):
470
+ gr.Markdown("**Agent Configuration**")
471
+ use_hrm = gr.Checkbox(label="Enable HRM (Hierarchical)", value=True)
472
+ use_trm = gr.Checkbox(label="Enable TRM (Iterative)", value=True)
473
+ use_mcts = gr.Checkbox(label="Enable MCTS", value=False)
474
+
475
+ gr.Markdown("**MCTS Parameters**")
476
+ mcts_iterations = gr.Slider(
477
+ minimum=10,
478
+ maximum=100,
479
+ value=25,
480
+ step=5,
481
+ label="Iterations",
482
+ info="More iterations = better search, but slower",
483
+ )
484
+ exploration_weight = gr.Slider(
485
+ minimum=0.1,
486
+ maximum=3.0,
487
+ value=1.414,
488
+ step=0.1,
489
+ label="Exploration Weight (C)",
490
+ info="Higher = more exploration, Lower = more exploitation",
491
+ )
492
+ seed_input = gr.Number(label="Random Seed (0 for random)", value=0, precision=0)
493
+
494
+ with gr.Accordion("Weights & Biases Tracking", open=False):
495
+ gr.Markdown(
496
+ """
497
+ **Experiment Tracking with W&B**
498
+
499
+ Track your experiments, visualize metrics, and compare runs.
500
+ Requires W&B API key set in Space secrets as `WANDB_API_KEY`.
501
+ """
502
+ )
503
+ with gr.Row():
504
+ enable_wandb = gr.Checkbox(
505
+ label="Enable W&B Tracking", value=False, info="Log metrics and results to Weights & Biases"
506
+ )
507
+ wandb_project = gr.Textbox(
508
+ label="Project Name", value="langgraph-mcts-demo", placeholder="Your W&B project name"
509
+ )
510
+ wandb_run_name = gr.Textbox(label="Run Name (optional)", value="", placeholder="Auto-generated if empty")
511
+
512
+ wandb_status = gr.Markdown(f"**W&B Status:** {'Available' if is_wandb_available() else 'Not installed'}")
513
+
514
+ process_btn = gr.Button("Process Query", variant="primary", size="lg")
515
+
516
+ gr.Markdown("---")
517
+
518
+ with gr.Row():
519
+ with gr.Column():
520
+ gr.Markdown("### Final Response")
521
+ final_response_output = gr.Textbox(label="Synthesized Response", lines=4, interactive=False)
522
+
523
+ gr.Markdown("### Performance Metrics")
524
+ metrics_output = gr.Markdown()
525
+
526
+ with gr.Column():
527
+ gr.Markdown("### Agent Details")
528
+ agent_details_output = gr.JSON(label="Individual Agent Results")
529
+
530
+ with gr.Accordion("Full JSON Result", open=False):
531
+ full_result_output = gr.JSON(label="Complete Framework Output")
532
+
533
+ with gr.Accordion("W&B Run Details", open=False, visible=True):
534
+ wandb_url_output = gr.Textbox(
535
+ label="W&B Run URL", interactive=False, placeholder="Enable W&B tracking to see run URL here"
536
+ )
537
+
538
+ # Wire up the processing
539
+ process_btn.click(
540
+ fn=process_query_sync,
541
+ inputs=[
542
+ query_input,
543
+ use_hrm,
544
+ use_trm,
545
+ use_mcts,
546
+ mcts_iterations,
547
+ exploration_weight,
548
+ seed_input,
549
+ enable_wandb,
550
+ wandb_project,
551
+ wandb_run_name,
552
+ ],
553
+ outputs=[final_response_output, agent_details_output, metrics_output, full_result_output, wandb_url_output],
554
+ )
555
+
556
+ gr.Markdown(
557
+ """
558
+ ---
559
+
560
+ ### About This Demo
561
+
562
+ This is a **proof-of-concept** demonstration of the LangGraph Multi-Agent MCTS Framework.
563
+
564
+ **Features:**
565
+ - Multi-agent orchestration with consensus scoring
566
+ - Monte Carlo Tree Search for strategic reasoning
567
+ - Configurable exploration vs exploitation trade-offs
568
+ - Deterministic results with seeded randomness
569
+ - **Weights & Biases integration** for experiment tracking
570
+
571
+ **Limitations (POC):**
572
+ - Uses mock/simplified LLM responses (not production LLM)
573
+ - Limited to demonstration scenarios
574
+ - No persistent storage or RAG
575
+ - Simplified MCTS implementation
576
+
577
+ **Full Framework:** [GitHub Repository](https://github.com/ianshank/langgraph_multi_agent_mcts)
578
+
579
+ ---
580
+ *Built with LangGraph, Gradio, Weights & Biases, and Python*
581
+ """
582
+ )
583
+
584
+
585
+ if __name__ == "__main__":
586
+ # Initialize with mock client for demo
587
+ framework = MultiAgentFrameworkDemo(use_hf_inference=False)
588
+
589
+ # Launch the demo
590
+ demo.launch(server_name="0.0.0.0", server_port=7860, share=False, show_error=True)
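The consensus score surfaced in the metrics above is computed inside the framework (not shown in this part of the diff). A minimal sketch of one way such an agreement score can be derived, assuming sentence-transformers embeddings; the model name and helper below are illustrative choices, not the repository's implementation:

```python
# Illustrative sketch only: average pairwise embedding similarity as a consensus proxy.
# Assumes sentence-transformers (listed in requirements.txt); "all-MiniLM-L6-v2" and
# consensus_score are hypothetical choices, not taken from app.py.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util


def consensus_score(responses: list[str]) -> float:
    """Average pairwise cosine similarity between agent responses, clamped to [0, 1]."""
    if len(responses) < 2:
        return 1.0  # A single agent trivially agrees with itself
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(responses, convert_to_tensor=True)
    sims = [
        float(util.cos_sim(embeddings[i], embeddings[j]))
        for i, j in combinations(range(len(responses)), 2)
    ]
    return max(0.0, min(1.0, sum(sims) / len(sims)))
```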
demo_src/__init__.py ADDED
@@ -0,0 +1 @@
1
+ # Demo source modules for Hugging Face Spaces
demo_src/agents_demo.py ADDED
@@ -0,0 +1,234 @@
1
+ """
2
+ Simplified agent implementations for Hugging Face Spaces demo.
3
+ """
4
+
5
+ import asyncio
6
+ from typing import Any
7
+
8
+
9
+ class HRMAgent:
10
+ """Hierarchical Reasoning Module - breaks down complex queries."""
11
+
12
+ def __init__(self, llm_client):
13
+ """Initialize with an LLM client.
14
+
15
+ Args:
16
+ llm_client: LLM client (MockLLMClient or HuggingFaceClient)
17
+ """
18
+ self.llm_client = llm_client
19
+ self.name = "HRM (Hierarchical Reasoning)"
20
+
21
+ async def process(self, query: str) -> dict[str, Any]:
22
+ """Process query using hierarchical decomposition.
23
+
24
+ Args:
25
+ query: Input query to process
26
+
27
+ Returns:
28
+ Dictionary with response, confidence, and reasoning steps
29
+ """
30
+ # Step 1: Decompose the query
31
+ decomposition_steps = await self._decompose_query(query)
32
+
33
+ # Step 2: Analyze each component
34
+ analysis_results = await self._analyze_components(decomposition_steps)
35
+
36
+ # Step 3: Synthesize hierarchical response
37
+ llm_result = await self.llm_client.generate(
38
+ prompt=f"Hierarchical analysis of: {query}", context=f"Components: {', '.join(decomposition_steps)}"
39
+ )
40
+
41
+ # Compile reasoning steps
42
+ reasoning_steps = [
43
+ f"1. Query decomposition: Identified {len(decomposition_steps)} key components",
44
+ f"2. Component analysis: {analysis_results}",
45
+ "3. Hierarchical synthesis: Combined insights from all levels",
46
+ f"4. Confidence assessment: {llm_result['confidence']:.1%} based on component clarity",
47
+ ]
48
+
49
+ return {
50
+ "response": llm_result["response"],
51
+ "confidence": llm_result["confidence"],
52
+ "steps": reasoning_steps,
53
+ "components": decomposition_steps,
54
+ "tokens_used": llm_result.get("tokens_used", 0),
55
+ }
56
+
57
+ async def _decompose_query(self, query: str) -> list[str]:
58
+ """Decompose query into hierarchical components."""
59
+ # Simulate decomposition based on query structure
60
+ await asyncio.sleep(0.05) # Simulate processing
61
+
62
+ # Simple heuristic decomposition
63
+ components = []
64
+
65
+ # Extract key phrases
66
+ query_lower = query.lower()
67
+
68
+ if "?" in query:
69
+ components.append("Question type: Analytical")
70
+ else:
71
+ components.append("Question type: Directive")
72
+
73
+ if "how" in query_lower:
74
+ components.append("Focus: Methodology/Process")
75
+ elif "what" in query_lower:
76
+ components.append("Focus: Definition/Identification")
77
+ elif "why" in query_lower:
78
+ components.append("Focus: Causation/Reasoning")
79
+ elif "should" in query_lower or "best" in query_lower:
80
+ components.append("Focus: Decision/Recommendation")
81
+ else:
82
+ components.append("Focus: General inquiry")
83
+
84
+ # Domain detection
85
+ if any(term in query_lower for term in ["database", "sql", "nosql", "storage"]):
86
+ components.append("Domain: Data Management")
87
+ elif any(term in query_lower for term in ["architecture", "design", "pattern"]):
88
+ components.append("Domain: System Architecture")
89
+ elif any(term in query_lower for term in ["performance", "optimization", "speed"]):
90
+ components.append("Domain: Performance Engineering")
91
+ elif any(term in query_lower for term in ["scale", "distributed", "cluster"]):
92
+ components.append("Domain: Distributed Systems")
93
+ else:
94
+ components.append("Domain: Software Engineering")
95
+
96
+ # Complexity assessment
97
+ word_count = len(query.split())
98
+ if word_count > 20:
99
+ components.append("Complexity: High (detailed query)")
100
+ elif word_count > 10:
101
+ components.append("Complexity: Medium")
102
+ else:
103
+ components.append("Complexity: Low (concise query)")
104
+
105
+ return components
106
+
107
+ async def _analyze_components(self, components: list[str]) -> str:
108
+ """Analyze the decomposed components."""
109
+ await asyncio.sleep(0.03) # Simulate processing
110
+
111
+ # Generate analysis summary
112
+ analysis_parts = []
113
+
114
+ for component in components:
115
+ if "Focus:" in component:
116
+ focus = component.split(":")[1].strip()
117
+ analysis_parts.append(f"requires {focus.lower()} approach")
118
+ elif "Domain:" in component:
119
+ domain = component.split(":")[1].strip()
120
+ analysis_parts.append(f"applies to {domain}")
121
+ elif "Complexity:" in component:
122
+ complexity = component.split(":")[1].strip().split()[0]
123
+ analysis_parts.append(f"{complexity.lower()} complexity level")
124
+
125
+ return "; ".join(analysis_parts) if analysis_parts else "Standard analysis"
126
+
127
+
128
+ class TRMAgent:
129
+ """Tree Reasoning Module - iterative refinement of responses."""
130
+
131
+ def __init__(self, llm_client):
132
+ """Initialize with an LLM client.
133
+
134
+ Args:
135
+ llm_client: LLM client (MockLLMClient or HuggingFaceClient)
136
+ """
137
+ self.llm_client = llm_client
138
+ self.name = "TRM (Iterative Refinement)"
139
+ self.max_iterations = 3
140
+
141
+ async def process(self, query: str) -> dict[str, Any]:
142
+ """Process query using iterative refinement.
143
+
144
+ Args:
145
+ query: Input query to process
146
+
147
+ Returns:
148
+ Dictionary with response, confidence, and reasoning steps
149
+ """
150
+ reasoning_steps = []
151
+ current_response = ""
152
+ current_confidence = 0.0
153
+
154
+ # Iterative refinement loop
155
+ for iteration in range(self.max_iterations):
156
+ step_num = iteration + 1
157
+
158
+ # Generate or refine response
159
+ if iteration == 0:
160
+ # Initial response
161
+ result = await self.llm_client.generate(prompt=query, context="")
162
+ current_response = result["response"]
163
+ current_confidence = result["confidence"]
164
+ reasoning_steps.append(
165
+ f"Iteration {step_num}: Initial response generated (confidence: {current_confidence:.1%})"
166
+ )
167
+ else:
168
+ # Refinement iteration
169
+ refinement_result = await self._refine_response(query, current_response, iteration)
170
+ current_response = refinement_result["response"]
171
+
172
+ # Confidence typically improves with refinement
173
+ confidence_improvement = min(0.1, (1 - current_confidence) * 0.3)
174
+ current_confidence = min(0.95, current_confidence + confidence_improvement)
175
+
176
+ reasoning_steps.append(
177
+ f"Iteration {step_num}: {refinement_result['refinement_type']} "
178
+ f"(confidence: {current_confidence:.1%})"
179
+ )
180
+
181
+ # Check if confidence is high enough to stop
182
+ if current_confidence > 0.85:
183
+ reasoning_steps.append(f"Early termination: High confidence ({current_confidence:.1%}) achieved")
184
+ break
185
+
186
+ # Final reasoning step
187
+ reasoning_steps.append(f"Final: Response refined through {len(reasoning_steps)} iterations")
188
+
189
+ return {
190
+ "response": current_response,
191
+ "confidence": round(current_confidence, 3),
192
+ "steps": reasoning_steps,
193
+ "iterations_used": min(iteration + 1, self.max_iterations),
194
+ "refinement_history": reasoning_steps,
195
+ }
196
+
197
+ async def _refine_response(self, query: str, current_response: str, iteration: int) -> dict[str, Any]:
198
+ """Refine the current response."""
199
+ await asyncio.sleep(0.05) # Simulate refinement processing
200
+
201
+ # Different refinement strategies based on iteration
202
+ refinement_strategies = [
203
+ ("Clarity enhancement", "improve clarity and precision"),
204
+ ("Detail expansion", "add technical depth and specifics"),
205
+ ("Validation check", "verify accuracy and completeness"),
206
+ ]
207
+
208
+ strategy_name, strategy_action = refinement_strategies[iteration % len(refinement_strategies)]
209
+
210
+ # Generate refined response
211
+ refinement_prompt = f"""
212
+ Original query: {query}
213
+ Current response: {current_response}
214
+ Refinement task: {strategy_action}
215
+ """
216
+
217
+ result = await self.llm_client.generate(
218
+ prompt=refinement_prompt, context=f"Refinement iteration {iteration + 1}"
219
+ )
220
+
221
+ # Enhance the response based on strategy
222
+ enhanced_response = current_response
223
+ if strategy_name == "Clarity enhancement":
224
+ enhanced_response = f"{current_response}. {result['response']}"
225
+ elif strategy_name == "Detail expansion":
226
+ enhanced_response = f"{current_response}. Furthermore, {result['response']}"
227
+ else: # Validation
228
+ enhanced_response = f"{current_response}. Validated: {result['response']}"
229
+
230
+ # Truncate if too long
231
+ if len(enhanced_response) > 300:
232
+ enhanced_response = enhanced_response[:297] + "..."
233
+
234
+ return {"response": enhanced_response, "refinement_type": strategy_name, "strategy_action": strategy_action}
demo_src/llm_mock.py ADDED
@@ -0,0 +1,182 @@
1
+ """
2
+ Mock and lightweight LLM clients for demo purposes.
3
+ """
4
+
5
+ import asyncio
6
+ import random
7
+ from typing import Any
8
+
9
+
10
+ class MockLLMClient:
11
+ """Mock LLM client that generates plausible demo responses."""
12
+
13
+ def __init__(self):
14
+ self.response_templates = {
15
+ "architecture": [
16
+ "Consider scalability requirements and team expertise",
17
+ "Evaluate coupling, deployment complexity, and operational overhead",
18
+ "Balance between development speed and long-term maintainability",
19
+ ],
20
+ "optimization": [
21
+ "Profile first to identify actual bottlenecks",
22
+ "Consider memory-mapped files and streaming processing",
23
+ "Implement parallel processing with appropriate chunk sizes",
24
+ ],
25
+ "database": [
26
+ "Consider data consistency requirements and query patterns",
27
+ "Evaluate write-heavy vs read-heavy workload characteristics",
28
+ "Plan for horizontal scaling and data distribution strategies",
29
+ ],
30
+ "distributed": [
31
+ "Implement proper failure detection and recovery mechanisms",
32
+ "Use circuit breakers and bulkhead patterns for resilience",
33
+ "Consider eventual consistency vs strong consistency trade-offs",
34
+ ],
35
+ "default": [
36
+ "Break down the problem into smaller components",
37
+ "Consider trade-offs between different approaches",
38
+ "Evaluate based on specific use case requirements",
39
+ ],
40
+ }
41
+
42
+ async def generate(self, prompt: str, context: str = "") -> dict[str, Any]:
43
+ """Generate a mock response based on the prompt and optional context."""
44
+ # Simulate processing time
45
+ await asyncio.sleep(random.uniform(0.1, 0.3))
46
+
47
+ # Determine response category
48
+ prompt_lower = prompt.lower()
49
+ if "architecture" in prompt_lower or "microservice" in prompt_lower or "monolith" in prompt_lower:
50
+ category = "architecture"
51
+ elif "optim" in prompt_lower or "performance" in prompt_lower or "process" in prompt_lower:
52
+ category = "optimization"
53
+ elif "database" in prompt_lower or "sql" in prompt_lower or "nosql" in prompt_lower:
54
+ category = "database"
55
+ elif "distribut" in prompt_lower or "fault" in prompt_lower or "rate limit" in prompt_lower:
56
+ category = "distributed"
57
+ else:
58
+ category = "default"
59
+
60
+ templates = self.response_templates[category]
61
+
62
+ # Generate response with some randomness
63
+ response = random.choice(templates)
64
+ confidence = random.uniform(0.6, 0.95)
65
+
66
+ # Add more detail based on prompt length (simulating "understanding")
67
+ if len(prompt) > 100:
68
+ confidence = min(0.95, confidence + 0.1)
69
+ response += f". Additionally, {random.choice(self.response_templates['default'])}"
70
+
71
+ # Lightly incorporate context to simulate conditioning
72
+ context_snippet = context.strip()
73
+ if context_snippet:
74
+ confidence = min(0.99, confidence + 0.05)
75
+ response += f" (context signal: {context_snippet[:60]}{'...' if len(context_snippet) > 60 else ''})"
76
+
77
+ return {
78
+ "response": response,
79
+ "confidence": round(confidence, 3),
80
+ "tokens_used": len(prompt.split()) * 2 + len(response.split()),
81
+ }
82
+
83
+ async def generate_reasoning_steps(self, query: str, num_steps: int = 3) -> list[str]:
84
+ """Generate mock reasoning steps."""
85
+ await asyncio.sleep(random.uniform(0.05, 0.15))
86
+
87
+ base_steps = [
88
+ f"Analyzing query: '{query[:50]}...'",
89
+ "Identifying key requirements and constraints",
90
+ "Evaluating potential approaches",
91
+ "Considering trade-offs and implications",
92
+ "Synthesizing recommendations based on analysis",
93
+ "Validating conclusions against requirements",
94
+ ]
95
+
96
+ return random.sample(base_steps, min(num_steps, len(base_steps)))
97
+
98
+
99
+ class HuggingFaceClient:
100
+ """Lightweight Hugging Face Inference API client."""
101
+
102
+ def __init__(self, model_id: str = "mistralai/Mistral-7B-Instruct-v0.2"):
103
+ """Initialize with a Hugging Face model.
104
+
105
+ Args:
106
+ model_id: The model ID on Hugging Face Hub
107
+ """
108
+ self.model_id = model_id
109
+ self._client = None
110
+
111
+ def _get_client(self):
112
+ """Lazy load the HF client."""
113
+ if self._client is None:
114
+ try:
115
+ from huggingface_hub import InferenceClient
116
+
117
+ self._client = InferenceClient(model=self.model_id)
118
+ except ImportError:
119
+ raise ImportError("huggingface_hub not installed. Install with: pip install huggingface_hub")
120
+ return self._client
121
+
122
+ async def generate(self, prompt: str, context: str = "") -> dict[str, Any]:
123
+ """Generate response using Hugging Face Inference API."""
124
+ try:
125
+ client = self._get_client()
126
+
127
+ # Format prompt
128
+ if context:
129
+ full_prompt = f"Context: {context}\n\nQuestion: {prompt}\n\nAnswer:"
130
+ else:
131
+ full_prompt = f"Question: {prompt}\n\nProvide a concise, technical answer:\n\nAnswer:"
132
+
133
+ # Call HF Inference API (sync call wrapped in async)
134
+ response_text = await asyncio.to_thread(
135
+ client.text_generation, full_prompt, max_new_tokens=150, temperature=0.7, do_sample=True
136
+ )
137
+
138
+ # Estimate confidence based on response characteristics
139
+ confidence = min(0.95, 0.6 + len(response_text) / 500)
140
+
141
+ return {
142
+ "response": response_text.strip(),
143
+ "confidence": round(confidence, 3),
144
+ "tokens_used": len(full_prompt.split()) + len(response_text.split()),
145
+ }
146
+
147
+ except Exception as e:
148
+ # Fallback to mock on error
149
+ print(f"HF Inference error: {e}. Falling back to mock.")
150
+ mock = MockLLMClient()
151
+ return await mock.generate(prompt, context)
152
+
153
+ async def generate_reasoning_steps(self, query: str, num_steps: int = 3) -> list[str]:
154
+ """Generate reasoning steps using HF model."""
155
+ try:
156
+ client = self._get_client()
157
+
158
+ prompt = f"""Break down this question into {num_steps} reasoning steps:
159
+ Question: {query}
160
+
161
+ Reasoning steps (one per line):
162
+ 1."""
163
+
164
+ response = await asyncio.to_thread(client.text_generation, prompt, max_new_tokens=200, temperature=0.5)
165
+
166
+ # Parse steps from response
167
+ lines = response.strip().split("\n")
168
+ steps = []
169
+ for line in lines:
170
+ line = line.strip()
171
+ if line and not line.startswith("#"):
172
+ # Remove numbering
173
+ if line[0].isdigit() and "." in line[:3]:
174
+ line = line.split(".", 1)[1].strip()
175
+ steps.append(line)
176
+
177
+ return steps[:num_steps] if steps else ["Analysis in progress"]
178
+
179
+ except Exception as e:
180
+ print(f"HF reasoning error: {e}. Falling back to mock.")
181
+ mock = MockLLMClient()
182
+ return await mock.generate_reasoning_steps(query, num_steps)
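Because `HuggingFaceClient` falls back to `MockLLMClient` on any inference error, callers can use it the same way whether or not the Hugging Face Inference API is reachable. A minimal sketch (hitting the real API would additionally require valid Hugging Face credentials in the environment):

```python
# Minimal sketch: the HF-backed client degrades gracefully to the mock client on errors.
import asyncio

from demo_src.llm_mock import HuggingFaceClient


async def main() -> None:
    client = HuggingFaceClient(model_id="mistralai/Mistral-7B-Instruct-v0.2")
    result = await client.generate(
        "What is the best approach to implement rate limiting in a distributed system?"
    )
    # On API or auth errors this still returns a mock response with the same schema.
    print(result["response"])
    print("confidence:", result["confidence"], "| tokens:", result["tokens_used"])


if __name__ == "__main__":
    asyncio.run(main())
```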
demo_src/mcts_demo.py ADDED
@@ -0,0 +1,436 @@
1
+ """
2
+ Educational MCTS demonstration using the production framework.
3
+
4
+ This demo uses the real MCTSEngine from src.framework.mcts.core to provide
5
+ an authentic learning experience while remaining accessible for demonstrations.
6
+ """
7
+
8
+ from __future__ import annotations
9
+
10
+ import math
11
+ from typing import Any
12
+
13
+ from src.framework.mcts.core import MCTSEngine, MCTSNode, MCTSState
14
+ from src.framework.mcts.policies import RolloutPolicy, SelectionPolicy
15
+
16
+
17
+ class DemoRolloutPolicy(RolloutPolicy):
18
+ """
19
+ Educational rollout policy for demo purposes.
20
+
21
+ Evaluates states based on:
22
+ - Depth of exploration (deeper = more thorough)
23
+ - Action quality (domain-specific heuristics)
24
+ - Exploration randomness
25
+ """
26
+
27
+ def __init__(self, category: str, action_templates: dict[str, list[str]]):
28
+ """
29
+ Initialize demo rollout policy.
30
+
31
+ Args:
32
+ category: Query category for heuristic evaluation
33
+ action_templates: Available action templates for scoring
34
+ """
35
+ self.category = category
36
+ self.action_templates = action_templates
37
+
38
+ # Define key terms that indicate quality actions per category
39
+ self.quality_indicators = {
40
+ "architecture": ["scalability", "consistency", "requirements"],
41
+ "optimization": ["profile", "caching", "parallel"],
42
+ "database": ["patterns", "relationships", "scaling"],
43
+ "distributed": ["circuit", "retry", "bulkhead"],
44
+ "default": ["decompose", "constraints", "trade-offs"],
45
+ }
46
+
47
+ async def evaluate(
48
+ self,
49
+ state: MCTSState,
50
+ rng,
51
+ max_depth: int = 10,
52
+ ) -> float:
53
+ """
54
+ Evaluate a state through heuristic analysis.
55
+
56
+ This combines:
57
+ - Depth bonus: rewards thorough exploration
58
+ - Action quality: rewards domain-appropriate actions
59
+ - Noise: adds exploration randomness
60
+
61
+ Args:
62
+ state: State to evaluate
63
+ rng: Random number generator
64
+ max_depth: Maximum depth (unused in heuristic)
65
+
66
+ Returns:
67
+ Estimated value in [0, 1] range
68
+ """
69
+ # Base value
70
+ base_value = 0.5
71
+
72
+ # Depth bonus: deeper exploration = more value (up to 0.3)
73
+ depth = state.features.get("depth", 0)
74
+ depth_bonus = min(depth * 0.1, 0.3)
75
+
76
+ # Action quality bonus
77
+ action_bonus = 0.0
78
+ last_action = state.features.get("last_action", "")
79
+
80
+ if last_action:
81
+ # Check if action contains quality indicators for this category
82
+ indicators = self.quality_indicators.get(self.category, self.quality_indicators["default"])
83
+ for term in indicators:
84
+ if term in last_action.lower():
85
+ action_bonus = 0.15
86
+ break
87
+
88
+ # Add exploration noise
89
+ noise = rng.uniform(-0.1, 0.1)
90
+
91
+ # Combine components
92
+ value = base_value + depth_bonus + action_bonus + noise
93
+
94
+ # Clamp to [0, 1]
95
+ return max(0.0, min(1.0, value))
96
+
97
+
98
+ class MCTSDemo:
99
+ """
100
+ Educational MCTS demonstration using the production framework.
101
+
102
+ This class wraps the production MCTSEngine to provide:
103
+ - Simple, educational interface for demos
104
+ - Category-based action selection
105
+ - Tree visualization for learning
106
+ - Deterministic behavior with seeds
107
+
108
+ Unlike the old mock implementation, this uses the real MCTS algorithm
109
+ with all its features: UCB1 selection, progressive widening, caching, etc.
110
+ """
111
+
112
+ def __init__(self, max_depth: int = 5):
113
+ """
114
+ Initialize MCTS demo.
115
+
116
+ Args:
117
+ max_depth: Maximum tree depth for exploration
118
+ """
119
+ self.max_depth = max_depth
120
+
121
+ # Action templates for different query types
122
+ # These provide domain-specific reasoning paths
123
+ self.action_templates = {
124
+ "architecture": [
125
+ "Consider microservices for scalability",
126
+ "Evaluate monolith for simplicity",
127
+ "Analyze team capabilities",
128
+ "Assess deployment requirements",
129
+ "Review data consistency needs",
130
+ ],
131
+ "optimization": [
132
+ "Profile application hotspots",
133
+ "Implement caching layer",
134
+ "Use parallel processing",
135
+ "Optimize database queries",
136
+ "Reduce memory allocations",
137
+ ],
138
+ "database": [
139
+ "Analyze query patterns",
140
+ "Consider data relationships",
141
+ "Evaluate consistency requirements",
142
+ "Plan for horizontal scaling",
143
+ "Assess read/write ratios",
144
+ ],
145
+ "distributed": [
146
+ "Implement circuit breakers",
147
+ "Add retry mechanisms",
148
+ "Use message queues",
149
+ "Apply bulkhead pattern",
150
+ "Design for eventual consistency",
151
+ ],
152
+ "default": [
153
+ "Decompose the problem",
154
+ "Identify constraints",
155
+ "Evaluate trade-offs",
156
+ "Consider alternatives",
157
+ "Validate assumptions",
158
+ ],
159
+ }
160
+
161
+ def _categorize_query(self, query: str) -> str:
162
+ """
163
+ Categorize query to select appropriate action templates.
164
+
165
+ Args:
166
+ query: User's input query
167
+
168
+ Returns:
169
+ Category name for action selection
170
+ """
171
+ query_lower = query.lower()
172
+ if "architecture" in query_lower or "microservice" in query_lower:
173
+ return "architecture"
174
+ elif "optim" in query_lower or "performance" in query_lower:
175
+ return "optimization"
176
+ elif "database" in query_lower or "sql" in query_lower:
177
+ return "database"
178
+ elif "distribut" in query_lower or "fault" in query_lower:
179
+ return "distributed"
180
+ return "default"
181
+
182
+ def _create_action_generator(self, category: str):
183
+ """
184
+ Create action generator function for this query category.
185
+
186
+ Args:
187
+ category: Query category
188
+
189
+ Returns:
190
+ Function that generates actions for a given state
191
+ """
192
+ def action_generator(state: MCTSState) -> list[str]:
193
+ """Generate available actions from current state."""
194
+ # Get category-specific actions
195
+ actions = self.action_templates.get(category, self.action_templates["default"])
196
+
197
+ # Filter out already-used actions (track via state features)
198
+ used_actions = state.features.get("used_actions", set())
199
+ available = [a for a in actions if a not in used_actions]
200
+
201
+ # If all actions used, allow re-exploring top 2
202
+ if not available:
203
+ return actions[:2]
204
+
205
+ return available
206
+
207
+ return action_generator
208
+
209
+ def _create_state_transition(self, category: str):
210
+ """
211
+ Create state transition function for this query category.
212
+
213
+ Args:
214
+ category: Query category
215
+
216
+ Returns:
217
+ Function that computes next state from current state + action
218
+ """
219
+ def state_transition(state: MCTSState, action: str) -> MCTSState:
220
+ """Compute next state by applying action."""
221
+ # Track action history
222
+ action_history = list(state.features.get("action_history", []))
223
+ action_history.append(action)
224
+
225
+ # Track used actions
226
+ used_actions = set(state.features.get("used_actions", set()))
227
+ used_actions.add(action)
228
+
229
+ # Increment depth
230
+ depth = state.features.get("depth", 0) + 1
231
+
232
+ # Create new state ID from action history
233
+ state_id = " -> ".join(action_history)
234
+
235
+ # Build new state
236
+ new_state = MCTSState(
237
+ state_id=state_id,
238
+ features={
239
+ "action_history": action_history,
240
+ "used_actions": used_actions,
241
+ "depth": depth,
242
+ "last_action": action,
243
+ "category": category,
244
+ },
245
+ )
246
+
247
+ return new_state
248
+
249
+ return state_transition
250
+
251
+ def _generate_tree_visualization(self, root: MCTSNode, max_nodes: int = 20) -> str:
252
+ """
253
+ Generate ASCII visualization of the MCTS tree.
254
+
255
+ This provides educational insight into the search process.
256
+
257
+ Args:
258
+ root: Root node of the tree
259
+ max_nodes: Maximum nodes to display
260
+
261
+ Returns:
262
+ ASCII art representation of the tree
263
+ """
264
+ max_nodes = max(1, max_nodes)
265
+ lines = []
266
+ lines.append("MCTS Tree Visualization")
267
+ lines.append("=" * 50)
268
+
269
+ nodes_rendered = 0
270
+
271
+ def format_node(node: MCTSNode, prefix: str = "", is_last: bool = True) -> list[str]:
272
+ nonlocal nodes_rendered
273
+ result = []
274
+
275
+ # Node representation
276
+ connector = "└── " if is_last else "├── "
277
+
278
+ if nodes_rendered >= max_nodes:
279
+ result.append(f"{prefix}{connector}... (truncated)")
280
+ return result
281
+
282
+ nodes_rendered += 1
283
+
284
+ # Display action or state
285
+ node_str = f"{node.state.state_id[:30]}..."
286
+ if node.action:
287
+ node_str = f"{node.action[:25]}..."
288
+
289
+ stats = f"[V:{node.visits}, Q:{node.value:.3f}]"
290
+
291
+ result.append(f"{prefix}{connector}{node_str} {stats}")
292
+
293
+ # Recursively add children
294
+ new_prefix = prefix + (" " if is_last else "│ ")
295
+
296
+ # Limit children shown
297
+ children_to_show = node.children[:3]
298
+ for i, child in enumerate(children_to_show):
299
+ is_child_last = i == len(children_to_show) - 1
300
+ result.extend(format_node(child, new_prefix, is_child_last))
301
+
302
+ if len(node.children) > 3:
303
+ result.append(f"{new_prefix} ... and {len(node.children) - 3} more")
304
+
305
+ return result
306
+
307
+ # Start with root
308
+ lines.append(f"Root: {root.state.state_id[:40]}... [V:{root.visits}, Q:{root.value:.3f}]")
309
+ nodes_rendered += 1
310
+
311
+ for i, child in enumerate(root.children[:5]):
312
+ is_last = i == len(root.children[:5]) - 1
313
+ lines.extend(format_node(child, "", is_last))
314
+
315
+ if len(root.children) > 5:
316
+ lines.append(f"... and {len(root.children) - 5} more branches")
317
+
318
+ return "\n".join(lines)
319
+
320
+ async def search(
321
+ self,
322
+ query: str,
323
+ iterations: int = 25,
324
+ exploration_weight: float = 1.414,
325
+ seed: int | None = None,
326
+ ) -> dict[str, Any]:
327
+ """
328
+ Run MCTS search on the query using the production framework.
329
+
330
+ This method demonstrates the full MCTS algorithm:
331
+ 1. Selection: UCB1-based tree traversal
332
+ 2. Expansion: Progressive widening of nodes
333
+ 3. Simulation: Heuristic evaluation (rollout)
334
+ 4. Backpropagation: Value updates up the tree
335
+
336
+ Args:
337
+ query: The input query to analyze
338
+ iterations: Number of MCTS iterations (more = better but slower)
339
+ exploration_weight: UCB1 exploration constant (higher = more exploration)
340
+ seed: Random seed for deterministic results
341
+
342
+ Returns:
343
+ Dictionary with:
344
+ - best_action: Recommended next step
345
+ - best_value: Confidence in recommendation
346
+ - statistics: Search metrics and performance data
347
+ - tree_visualization: ASCII art of search tree
348
+ """
349
+ # Determine query category
350
+ category = self._categorize_query(query)
351
+
352
+ # Initialize MCTS engine with production features
353
+ engine = MCTSEngine(
354
+ seed=seed if seed is not None else 42,
355
+ exploration_weight=exploration_weight,
356
+ progressive_widening_k=1.0, # Moderate expansion
357
+ progressive_widening_alpha=0.5,
358
+ max_parallel_rollouts=4,
359
+ cache_size_limit=10000,
360
+ )
361
+
362
+ # Create root state
363
+ root_state = MCTSState(
364
+ state_id=f"Query: {query[:50]}",
365
+ features={
366
+ "query": query,
367
+ "category": category,
368
+ "action_history": [],
369
+ "used_actions": set(),
370
+ "depth": 0,
371
+ "last_action": "",
372
+ },
373
+ )
374
+
375
+ # Create root node
376
+ root = MCTSNode(state=root_state, rng=engine.rng)
377
+
378
+ # Create domain-specific functions
379
+ action_generator = self._create_action_generator(category)
380
+ state_transition = self._create_state_transition(category)
381
+ rollout_policy = DemoRolloutPolicy(category, self.action_templates)
382
+
383
+ # Run MCTS search with production engine
384
+ best_action, stats = await engine.search(
385
+ root=root,
386
+ num_iterations=iterations,
387
+ action_generator=action_generator,
388
+ state_transition=state_transition,
389
+ rollout_policy=rollout_policy,
390
+ max_rollout_depth=self.max_depth,
391
+ selection_policy=SelectionPolicy.MAX_VISITS, # Most robust
392
+ )
393
+
394
+ # Extract best child info
395
+ best_child = None
396
+ if root.children:
397
+ best_child = max(root.children, key=lambda c: c.visits)
398
+
399
+ # Compile results for demo interface
400
+ result = {
401
+ "best_action": best_action or "No action found",
402
+ "best_value": round(best_child.value, 4) if best_child else 0.0,
403
+ "root_visits": root.visits,
404
+ "total_nodes": engine.get_cached_node_count(),
405
+ "max_depth_reached": engine.get_cached_tree_depth(),
406
+ "iterations_completed": iterations,
407
+ "exploration_weight": exploration_weight,
408
+ "seed": seed,
409
+ "category": category,
410
+
411
+ # Top actions sorted by visits
412
+ "top_actions": [
413
+ {
414
+ "action": child.action,
415
+ "visits": child.visits,
416
+ "value": round(child.value, 4),
417
+ "ucb1": round(
418
+ child.visits / root.visits if root.visits > 0 else 0.0, 4
419
+ ), # Simplified UCB display
420
+ }
421
+ for child in sorted(root.children, key=lambda c: -c.visits)[:5]
422
+ ],
423
+
424
+ # Framework statistics
425
+ "framework_stats": {
426
+ "cache_hits": stats.get("cache_hits", 0),
427
+ "cache_misses": stats.get("cache_misses", 0),
428
+ "cache_hit_rate": round(stats.get("cache_hit_rate", 0.0), 4),
429
+ "total_simulations": stats.get("total_simulations", 0),
430
+ },
431
+
432
+ # Educational visualization
433
+ "tree_visualization": self._generate_tree_visualization(root),
434
+ }
435
+
436
+ return result
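The UCB1 selection mentioned in the docstrings balances a node's observed value against how rarely it has been visited, scaled by the exploration weight exposed in the UI (default 1.414). A sketch of the textbook formula for reference; this is not the `MCTSEngine` internals, just the standard rule it is described as using:

```python
# Textbook UCB1 rule for tree selection (illustrative, not the MCTSEngine implementation).
import math


def ucb1(total_value: float, visits: int, parent_visits: int, c: float = 1.414) -> float:
    """Exploitation term (mean value) plus exploration bonus for rarely visited nodes."""
    if visits == 0:
        return float("inf")  # Unvisited children are tried first
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration
```

A higher `c` inflates the exploration bonus (the slider's "more exploration"), while a lower `c` makes the search commit to the best-looking branch sooner.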
demo_src/wandb_tracker.py ADDED
@@ -0,0 +1,349 @@
1
+ """
2
+ Weights & Biases integration for experiment tracking.
3
+ """
4
+
5
+ import os
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ try:
10
+ import wandb
11
+
12
+ WANDB_AVAILABLE = True
13
+ except ImportError:
14
+ WANDB_AVAILABLE = False
15
+ wandb = None
16
+
17
+
18
+ class WandBTracker:
19
+ """Weights & Biases experiment tracker for multi-agent MCTS demo."""
20
+
21
+ def __init__(self, project_name: str = "langgraph-mcts-demo", entity: str | None = None, enabled: bool = True):
22
+ """Initialize W&B tracker.
23
+
24
+ Args:
25
+ project_name: W&B project name
26
+ entity: W&B entity (username or team)
27
+ enabled: Whether tracking is enabled
28
+ """
29
+ self.project_name = project_name
30
+ self.entity = entity
31
+ self.enabled = enabled and WANDB_AVAILABLE
32
+ self.run = None
33
+ self.run_id = None
34
+
35
+ def is_available(self) -> bool:
36
+ """Check if W&B is available."""
37
+ return WANDB_AVAILABLE
38
+
39
+ def init_run(
40
+ self, run_name: str | None = None, config: dict[str, Any] | None = None, tags: list[str] | None = None
41
+ ) -> bool:
42
+ """Initialize a new W&B run.
43
+
44
+ Args:
45
+ run_name: Optional name for the run
46
+ config: Configuration dictionary to log
47
+ tags: Tags for the run
48
+
49
+ Returns:
50
+ True if run initialized successfully, False otherwise
51
+ """
52
+ if not self.enabled:
53
+ return False
54
+
55
+ try:
56
+ # Generate run name if not provided
57
+ if run_name is None:
58
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
59
+ run_name = f"mcts_query_{timestamp}"
60
+
61
+ # Default tags
62
+ if tags is None:
63
+ tags = ["demo", "multi-agent", "mcts"]
64
+
65
+ # Initialize run
66
+ self.run = wandb.init(
67
+ project=self.project_name,
68
+ entity=self.entity,
69
+ name=run_name,
70
+ config=config or {},
71
+ tags=tags,
72
+ reinit=True,
73
+ )
74
+
75
+ self.run_id = self.run.id
76
+ return True
77
+
78
+ except Exception as e:
79
+ print(f"W&B init error: {e}")
80
+ self.enabled = False
81
+ return False
82
+
83
+ def log_query_config(self, config: dict[str, Any]):
84
+ """Log query configuration.
85
+
86
+ Args:
87
+ config: Configuration dictionary with agent settings, MCTS params, etc.
88
+ """
89
+ if not self.enabled or not self.run:
90
+ return
91
+
92
+ try:
93
+ wandb.config.update(config)
94
+ except Exception as e:
95
+ print(f"W&B config log error: {e}")
96
+
97
+ def log_agent_result(
98
+ self,
99
+ agent_name: str,
100
+ response: str,
101
+ confidence: float,
102
+ execution_time_ms: float,
103
+ reasoning_steps: list[str] | None = None,
104
+ ):
105
+ """Log individual agent results.
106
+
107
+ Args:
108
+ agent_name: Name of the agent (HRM, TRM, MCTS)
109
+ response: Agent's response text
110
+ confidence: Confidence score (0-1)
111
+ execution_time_ms: Execution time in milliseconds
112
+ reasoning_steps: Optional list of reasoning steps
113
+ """
114
+ if not self.enabled or not self.run:
115
+ return
116
+
117
+ try:
118
+ metrics = {
119
+ f"{agent_name}/confidence": confidence,
120
+ f"{agent_name}/execution_time_ms": execution_time_ms,
121
+ f"{agent_name}/response_length": len(response),
122
+ }
123
+
124
+ if reasoning_steps:
125
+ metrics[f"{agent_name}/num_reasoning_steps"] = len(reasoning_steps)
126
+
127
+ wandb.log(metrics)
128
+
129
+ # Log response as text
130
+ wandb.log({f"{agent_name}/response": wandb.Html(f"<pre>{response}</pre>")})
131
+
132
+ except Exception as e:
133
+ print(f"W&B agent result log error: {e}")
134
+
135
+ def log_mcts_result(self, mcts_result: dict[str, Any]):
136
+ """Log MCTS-specific metrics.
137
+
138
+ Args:
139
+ mcts_result: Dictionary containing MCTS search results
140
+ """
141
+ if not self.enabled or not self.run:
142
+ return
143
+
144
+ try:
145
+ # Extract key metrics
146
+ metrics = {
147
+ "mcts/best_value": mcts_result.get("best_value", 0),
148
+ "mcts/root_visits": mcts_result.get("root_visits", 0),
149
+ "mcts/total_nodes": mcts_result.get("total_nodes", 0),
150
+ "mcts/max_depth": mcts_result.get("max_depth_reached", 0),
151
+ "mcts/iterations": mcts_result.get("iterations_completed", 0),
152
+ "mcts/exploration_weight": mcts_result.get("exploration_weight", 1.414),
153
+ }
154
+
155
+ wandb.log(metrics)
156
+
157
+ # Log top actions as table
158
+ if "top_actions" in mcts_result:
159
+ top_actions_data = []
160
+ for action in mcts_result["top_actions"]:
161
+ top_actions_data.append(
162
+ [
163
+ action.get("action", ""),
164
+ action.get("visits", 0),
165
+ action.get("value", 0),
166
+ action.get("ucb1", 0),
167
+ ]
168
+ )
169
+
170
+ if top_actions_data:
171
+ table = wandb.Table(data=top_actions_data, columns=["Action", "Visits", "Value", "UCB1"])
172
+ wandb.log({"mcts/top_actions_table": table})
173
+
174
+ # Log tree visualization as text artifact
175
+ if "tree_visualization" in mcts_result:
176
+ wandb.log({"mcts/tree_visualization": wandb.Html(f"<pre>{mcts_result['tree_visualization']}</pre>")})
177
+
178
+ except Exception as e:
179
+ print(f"W&B MCTS result log error: {e}")
180
+
181
+ def log_consensus(self, consensus_score: float, agents_used: list[str], final_response: str):
182
+ """Log consensus metrics.
183
+
184
+ Args:
185
+ consensus_score: Agreement score between agents (0-1)
186
+ agents_used: List of agent names that were used
187
+ final_response: Final synthesized response
188
+ """
189
+ if not self.enabled or not self.run:
190
+ return
191
+
192
+ try:
193
+ wandb.log(
194
+ {
195
+ "consensus/score": consensus_score,
196
+ "consensus/num_agents": len(agents_used),
197
+ "consensus/agents": ", ".join(agents_used),
198
+ "consensus/final_response_length": len(final_response),
199
+ }
200
+ )
201
+
202
+ # Categorize consensus level
203
+ if consensus_score > 0.7:
204
+ consensus_level = "high"
205
+ elif consensus_score > 0.4:
206
+ consensus_level = "medium"
207
+ else:
208
+ consensus_level = "low"
209
+
210
+ wandb.log({"consensus/level": consensus_level})
211
+
212
+ except Exception as e:
213
+ print(f"W&B consensus log error: {e}")
214
+
215
+ def log_performance(self, total_time_ms: float):
216
+ """Log overall performance metrics.
217
+
218
+ Args:
219
+ total_time_ms: Total execution time in milliseconds
220
+ """
221
+ if not self.enabled or not self.run:
222
+ return
223
+
224
+ try:
225
+ wandb.log({"performance/total_time_ms": total_time_ms, "performance/total_time_s": total_time_ms / 1000})
226
+ except Exception as e:
227
+ print(f"W&B performance log error: {e}")
228
+
229
+ def log_full_result(self, result: dict[str, Any]):
230
+ """Log the complete result as an artifact.
231
+
232
+ Args:
233
+ result: Full framework result dictionary
234
+ """
235
+ if not self.enabled or not self.run:
236
+ return
237
+
238
+ try:
239
+ # Create artifact
240
+ artifact = wandb.Artifact(name=f"query_result_{self.run_id}", type="result")
241
+
242
+ # Add result as JSON
243
+ import json
244
+ import tempfile
245
+
246
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
247
+ json.dump(result, f, indent=2, default=str)
248
+ temp_path = f.name
249
+
250
+ artifact.add_file(temp_path, name="result.json")
251
+ wandb.log_artifact(artifact)
252
+
253
+ # Clean up temp file
254
+ os.unlink(temp_path)
255
+
256
+ except Exception as e:
257
+ print(f"W&B full result log error: {e}")
258
+
259
+ def log_query_summary(
260
+ self, query: str, use_hrm: bool, use_trm: bool, use_mcts: bool, consensus_score: float, total_time_ms: float
261
+ ):
262
+ """Log a summary row for the query.
263
+
264
+ Args:
265
+ query: The input query
266
+ use_hrm: Whether HRM was enabled
267
+ use_trm: Whether TRM was enabled
268
+ use_mcts: Whether MCTS was enabled
269
+ consensus_score: Final consensus score
270
+ total_time_ms: Total execution time
271
+ """
272
+ if not self.enabled or not self.run:
273
+ return
274
+
275
+ try:
276
+ # Create summary table entry
277
+ summary_data = [
278
+ [
279
+ query[:100] + "..." if len(query) > 100 else query,
280
+ "✓" if use_hrm else "✗",
281
+ "✓" if use_trm else "✗",
282
+ "✓" if use_mcts else "✗",
283
+ f"{consensus_score:.1%}",
284
+ f"{total_time_ms:.2f}",
285
+ ]
286
+ ]
287
+
288
+ table = wandb.Table(data=summary_data, columns=["Query", "HRM", "TRM", "MCTS", "Consensus", "Time (ms)"])
289
+
290
+ wandb.log({"query_summary": table})
291
+
292
+ except Exception as e:
293
+ print(f"W&B summary log error: {e}")
294
+
295
+ def finish_run(self):
296
+ """Finish the current W&B run."""
297
+ if not self.enabled or not self.run:
298
+ return
299
+
300
+ try:
301
+ wandb.finish()
302
+ self.run = None
303
+ self.run_id = None
304
+ except Exception as e:
305
+ print(f"W&B finish error: {e}")
306
+
307
+ def get_run_url(self) -> str | None:
308
+ """Get the URL for the current run.
309
+
310
+ Returns:
311
+ URL string or None if no active run
312
+ """
313
+ if not self.enabled or not self.run:
314
+ return None
315
+
316
+ try:
317
+ return self.run.get_url()
318
+ except Exception:
319
+ return None
320
+
321
+
322
+ # Global tracker instance
323
+ _global_tracker: WandBTracker | None = None
324
+
325
+
326
+ def get_tracker(
327
+ project_name: str = "langgraph-mcts-demo", entity: str | None = None, enabled: bool = True
328
+ ) -> WandBTracker:
329
+ """Get or create the global W&B tracker.
330
+
331
+ Args:
332
+ project_name: W&B project name
333
+ entity: W&B entity
334
+ enabled: Whether tracking is enabled
335
+
336
+ Returns:
337
+ WandBTracker instance
338
+ """
339
+ global _global_tracker
340
+
341
+ if _global_tracker is None:
342
+ _global_tracker = WandBTracker(project_name=project_name, entity=entity, enabled=enabled)
343
+
344
+ return _global_tracker
345
+
346
+
347
+ def is_wandb_available() -> bool:
348
+ """Check if W&B is available."""
349
+ return WANDB_AVAILABLE
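A minimal end-to-end sketch of how the tracker above is meant to be used for one query; the metric values are placeholders, not real run output:

```python
# Minimal sketch: one tracked query using the WandBTracker helpers defined above.
from demo_src.wandb_tracker import get_tracker, is_wandb_available

tracker = get_tracker(project_name="langgraph-mcts-demo", enabled=is_wandb_available())

if tracker.init_run(run_name="example_run", config={"use_hrm": True, "use_trm": True}):
    tracker.log_agent_result(
        agent_name="HRM",
        response="Consider scalability requirements and team expertise",
        confidence=0.82,
        execution_time_ms=120.5,
        reasoning_steps=["decompose", "analyze", "synthesize"],
    )
    tracker.log_consensus(0.76, ["HRM", "TRM"], "Synthesized response text")
    tracker.log_performance(total_time_ms=250.0)
    print("Run URL:", tracker.get_run_url())
    tracker.finish_run()
```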
models/bert_lora/final_model/README.md ADDED
@@ -0,0 +1,206 @@
1
+ ---
2
+ base_model: prajjwal1/bert-mini
3
+ library_name: peft
4
+ tags:
5
+ - base_model:adapter:prajjwal1/bert-mini
6
+ - lora
7
+ - transformers
8
+ ---
9
+
10
+ # Model Card for Model ID
11
+
12
+ <!-- Provide a quick summary of what the model is/does. -->
13
+
14
+
15
+
16
+ ## Model Details
17
+
18
+ ### Model Description
19
+
20
+ <!-- Provide a longer summary of what this model is. -->
21
+
22
+
23
+
24
+ - **Developed by:** [More Information Needed]
25
+ - **Funded by [optional]:** [More Information Needed]
26
+ - **Shared by [optional]:** [More Information Needed]
27
+ - **Model type:** [More Information Needed]
28
+ - **Language(s) (NLP):** [More Information Needed]
29
+ - **License:** [More Information Needed]
30
+ - **Finetuned from model [optional]:** [More Information Needed]
31
+
32
+ ### Model Sources [optional]
33
+
34
+ <!-- Provide the basic links for the model. -->
35
+
36
+ - **Repository:** [More Information Needed]
37
+ - **Paper [optional]:** [More Information Needed]
38
+ - **Demo [optional]:** [More Information Needed]
39
+
40
+ ## Uses
41
+
42
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
43
+
44
+ ### Direct Use
45
+
46
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
47
+
48
+ [More Information Needed]
49
+
50
+ ### Downstream Use [optional]
51
+
52
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
53
+
54
+ [More Information Needed]
55
+
56
+ ### Out-of-Scope Use
57
+
58
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
59
+
60
+ [More Information Needed]
61
+
62
+ ## Bias, Risks, and Limitations
63
+
64
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
65
+
66
+ [More Information Needed]
67
+
68
+ ### Recommendations
69
+
70
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
71
+
72
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
73
+
74
+ ## How to Get Started with the Model
75
+
76
+ Use the code below to get started with the model.
77
+
78
+ [More Information Needed]
79
+
80
+ ## Training Details
81
+
82
+ ### Training Data
83
+
84
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
85
+
86
+ [More Information Needed]
87
+
88
+ ### Training Procedure
89
+
90
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
91
+
92
+ #### Preprocessing [optional]
93
+
94
+ [More Information Needed]
95
+
96
+
97
+ #### Training Hyperparameters
98
+
99
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
100
+
101
+ #### Speeds, Sizes, Times [optional]
102
+
103
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
104
+
105
+ [More Information Needed]
106
+
107
+ ## Evaluation
108
+
109
+ <!-- This section describes the evaluation protocols and provides the results. -->
110
+
111
+ ### Testing Data, Factors & Metrics
112
+
113
+ #### Testing Data
114
+
115
+ <!-- This should link to a Dataset Card if possible. -->
116
+
117
+ [More Information Needed]
118
+
119
+ #### Factors
120
+
121
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
122
+
123
+ [More Information Needed]
124
+
125
+ #### Metrics
126
+
127
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
128
+
129
+ [More Information Needed]
130
+
131
+ ### Results
132
+
133
+ [More Information Needed]
134
+
135
+ #### Summary
136
+
137
+
138
+
139
+ ## Model Examination [optional]
140
+
141
+ <!-- Relevant interpretability work for the model goes here -->
142
+
143
+ [More Information Needed]
144
+
145
+ ## Environmental Impact
146
+
147
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
148
+
149
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
150
+
151
+ - **Hardware Type:** [More Information Needed]
152
+ - **Hours used:** [More Information Needed]
153
+ - **Cloud Provider:** [More Information Needed]
154
+ - **Compute Region:** [More Information Needed]
155
+ - **Carbon Emitted:** [More Information Needed]
156
+
157
+ ## Technical Specifications [optional]
158
+
159
+ ### Model Architecture and Objective
160
+
161
+ [More Information Needed]
162
+
163
+ ### Compute Infrastructure
164
+
165
+ [More Information Needed]
166
+
167
+ #### Hardware
168
+
169
+ [More Information Needed]
170
+
171
+ #### Software
172
+
173
+ [More Information Needed]
174
+
175
+ ## Citation [optional]
176
+
177
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
178
+
179
+ **BibTeX:**
180
+
181
+ [More Information Needed]
182
+
183
+ **APA:**
184
+
185
+ [More Information Needed]
186
+
187
+ ## Glossary [optional]
188
+
189
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
190
+
191
+ [More Information Needed]
192
+
193
+ ## More Information [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Authors [optional]
198
+
199
+ [More Information Needed]
200
+
201
+ ## Model Card Contact
202
+
203
+ [More Information Needed]
204
+ ### Framework versions
205
+
206
+ - PEFT 0.17.1
models/bert_lora/final_model/adapter_config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "prajjwal1/bert-mini",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 16,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.1,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": [
22
+ "classifier",
23
+ "score"
24
+ ],
25
+ "peft_type": "LORA",
26
+ "qalora_group_size": 16,
27
+ "r": 4,
28
+ "rank_pattern": {},
29
+ "revision": null,
30
+ "target_modules": [
31
+ "query",
32
+ "value"
33
+ ],
34
+ "target_parameters": null,
35
+ "task_type": "SEQ_CLS",
36
+ "trainable_token_indices": null,
37
+ "use_dora": false,
38
+ "use_qalora": false,
39
+ "use_rslora": false
40
+ }
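The adapter config above targets the `query` and `value` projections of `prajjwal1/bert-mini` with a rank-4 LoRA and a trained sequence-classification head. A minimal loading sketch with PEFT; `num_labels=3` is an assumption based on the three controller classes (hrm/trm/mcts) reported in the training results, not a value stored in this file:

```python
# Minimal sketch: loading the LoRA adapter on top of the bert-mini base model.
# num_labels=3 is an assumption (hrm / trm / mcts routing classes), not part of this config.
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-mini", num_labels=3)
model = PeftModel.from_pretrained(base, "models/bert_lora/final_model")
tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-mini")

inputs = tokenizer("How should we scale this service?", return_tensors="pt")
logits = model(**inputs).logits  # One logit per meta-controller class
```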
models/bert_lora/final_model/adapter_model.safetensors ADDED
Binary file (71 kB).
 
models/bert_lora/generated_dataset.json ADDED
The diff for this file is too large to render. See raw diff
 
models/bert_lora/training_results.json ADDED
@@ -0,0 +1,48 @@
1
+ {
2
+ "config": {
3
+ "model_name": "prajjwal1/bert-mini",
4
+ "lora_r": 4,
5
+ "lora_alpha": 16,
6
+ "lora_dropout": 0.1,
7
+ "lr": 0.001,
8
+ "batch_size": 16,
9
+ "epochs": 5,
10
+ "warmup_steps": 100,
11
+ "seed": 42,
12
+ "num_samples": 1000,
13
+ "data_path": null,
14
+ "balanced": true,
15
+ "output_dir": "models/bert_lora"
16
+ },
17
+ "train_history": {
18
+ "train_loss": 1.1033503922549162,
19
+ "train_runtime": 11.0946,
20
+ "train_samples_per_second": 315.018,
21
+ "epochs": 5,
22
+ "final_metrics": {
23
+ "train_runtime": 11.0946,
24
+ "train_samples_per_second": 315.018,
25
+ "train_steps_per_second": 19.829,
26
+ "total_flos": 34821822412800.0,
27
+ "train_loss": 1.1033503922549162,
28
+ "epoch": 5.0
29
+ },
30
+ "eval_results": {
31
+ "eval_loss": 1.0453400611877441,
32
+ "eval_accuracy": 0.47651006711409394,
33
+ "eval_runtime": 0.1251,
34
+ "eval_samples_per_second": 1191.171,
35
+ "eval_steps_per_second": 79.944,
36
+ "epoch": 5.0
37
+ }
38
+ },
39
+ "test_results": {
40
+ "loss": 1.0559743153338401,
41
+ "accuracy": 0.4768211920529801
42
+ },
43
+ "model_params": {
44
+ "total_params": 11188486,
45
+ "trainable_params": 17155,
46
+ "trainable_percentage": 0.15
47
+ }
48
+ }
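The reported 17,155 trainable parameters are consistent with the LoRA settings above if bert-mini's usual dimensions are assumed (4 transformer layers, hidden size 256; an assumption, not data stored in this file). A back-of-the-envelope check:

```python
# Back-of-the-envelope check of trainable_params; the 4-layer / 256-hidden shape of
# bert-mini and the 3-class head are assumptions, not values from this results file.
hidden, layers, r, num_classes = 256, 4, 4, 3

per_matrix = 2 * r * hidden                      # LoRA A (r x hidden) + B (hidden x r) = 2,048
per_layer = 2 * per_matrix                       # query + value projections            = 4,096
lora_total = layers * per_layer                  # all encoder layers                   = 16,384
classifier = hidden * num_classes + num_classes  # fully trained classification head    = 771

print(lora_total + classifier)                   # 17,155 -- matches trainable_params
```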
models/rnn_meta_controller.history.json ADDED
@@ -0,0 +1,128 @@
1
+ {
2
+ "config": {
3
+ "hidden_dim": 64,
4
+ "num_layers": 1,
5
+ "dropout": 0.1,
6
+ "lr": 0.001,
7
+ "batch_size": 32,
8
+ "epochs": 20,
9
+ "patience": 5,
10
+ "seed": 42,
11
+ "num_samples": 1000
12
+ },
13
+ "training_history": {
14
+ "train_losses": [
15
+ 1.060307163180727,
16
+ 0.9014069383794611,
17
+ 0.6105747597687172,
18
+ 0.35656250968123926,
19
+ 0.22574858390020602,
20
+ 0.16157509059165465,
21
+ 0.12456387586214325,
22
+ 0.10158110240643675,
23
+ 0.08592396827809738,
24
+ 0.07474524908783761,
25
+ 0.06479036057311477,
26
+ 0.057878461638183304,
27
+ 0.052609961931452606,
28
+ 0.04809149278497154,
29
+ 0.043710527828697,
30
+ 0.041286276738074695,
31
+ 0.03756282673302022,
32
+ 0.03491098284156936,
33
+ 0.031911260236731985,
34
+ 0.030496817025722878
35
+ ],
36
+ "val_losses": [
37
+ 1.0059996803601583,
38
+ 0.7808501919110616,
39
+ 0.47826388080914817,
40
+ 0.29279296696186063,
41
+ 0.2008462185660998,
42
+ 0.1529717780649662,
43
+ 0.12299496456980705,
44
+ 0.10291122049093246,
45
+ 0.08860023791591326,
46
+ 0.07790809428940217,
47
+ 0.06982718824098508,
48
+ 0.06387854401643077,
49
+ 0.05984275036801894,
50
+ 0.05463591649507483,
51
+ 0.04938021237030625,
52
+ 0.0452831008626769,
53
+ 0.04252756762628754,
54
+ 0.039516554485696055,
55
+ 0.038632405494960644,
56
+ 0.035608950459087886
57
+ ],
58
+ "val_accuracies": [
59
+ 0.8466666666666667,
60
+ 0.92,
61
+ 0.9822222222222222,
62
+ 0.9933333333333333,
63
+ 0.9911111111111112,
64
+ 0.9933333333333333,
65
+ 0.9955555555555555,
66
+ 0.9955555555555555,
67
+ 0.9955555555555555,
68
+ 0.9955555555555555,
69
+ 0.9955555555555555,
70
+ 0.9977777777777778,
71
+ 0.9933333333333333,
72
+ 0.9933333333333333,
73
+ 0.9977777777777778,
74
+ 0.9977777777777778,
75
+ 0.9977777777777778,
76
+ 0.9977777777777778,
77
+ 0.9955555555555555,
78
+ 0.9977777777777778
79
+ ],
80
+ "best_epoch": 20,
81
+ "best_val_loss": 0.035608950459087886,
82
+ "best_val_accuracy": 0.9977777777777778,
83
+ "stopped_early": false,
84
+ "total_epochs": 20
85
+ },
86
+ "test_results": {
87
+ "loss": 0.022989434589787076,
88
+ "accuracy": 0.9977777777777778,
89
+ "per_class_metrics": {
90
+ "hrm": {
91
+ "precision": 1.0,
92
+ "recall": 1.0,
93
+ "f1_score": 1.0,
94
+ "support": 153
95
+ },
96
+ "trm": {
97
+ "precision": 0.9933774834437086,
98
+ "recall": 1.0,
99
+ "f1_score": 0.9966777408637874,
100
+ "support": 150
101
+ },
102
+ "mcts": {
103
+ "precision": 1.0,
104
+ "recall": 0.9931972789115646,
105
+ "f1_score": 0.9965870307167235,
106
+ "support": 147
107
+ }
108
+ },
109
+ "confusion_matrix": [
110
+ [
111
+ 153,
112
+ 0,
113
+ 0
114
+ ],
115
+ [
116
+ 0,
117
+ 150,
118
+ 0
119
+ ],
120
+ [
121
+ 0,
122
+ 1,
123
+ 146
124
+ ]
125
+ ],
126
+ "total_samples": 450
127
+ }
128
+ }
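The per-class metrics above follow directly from the confusion matrix; for example, the "trm" column (a quick check, not part of the commit):

```python
# rows = true class (hrm, trm, mcts), columns = predicted class
cm = [
    [153, 0, 0],
    [0, 150, 0],
    [0, 1, 146],
]

tp = cm[1][1]               # 150 true positives for "trm"
fp = cm[0][1] + cm[2][1]    # 1 false positive (an "mcts" sample predicted as "trm")
fn = cm[1][0] + cm[1][2]    # 0 false negatives

precision = tp / (tp + fp)                          # 150/151 ~= 0.9934
recall = tp / (tp + fn)                             # 1.0
f1 = 2 * precision * recall / (precision + recall)  # ~= 0.9967
accuracy = (153 + 150 + 146) / 450                  # ~= 0.9978
print(precision, recall, f1, accuracy)
```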
models/rnn_meta_controller.pt ADDED
Binary file (61.6 kB).
 
requirements.txt ADDED
@@ -0,0 +1,28 @@
1
+ # LangGraph Multi-Agent MCTS Demo - Dependencies
2
+ # Optimized for Hugging Face Spaces deployment with trained models
3
+
4
+ # Core UI Framework
5
+ gradio>=4.0.0,<5.0.0
6
+
7
+ # Numerical computation
8
+ numpy>=1.24.0,<2.0.0
9
+
10
+ # Machine Learning - Neural Models
11
+ torch>=2.1.0
12
+ transformers>=4.40.0
13
+ peft>=0.7.0
14
+ sentence-transformers>=2.2.0
15
+
16
+ # Configuration
17
+ pyyaml>=6.0
18
+
19
+ # Experiment Tracking
20
+ wandb>=0.16.0
21
+
22
+ # Required for Gradio OAuth and model loading
23
+ huggingface_hub>=0.20.0,<0.30.0
24
+
25
+ # Note: This demo now uses REAL trained models:
26
+ # - RNN Meta-Controller (models/rnn_meta_controller.pt)
27
+ # - BERT with LoRA adapters (models/bert_lora/final_model/)
28
+ # - Actual HRM and TRM agent implementations
src/__init__.py ADDED
File without changes
src/adapters/__init__.py ADDED
@@ -0,0 +1,7 @@
1
+ """
2
+ Adapters package for external service integrations.
3
+ """
4
+
5
+ from .llm import BaseLLMClient, LLMResponse, create_client
6
+
7
+ __all__ = ["create_client", "BaseLLMClient", "LLMResponse"]
src/adapters/llm/__init__.py ADDED
@@ -0,0 +1,257 @@
1
+ """
2
+ LLM Client Factory and Provider Registry.
3
+
4
+ This module provides a factory function to instantiate the correct LLM client
5
+ based on provider settings, with lazy loading of adapters.
6
+ """
7
+
8
+ import importlib
9
+ import logging
10
+ from typing import Any
11
+
12
+ from .base import BaseLLMClient, LLMClient, LLMResponse, LLMToolResponse, ToolCall
13
+ from .exceptions import (
14
+ CircuitBreakerOpenError,
15
+ LLMAuthenticationError,
16
+ LLMClientError,
17
+ LLMConnectionError,
18
+ LLMContentFilterError,
19
+ LLMContextLengthError,
20
+ LLMInvalidRequestError,
21
+ LLMModelNotFoundError,
22
+ LLMQuotaExceededError,
23
+ LLMRateLimitError,
24
+ LLMResponseParseError,
25
+ LLMServerError,
26
+ LLMStreamError,
27
+ LLMTimeoutError,
28
+ )
29
+
30
+ logger = logging.getLogger(__name__)
31
+
32
+ # Provider registry with lazy loading
33
+ # Maps provider name to (module_path, class_name)
34
+ _PROVIDER_REGISTRY: dict[str, tuple[str, str]] = {
35
+ "openai": ("src.adapters.llm.openai_client", "OpenAIClient"),
36
+ "anthropic": ("src.adapters.llm.anthropic_client", "AnthropicClient"),
37
+ "lmstudio": ("src.adapters.llm.lmstudio_client", "LMStudioClient"),
38
+ "local": ("src.adapters.llm.lmstudio_client", "LMStudioClient"), # Alias
39
+ }
40
+
41
+ # Cache for loaded client classes
42
+ _CLIENT_CACHE: dict[str, type[BaseLLMClient]] = {}
43
+
44
+
45
+ def register_provider(name: str, module_path: str, class_name: str, override: bool = False) -> None:
46
+ """
47
+ Register a new LLM provider.
48
+
49
+ Args:
50
+ name: Provider identifier (e.g., "azure", "bedrock")
51
+ module_path: Full module path (e.g., "src.adapters.llm.azure_client")
52
+ class_name: Class name in the module (e.g., "AzureOpenAIClient")
53
+ override: If True, allow overriding existing provider
54
+ """
55
+ if name in _PROVIDER_REGISTRY and not override:
56
+ raise ValueError(f"Provider '{name}' already registered. Use override=True to replace.")
57
+
58
+ _PROVIDER_REGISTRY[name] = (module_path, class_name)
59
+ # Clear cache if overriding
60
+ if name in _CLIENT_CACHE:
61
+ del _CLIENT_CACHE[name]
62
+
63
+ logger.info(f"Registered LLM provider: {name} -> {module_path}.{class_name}")
64
+
65
+
66
+ def list_providers() -> list[str]:
67
+ """
68
+ List all registered provider names.
69
+
70
+ Returns:
71
+ List of provider identifiers
72
+ """
73
+ return list(_PROVIDER_REGISTRY.keys())
74
+
75
+
76
+ def get_provider_class(provider: str) -> type[BaseLLMClient]:
77
+ """
78
+ Get the client class for a provider (with lazy loading).
79
+
80
+ Args:
81
+ provider: Provider identifier
82
+
83
+ Returns:
84
+ Client class (not instantiated)
85
+
86
+ Raises:
87
+ ValueError: If provider not registered
88
+ ImportError: If module cannot be loaded
89
+ """
90
+ if provider not in _PROVIDER_REGISTRY:
91
+ available = ", ".join(list_providers())
92
+ raise ValueError(f"Unknown provider '{provider}'. Available: {available}")
93
+
94
+ # Check cache first
95
+ if provider in _CLIENT_CACHE:
96
+ return _CLIENT_CACHE[provider]
97
+
98
+ # Lazy load the module
99
+ module_path, class_name = _PROVIDER_REGISTRY[provider]
100
+
101
+ try:
102
+ module = importlib.import_module(module_path)
103
+ client_class = getattr(module, class_name)
104
+ except ImportError as e:
105
+ raise ImportError(f"Failed to load provider '{provider}': {e}") from e
106
+ except AttributeError as e:
107
+ raise ImportError(f"Class '{class_name}' not found in module '{module_path}'") from e
108
+
109
+ # Cache for future use
110
+ _CLIENT_CACHE[provider] = client_class
111
+ return client_class
112
+
113
+
114
+ def create_client(
115
+ provider: str = "openai",
116
+ *,
117
+ api_key: str | None = None,
118
+ model: str | None = None,
119
+ base_url: str | None = None,
120
+ timeout: float | None = None,
121
+ max_retries: int | None = None,
122
+ **kwargs: Any,
123
+ ) -> BaseLLMClient:
124
+ """
125
+ Create an LLM client instance.
126
+
127
+ This is the main factory function for creating provider clients.
128
+
129
+ Args:
130
+ provider: Provider name ("openai", "anthropic", "lmstudio", etc.)
131
+ api_key: API key (may be optional for some providers)
132
+ model: Model identifier
133
+ base_url: Base URL for API
134
+ timeout: Request timeout in seconds
135
+ max_retries: Maximum retry attempts
136
+ **kwargs: Provider-specific parameters
137
+
138
+ Returns:
139
+ Configured LLMClient instance
140
+
141
+ Examples:
142
+ # OpenAI client
143
+ client = create_client("openai", model="gpt-4-turbo-preview")
144
+
145
+ # Anthropic client
146
+ client = create_client("anthropic", model="sonnet")
147
+
148
+ # Local LM Studio
149
+ client = create_client("lmstudio", base_url="http://localhost:1234/v1")
150
+
151
+ # With custom settings
152
+ client = create_client(
153
+ "openai",
154
+ api_key="sk-...",
155
+ timeout=120.0,
156
+ max_retries=5,
157
+ organization="org-..."
158
+ )
159
+ """
160
+ client_class = get_provider_class(provider)
161
+
162
+ # Build kwargs for client initialization
163
+ init_kwargs = {**kwargs}
164
+
165
+ if api_key is not None:
166
+ init_kwargs["api_key"] = api_key
167
+ if model is not None:
168
+ init_kwargs["model"] = model
169
+ if base_url is not None:
170
+ init_kwargs["base_url"] = base_url
171
+ if timeout is not None:
172
+ init_kwargs["timeout"] = timeout
173
+ if max_retries is not None:
174
+ init_kwargs["max_retries"] = max_retries
175
+
176
+ logger.info(f"Creating {provider} client with model={model or 'default'}")
177
+
178
+ return client_class(**init_kwargs)
179
+
180
+
181
+ def create_client_from_config(config: dict) -> BaseLLMClient:
182
+ """
183
+ Create an LLM client from a configuration dictionary.
184
+
185
+ Useful for loading settings from YAML/JSON config files.
186
+
187
+ Args:
188
+ config: Configuration dictionary with keys:
189
+ - provider: Required provider name
190
+ - Other keys passed to create_client
191
+
192
+ Returns:
193
+ Configured LLMClient instance
194
+
195
+ Example:
196
+ config = {
197
+ "provider": "openai",
198
+ "model": "gpt-4-turbo-preview",
199
+ "timeout": 60.0,
200
+ "max_retries": 3
201
+ }
202
+ client = create_client_from_config(config)
203
+ """
204
+ config = config.copy()
205
+ provider = config.pop("provider", "openai")
206
+ return create_client(provider, **config)
207
+
208
+
209
+ # Convenience aliases for common use cases
210
+ def create_openai_client(**kwargs) -> BaseLLMClient:
211
+ """Create an OpenAI client."""
212
+ return create_client("openai", **kwargs)
213
+
214
+
215
+ def create_anthropic_client(**kwargs) -> BaseLLMClient:
216
+ """Create an Anthropic Claude client."""
217
+ return create_client("anthropic", **kwargs)
218
+
219
+
220
+ def create_local_client(**kwargs) -> BaseLLMClient:
221
+ """Create a local LM Studio client."""
222
+ return create_client("lmstudio", **kwargs)
223
+
224
+
225
+ __all__ = [
226
+ # Base types
227
+ "LLMClient",
228
+ "LLMResponse",
229
+ "LLMToolResponse",
230
+ "ToolCall",
231
+ "BaseLLMClient",
232
+ # Exceptions
233
+ "LLMClientError",
234
+ "LLMAuthenticationError",
235
+ "LLMRateLimitError",
236
+ "LLMQuotaExceededError",
237
+ "LLMModelNotFoundError",
238
+ "LLMContextLengthError",
239
+ "LLMInvalidRequestError",
240
+ "LLMTimeoutError",
241
+ "LLMConnectionError",
242
+ "LLMServerError",
243
+ "LLMResponseParseError",
244
+ "LLMStreamError",
245
+ "LLMContentFilterError",
246
+ "CircuitBreakerOpenError",
247
+ # Factory functions
248
+ "create_client",
249
+ "create_client_from_config",
250
+ "create_openai_client",
251
+ "create_anthropic_client",
252
+ "create_local_client",
253
+ # Registry functions
254
+ "register_provider",
255
+ "list_providers",
256
+ "get_provider_class",
257
+ ]
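A short usage sketch for the registry and factory defined above (the azure module path is the same hypothetical example used in the register_provider docstring; it is not part of this package):

```python
from src.adapters.llm import create_client, list_providers, register_provider

print(list_providers())  # ['openai', 'anthropic', 'lmstudio', 'local']

# Registration only wires up lazy loading; the module is imported on first use,
# so this path/class pair is purely illustrative.
register_provider("azure", "src.adapters.llm.azure_client", "AzureOpenAIClient")

# Built-in providers are instantiated through the same factory.
client = create_client("lmstudio", base_url="http://localhost:1234/v1")
```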
src/adapters/llm/anthropic_client.py ADDED
@@ -0,0 +1,521 @@
1
+ """
2
+ Anthropic Claude LLM client adapter.
3
+
4
+ Implements the LLMClient protocol for Anthropic's Messages API.
5
+ Supports Claude 3 models with proper content block handling.
6
+ """
7
+
8
+ import json
9
+ import logging
10
+ from collections.abc import AsyncIterator
11
+ from typing import Any
12
+
13
+ import httpx
14
+ from tenacity import (
15
+ before_sleep_log,
16
+ retry,
17
+ retry_if_exception_type,
18
+ stop_after_attempt,
19
+ wait_exponential,
20
+ )
21
+
22
+ from .base import BaseLLMClient, LLMResponse, LLMToolResponse, ToolCall
23
+ from .exceptions import (
24
+ CircuitBreakerOpenError,
25
+ LLMAuthenticationError,
26
+ LLMClientError,
27
+ LLMConnectionError,
28
+ LLMContentFilterError,
29
+ LLMContextLengthError,
30
+ LLMInvalidRequestError,
31
+ LLMModelNotFoundError,
32
+ LLMQuotaExceededError,
33
+ LLMRateLimitError,
34
+ LLMResponseParseError,
35
+ LLMServerError,
36
+ LLMStreamError,
37
+ LLMTimeoutError,
38
+ )
39
+ from .openai_client import CircuitBreaker
40
+
41
+ logger = logging.getLogger(__name__)
42
+
43
+
44
+ # Model mappings for convenience
45
+ ANTHROPIC_MODELS = {
46
+ "claude-3-opus": "claude-3-opus-20240229",
47
+ "claude-3-sonnet": "claude-3-sonnet-20240229",
48
+ "claude-3-haiku": "claude-3-haiku-20240307",
49
+ "claude-3.5-sonnet": "claude-3-5-sonnet-20240620",
50
+ "claude-3.5-sonnet-v2": "claude-3-5-sonnet-20241022",
51
+ "claude-sonnet-4": "claude-sonnet-4-20250514",
52
+ # Add latest models
53
+ "opus": "claude-3-opus-20240229",
54
+ "sonnet": "claude-3-5-sonnet-20241022",
55
+ "haiku": "claude-3-haiku-20240307",
56
+ }
57
+
58
+
59
+ class AnthropicClient(BaseLLMClient):
60
+ """
61
+ Anthropic Claude API client.
62
+
63
+ Features:
64
+ - Messages API support (not legacy completion API)
65
+ - Content block handling (text, tool_use)
66
+ - Streaming with proper SSE parsing
67
+ - Model alias mapping
68
+ - System prompt support
69
+ - Tool/function calling (beta)
70
+ """
71
+
72
+ PROVIDER_NAME = "anthropic"
73
+ DEFAULT_BASE_URL = "https://api.anthropic.com"
74
+ DEFAULT_MODEL = "claude-3-5-sonnet-20241022"
75
+ API_VERSION = "2023-06-01"
76
+
77
+ def __init__(
78
+ self,
79
+ api_key: str | None = None,
80
+ model: str | None = None,
81
+ base_url: str | None = None,
82
+ timeout: float = 120.0, # Claude can be slower
83
+ max_retries: int = 3,
84
+ # Circuit breaker settings
85
+ circuit_breaker_threshold: int = 5,
86
+ circuit_breaker_reset: float = 60.0,
87
+ # Rate limiting
88
+ rate_limit_per_minute: int | None = None,
89
+ ):
90
+ """
91
+ Initialize Anthropic client.
92
+
93
+ Args:
94
+ api_key: Anthropic API key (or set ANTHROPIC_API_KEY env var)
95
+ model: Model to use (supports aliases like 'sonnet', 'opus')
96
+ base_url: API base URL
97
+ timeout: Request timeout in seconds (default longer for Claude)
98
+ max_retries: Max retry attempts
99
+ circuit_breaker_threshold: Failures before circuit opens
100
+ circuit_breaker_reset: Seconds before circuit resets
101
+ rate_limit_per_minute: Rate limit for requests per minute (None to disable)
102
+ """
103
+ import os
104
+
105
+ api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
106
+ if not api_key:
107
+ raise LLMAuthenticationError(self.PROVIDER_NAME, "API key not provided and ANTHROPIC_API_KEY not set")
108
+
109
+ # Resolve model alias
110
+ model_name = model or self.DEFAULT_MODEL
111
+ resolved_model = ANTHROPIC_MODELS.get(model_name, model_name)
112
+
113
+ super().__init__(
114
+ api_key=api_key,
115
+ model=resolved_model,
116
+ base_url=base_url or self.DEFAULT_BASE_URL,
117
+ timeout=timeout,
118
+ max_retries=max_retries,
119
+ rate_limit_per_minute=rate_limit_per_minute,
120
+ )
121
+
122
+ self.circuit_breaker = CircuitBreaker(
123
+ failure_threshold=circuit_breaker_threshold,
124
+ reset_timeout=circuit_breaker_reset,
125
+ )
126
+ self._client: httpx.AsyncClient | None = None
127
+
128
+ async def _get_client(self) -> httpx.AsyncClient:
129
+ """Get or create the HTTP client."""
130
+ if self._client is None or self._client.is_closed:
131
+ headers = {
132
+ "x-api-key": self.api_key,
133
+ "anthropic-version": self.API_VERSION,
134
+ "Content-Type": "application/json",
135
+ }
136
+
137
+ self._client = httpx.AsyncClient(
138
+ base_url=self.base_url,
139
+ headers=headers,
140
+ timeout=httpx.Timeout(self.timeout),
141
+ )
142
+ return self._client
143
+
144
+ def _convert_messages_to_anthropic(self, messages: list[dict]) -> tuple[str | None, list[dict]]:
145
+ """
146
+ Convert OpenAI-style messages to Anthropic format.
147
+
148
+ Returns:
149
+ Tuple of (system_prompt, messages)
150
+ """
151
+ system_prompt = None
152
+ anthropic_messages = []
153
+
154
+ for msg in messages:
155
+ role = msg.get("role", "user")
156
+ content = msg.get("content", "")
157
+
158
+ if role == "system":
159
+ # Anthropic uses separate system parameter
160
+ system_prompt = content
161
+ elif role == "assistant":
162
+ anthropic_messages.append({"role": "assistant", "content": content})
163
+ elif role == "user":
164
+ anthropic_messages.append({"role": "user", "content": content})
165
+ elif role == "tool":
166
+ # Tool result message
167
+ anthropic_messages.append(
168
+ {
169
+ "role": "user",
170
+ "content": [
171
+ {
172
+ "type": "tool_result",
173
+ "tool_use_id": msg.get("tool_call_id", ""),
174
+ "content": content,
175
+ }
176
+ ],
177
+ }
178
+ )
179
+
180
+ return system_prompt, anthropic_messages
181
+
182
+ def _convert_tools_to_anthropic(self, tools: list[dict]) -> list[dict]:
183
+ """Convert OpenAI-style tool definitions to Anthropic format."""
184
+ anthropic_tools = []
185
+
186
+ for tool in tools:
187
+ if tool.get("type") == "function":
188
+ func = tool["function"]
189
+ anthropic_tools.append(
190
+ {
191
+ "name": func["name"],
192
+ "description": func.get("description", ""),
193
+ "input_schema": func.get("parameters", {"type": "object"}),
194
+ }
195
+ )
196
+ else:
197
+ # Already in Anthropic format
198
+ anthropic_tools.append(tool)
199
+
200
+ return anthropic_tools
201
+
202
+ def _handle_error_response(self, response: httpx.Response) -> None:
203
+ """Convert HTTP error responses to appropriate exceptions."""
204
+ status_code = response.status_code
205
+
206
+ try:
207
+ error_data = response.json()
208
+ error_type = error_data.get("error", {}).get("type", "")
209
+ error_message = error_data.get("error", {}).get("message", response.text)
210
+ except Exception:
211
+ error_type = ""
212
+ error_message = response.text
213
+
214
+ if status_code == 401:
215
+ raise LLMAuthenticationError(self.PROVIDER_NAME, error_message)
216
+ elif status_code == 429:
217
+ retry_after = response.headers.get("retry-after")
218
+ retry_after_float = float(retry_after) if retry_after else None
219
+ raise LLMRateLimitError(self.PROVIDER_NAME, retry_after=retry_after_float, message=error_message)
220
+ elif status_code == 402 or "billing" in error_type.lower():
221
+ raise LLMQuotaExceededError(self.PROVIDER_NAME, error_message)
222
+ elif status_code == 404 or error_type == "not_found_error":
223
+ raise LLMModelNotFoundError(self.PROVIDER_NAME, self.model)
224
+ elif status_code == 400:
225
+ if "context" in error_message.lower() or "token" in error_message.lower():
226
+ raise LLMContextLengthError(self.PROVIDER_NAME)
227
+ if "content_policy" in error_type or "safety" in error_message.lower():
228
+ raise LLMContentFilterError(self.PROVIDER_NAME, error_message)
229
+ raise LLMInvalidRequestError(self.PROVIDER_NAME, error_message)
230
+ elif status_code >= 500:
231
+ raise LLMServerError(self.PROVIDER_NAME, status_code, error_message)
232
+ else:
233
+ raise LLMClientError(error_message, self.PROVIDER_NAME, status_code=status_code)
234
+
235
+ def _make_retry_decorator(self):
236
+ """Create retry decorator with exponential backoff."""
237
+ return retry(
238
+ stop=stop_after_attempt(self.max_retries),
239
+ wait=wait_exponential(multiplier=1, min=2, max=120),
240
+ retry=retry_if_exception_type((LLMRateLimitError, LLMServerError, LLMConnectionError)),
241
+ before_sleep=before_sleep_log(logger, logging.WARNING),
242
+ reraise=True,
243
+ )
244
+
245
+ async def generate(
246
+ self,
247
+ *,
248
+ messages: list[dict] | None = None,
249
+ prompt: str | None = None,
250
+ temperature: float = 0.7,
251
+ max_tokens: int | None = None,
252
+ tools: list[dict] | None = None,
253
+ stream: bool = False,
254
+ stop: list[str] | None = None,
255
+ **kwargs: Any,
256
+ ) -> LLMResponse | AsyncIterator[str]:
257
+ """
258
+ Generate a response from Anthropic Claude.
259
+
260
+ Args:
261
+ messages: Chat messages (will be converted to Anthropic format)
262
+ prompt: Simple string prompt
263
+ temperature: Sampling temperature (0.0 to 1.0 for Claude)
264
+ max_tokens: Maximum tokens to generate (required for Anthropic)
265
+ tools: Tool definitions (will be converted to Anthropic format)
266
+ stream: If True, returns AsyncIterator
267
+ stop: Stop sequences
268
+ **kwargs: Additional parameters (top_p, top_k, etc.)
269
+
270
+ Returns:
271
+ LLMResponse or AsyncIterator[str] for streaming
272
+ """
273
+ # Apply rate limiting before proceeding
274
+ await self._apply_rate_limit()
275
+
276
+ # Check circuit breaker
277
+ if not self.circuit_breaker.can_execute():
278
+ raise CircuitBreakerOpenError(
279
+ self.PROVIDER_NAME,
280
+ self.circuit_breaker.failure_count,
281
+ self.circuit_breaker.get_reset_time(),
282
+ )
283
+
284
+ # Anthropic requires max_tokens
285
+ if max_tokens is None:
286
+ max_tokens = 4096 # Sensible default
287
+
288
+ if stream:
289
+ return await self._generate_stream(
290
+ messages=messages,
291
+ prompt=prompt,
292
+ temperature=temperature,
293
+ max_tokens=max_tokens,
294
+ tools=tools,
295
+ stop=stop,
296
+ **kwargs,
297
+ )
298
+ else:
299
+ return await self._generate_non_stream(
300
+ messages=messages,
301
+ prompt=prompt,
302
+ temperature=temperature,
303
+ max_tokens=max_tokens,
304
+ tools=tools,
305
+ stop=stop,
306
+ **kwargs,
307
+ )
308
+
309
+ async def _generate_non_stream(
310
+ self,
311
+ *,
312
+ messages: list[dict] | None = None,
313
+ prompt: str | None = None,
314
+ temperature: float = 0.7,
315
+ max_tokens: int = 4096,
316
+ tools: list[dict] | None = None,
317
+ stop: list[str] | None = None,
318
+ **kwargs: Any,
319
+ ) -> LLMResponse:
320
+ """Non-streaming generation with retry logic."""
321
+
322
+ @self._make_retry_decorator()
323
+ async def _request():
324
+ client = await self._get_client()
325
+
326
+ # Convert messages
327
+ built_messages = self._build_messages(messages, prompt)
328
+ system_prompt, anthropic_messages = self._convert_messages_to_anthropic(built_messages)
329
+
330
+ # Build request payload
331
+ payload = {
332
+ "model": self.model,
333
+ "messages": anthropic_messages,
334
+ "max_tokens": max_tokens,
335
+ "temperature": min(temperature, 1.0), # Anthropic max is 1.0
336
+ }
337
+
338
+ if system_prompt:
339
+ payload["system"] = system_prompt
340
+ if stop:
341
+ payload["stop_sequences"] = stop
342
+ if tools:
343
+ payload["tools"] = self._convert_tools_to_anthropic(tools)
344
+
345
+ # Add any additional kwargs (top_p, top_k, etc.)
346
+ for key in ["top_p", "top_k", "metadata"]:
347
+ if key in kwargs:
348
+ payload[key] = kwargs[key]
349
+
350
+ try:
351
+ response = await client.post("/v1/messages", json=payload)
352
+ except httpx.TimeoutException:
353
+ raise LLMTimeoutError(self.PROVIDER_NAME, self.timeout)
354
+ except httpx.ConnectError:
355
+ raise LLMConnectionError(self.PROVIDER_NAME, self.base_url)
356
+
357
+ if response.status_code != 200:
358
+ self._handle_error_response(response)
359
+
360
+ return response
361
+
362
+ try:
363
+ response = await _request()
364
+ self.circuit_breaker.record_success()
365
+ except Exception:
366
+ self.circuit_breaker.record_failure()
367
+ raise
368
+
369
+ # Parse response
370
+ try:
371
+ data = response.json()
372
+
373
+ # Extract text from content blocks
374
+ text_parts = []
375
+ tool_calls = []
376
+
377
+ for block in data.get("content", []):
378
+ if block.get("type") == "text":
379
+ text_parts.append(block.get("text", ""))
380
+ elif block.get("type") == "tool_use":
381
+ tool_calls.append(
382
+ ToolCall(
383
+ id=block.get("id", ""),
384
+ name=block.get("name", ""),
385
+ arguments=block.get("input", {}),
386
+ type="tool_use",
387
+ )
388
+ )
389
+
390
+ text = "\n".join(text_parts)
391
+
392
+ # Build usage dict
393
+ usage = {
394
+ "prompt_tokens": data.get("usage", {}).get("input_tokens", 0),
395
+ "completion_tokens": data.get("usage", {}).get("output_tokens", 0),
396
+ }
397
+ usage["total_tokens"] = usage["prompt_tokens"] + usage["completion_tokens"]
398
+
399
+ finish_reason = data.get("stop_reason", "stop")
400
+
401
+ if tool_calls:
402
+ llm_response = LLMToolResponse(
403
+ text=text,
404
+ usage=usage,
405
+ model=data.get("model", self.model),
406
+ raw_response=data,
407
+ finish_reason=finish_reason,
408
+ tool_calls=tool_calls,
409
+ )
410
+ else:
411
+ llm_response = LLMResponse(
412
+ text=text,
413
+ usage=usage,
414
+ model=data.get("model", self.model),
415
+ raw_response=data,
416
+ finish_reason=finish_reason,
417
+ )
418
+
419
+ self._update_stats(llm_response)
420
+ return llm_response
421
+
422
+ except (KeyError, json.JSONDecodeError) as e:
423
+ raise LLMResponseParseError(self.PROVIDER_NAME, response.text) from e
424
+
425
+ async def _generate_stream(
426
+ self,
427
+ *,
428
+ messages: list[dict] | None = None,
429
+ prompt: str | None = None,
430
+ temperature: float = 0.7,
431
+ max_tokens: int = 4096,
432
+ tools: list[dict] | None = None,
433
+ stop: list[str] | None = None,
434
+ **kwargs: Any,
435
+ ) -> AsyncIterator[str]:
436
+ """Streaming generation with Server-Sent Events."""
437
+
438
+ client = await self._get_client()
439
+
440
+ # Convert messages
441
+ built_messages = self._build_messages(messages, prompt)
442
+ system_prompt, anthropic_messages = self._convert_messages_to_anthropic(built_messages)
443
+
444
+ # Build request payload
445
+ payload = {
446
+ "model": self.model,
447
+ "messages": anthropic_messages,
448
+ "max_tokens": max_tokens,
449
+ "temperature": min(temperature, 1.0),
450
+ "stream": True,
451
+ }
452
+
453
+ if system_prompt:
454
+ payload["system"] = system_prompt
455
+ if stop:
456
+ payload["stop_sequences"] = stop
457
+ if tools:
458
+ payload["tools"] = self._convert_tools_to_anthropic(tools)
459
+
460
+ for key in ["top_p", "top_k"]:
461
+ if key in kwargs:
462
+ payload[key] = kwargs[key]
463
+
464
+ async def stream_generator():
465
+ try:
466
+ async with client.stream("POST", "/v1/messages", json=payload) as response:
467
+ if response.status_code != 200:
468
+ await response.aread()
469
+ self._handle_error_response(response)
470
+
471
+ async for line in response.aiter_lines():
472
+ if not line.strip():
473
+ continue
474
+
475
+ if line.startswith("event:"):
476
+ event_type = line[6:].strip()
477
+ continue
478
+
479
+ if line.startswith("data:"):
480
+ data_str = line[5:].strip()
481
+ if not data_str:
482
+ continue
483
+
484
+ try:
485
+ data = json.loads(data_str)
486
+ event_type = data.get("type", "")
487
+
488
+ if event_type == "content_block_delta":
489
+ delta = data.get("delta", {})
490
+ if delta.get("type") == "text_delta":
491
+ text = delta.get("text", "")
492
+ if text:
493
+ yield text
494
+
495
+ elif event_type == "message_stop":
496
+ break
497
+
498
+ except json.JSONDecodeError:
499
+ continue
500
+
501
+ self.circuit_breaker.record_success()
502
+
503
+ except httpx.TimeoutException:
504
+ self.circuit_breaker.record_failure()
505
+ raise LLMTimeoutError(self.PROVIDER_NAME, self.timeout)
506
+ except httpx.ConnectError:
507
+ self.circuit_breaker.record_failure()
508
+ raise LLMConnectionError(self.PROVIDER_NAME, self.base_url)
509
+ except Exception as e:
510
+ self.circuit_breaker.record_failure()
511
+ if isinstance(e, LLMClientError):
512
+ raise
513
+ raise LLMStreamError(self.PROVIDER_NAME, str(e)) from e
514
+
515
+ return stream_generator()
516
+
517
+ async def close(self) -> None:
518
+ """Close the HTTP client."""
519
+ if self._client and not self._client.is_closed:
520
+ await self._client.aclose()
521
+ self._client = None
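A minimal non-streaming usage sketch for the client above (assumes ANTHROPIC_API_KEY is set in the environment):

```python
import asyncio

from src.adapters.llm.anthropic_client import AnthropicClient

async def main() -> None:
    # "sonnet" is resolved to a dated model id via the ANTHROPIC_MODELS alias map
    async with AnthropicClient(model="sonnet") as client:
        response = await client.generate(
            prompt="Summarize Monte Carlo tree search in two sentences.",
            max_tokens=256,
        )
        print(response.text)
        print(response.usage)  # input/output token counts mapped to prompt/completion tokens

asyncio.run(main())
```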
src/adapters/llm/base.py ADDED
@@ -0,0 +1,305 @@
1
+ """
2
+ Base LLM client interface for provider-agnostic model access.
3
+
4
+ This module defines the protocol and data structures for LLM clients,
5
+ enabling seamless switching between providers (OpenAI, Anthropic, LM Studio, etc.)
6
+ """
7
+
8
+ import asyncio
9
+ import time
10
+ from abc import ABC, abstractmethod
11
+ from collections.abc import AsyncIterator
12
+ from dataclasses import dataclass, field
13
+ from datetime import datetime
14
+ from typing import Any, Protocol, runtime_checkable
15
+
16
+
17
+ @dataclass
18
+ class LLMResponse:
19
+ """Standardized response from any LLM provider."""
20
+
21
+ text: str
22
+ usage: dict = field(default_factory=dict)
23
+ model: str = ""
24
+ raw_response: Any = None
25
+ finish_reason: str = "stop"
26
+ created_at: datetime = field(default_factory=datetime.utcnow)
27
+
28
+ @property
29
+ def total_tokens(self) -> int:
30
+ """Total tokens used in request/response."""
31
+ return self.usage.get("total_tokens", 0)
32
+
33
+ @property
34
+ def prompt_tokens(self) -> int:
35
+ """Tokens used in prompt."""
36
+ return self.usage.get("prompt_tokens", 0)
37
+
38
+ @property
39
+ def completion_tokens(self) -> int:
40
+ """Tokens used in completion."""
41
+ return self.usage.get("completion_tokens", 0)
42
+
43
+
44
+ @dataclass
45
+ class ToolCall:
46
+ """Represents a tool/function call from the LLM."""
47
+
48
+ id: str
49
+ name: str
50
+ arguments: dict
51
+ type: str = "function"
52
+
53
+
54
+ @dataclass
55
+ class LLMToolResponse(LLMResponse):
56
+ """Response containing tool calls."""
57
+
58
+ tool_calls: list[ToolCall] = field(default_factory=list)
59
+
60
+
61
+ class TokenBucketRateLimiter:
62
+ """
63
+ Token bucket rate limiter for controlling request rates.
64
+
65
+ This implementation uses a token bucket algorithm where:
66
+ - Tokens are added at a fixed rate (rate_per_second)
67
+ - Each request consumes one token
68
+ - If no tokens available, caller waits until one becomes available
69
+ """
70
+
71
+ def __init__(self, rate_per_minute: int = 60):
72
+ """
73
+ Initialize the rate limiter.
74
+
75
+ Args:
76
+ rate_per_minute: Maximum requests allowed per minute
77
+ """
78
+ self.rate_per_second = rate_per_minute / 60.0
79
+ self.max_tokens = float(rate_per_minute)
80
+ self.tokens = self.max_tokens
81
+ self.last_refill = time.monotonic()
82
+ self._lock = asyncio.Lock()
83
+ self._wait_count = 0
84
+ self._total_wait_time = 0.0
85
+
86
+ async def acquire(self) -> float:
87
+ """
88
+ Acquire a token, waiting if necessary.
89
+
90
+ Returns:
91
+ Time spent waiting (0.0 if no wait was needed)
92
+ """
93
+ async with self._lock:
94
+ now = time.monotonic()
95
+ elapsed = now - self.last_refill
96
+
97
+ # Refill tokens based on elapsed time
98
+ self.tokens = min(self.max_tokens, self.tokens + elapsed * self.rate_per_second)
99
+ self.last_refill = now
100
+
101
+ wait_time = 0.0
102
+ if self.tokens < 1:
103
+ # Calculate how long to wait for one token
104
+ wait_time = (1 - self.tokens) / self.rate_per_second
105
+ self._wait_count += 1
106
+ self._total_wait_time += wait_time
107
+
108
+ # Release lock during sleep to allow other operations
109
+ self._lock.release()
110
+ try:
111
+ await asyncio.sleep(wait_time)
112
+ finally:
113
+ await self._lock.acquire()
114
+
115
+ # After sleeping, update time and set tokens to 0
116
+ self.last_refill = time.monotonic()
117
+ self.tokens = 0
118
+ else:
119
+ self.tokens -= 1
120
+
121
+ return wait_time
122
+
123
+ @property
124
+ def stats(self) -> dict:
125
+ """Get rate limiter statistics."""
126
+ return {
127
+ "rate_limit_waits": self._wait_count,
128
+ "total_rate_limit_wait_time": self._total_wait_time,
129
+ "current_tokens": self.tokens,
130
+ }
131
+
132
+
133
+ @runtime_checkable
134
+ class LLMClient(Protocol):
135
+ """
136
+ Protocol for LLM clients.
137
+
138
+ This protocol defines the interface that all LLM provider adapters must implement.
139
+ Using Protocol allows for structural subtyping (duck typing) while maintaining
140
+ type safety.
141
+ """
142
+
143
+ async def generate(
144
+ self,
145
+ *,
146
+ messages: list[dict] | None = None,
147
+ prompt: str | None = None,
148
+ temperature: float = 0.7,
149
+ max_tokens: int | None = None,
150
+ tools: list[dict] | None = None,
151
+ stream: bool = False,
152
+ stop: list[str] | None = None,
153
+ **kwargs: Any,
154
+ ) -> LLMResponse | AsyncIterator[str]:
155
+ """
156
+ Generate a response from the LLM.
157
+
158
+ Args:
159
+ messages: List of message dicts in OpenAI format [{"role": "...", "content": "..."}]
160
+ prompt: Simple string prompt (converted to single user message)
161
+ temperature: Sampling temperature (0.0 to 2.0)
162
+ max_tokens: Maximum tokens to generate
163
+ tools: List of tool definitions for function calling
164
+ stream: If True, returns AsyncIterator[str] for streaming
165
+ stop: Stop sequences
166
+ **kwargs: Provider-specific parameters
167
+
168
+ Returns:
169
+ LLMResponse if stream=False, AsyncIterator[str] if stream=True
170
+
171
+ Raises:
172
+ LLMClientError: Base exception for all client errors
173
+ """
174
+ ...
175
+
176
+
177
+ class BaseLLMClient(ABC):
178
+ """
179
+ Abstract base class for LLM clients.
180
+
181
+ Provides common functionality and enforces the interface contract.
182
+ All concrete implementations should inherit from this class.
183
+ """
184
+
185
+ def __init__(
186
+ self,
187
+ api_key: str | None = None,
188
+ model: str = "default",
189
+ base_url: str | None = None,
190
+ timeout: float = 60.0,
191
+ max_retries: int = 3,
192
+ rate_limit_per_minute: int | None = None,
193
+ ):
194
+ """
195
+ Initialize the LLM client.
196
+
197
+ Args:
198
+ api_key: API key for authentication
199
+ model: Model identifier
200
+ base_url: Base URL for API requests
201
+ timeout: Request timeout in seconds
202
+ max_retries: Maximum number of retry attempts
203
+ rate_limit_per_minute: Rate limit (requests per minute), None to disable
204
+ """
205
+ self.api_key = api_key
206
+ self.model = model
207
+ self.base_url = base_url
208
+ self.timeout = timeout
209
+ self.max_retries = max_retries
210
+ self._request_count = 0
211
+ self._total_tokens_used = 0
212
+ self._rate_limited_requests = 0
213
+
214
+ # Initialize rate limiter if configured
215
+ if rate_limit_per_minute is not None and rate_limit_per_minute > 0:
216
+ self._rate_limiter: TokenBucketRateLimiter | None = TokenBucketRateLimiter(
217
+ rate_per_minute=rate_limit_per_minute
218
+ )
219
+ else:
220
+ self._rate_limiter = None
221
+
222
+ @abstractmethod
223
+ async def generate(
224
+ self,
225
+ *,
226
+ messages: list[dict] | None = None,
227
+ prompt: str | None = None,
228
+ temperature: float = 0.7,
229
+ max_tokens: int | None = None,
230
+ tools: list[dict] | None = None,
231
+ stream: bool = False,
232
+ stop: list[str] | None = None,
233
+ **kwargs: Any,
234
+ ) -> LLMResponse | AsyncIterator[str]:
235
+ """Generate a response from the LLM."""
236
+ pass
237
+
238
+ def _build_messages(
239
+ self,
240
+ messages: list[dict] | None = None,
241
+ prompt: str | None = None,
242
+ ) -> list[dict]:
243
+ """
244
+ Build message list from either messages or prompt.
245
+
246
+ Args:
247
+ messages: Pre-formatted message list
248
+ prompt: Simple string prompt
249
+
250
+ Returns:
251
+ List of message dicts
252
+
253
+ Raises:
254
+ ValueError: If neither messages nor prompt provided
255
+ """
256
+ if messages is not None:
257
+ return messages
258
+ elif prompt is not None:
259
+ return [{"role": "user", "content": prompt}]
260
+ else:
261
+ raise ValueError("Either 'messages' or 'prompt' must be provided")
262
+
263
+ def _update_stats(self, response: LLMResponse) -> None:
264
+ """Update internal statistics."""
265
+ self._request_count += 1
266
+ self._total_tokens_used += response.total_tokens
267
+
268
+ async def _apply_rate_limit(self) -> None:
269
+ """
270
+ Apply rate limiting if configured.
271
+
272
+ Waits if necessary to comply with rate limits.
273
+ Tracks rate-limited requests in metrics.
274
+ """
275
+ if self._rate_limiter is not None:
276
+ wait_time = await self._rate_limiter.acquire()
277
+ if wait_time > 0:
278
+ self._rate_limited_requests += 1
279
+
280
+ @property
281
+ def stats(self) -> dict:
282
+ """Get client statistics."""
283
+ base_stats = {
284
+ "request_count": self._request_count,
285
+ "total_tokens_used": self._total_tokens_used,
286
+ "rate_limited_requests": self._rate_limited_requests,
287
+ }
288
+
289
+ # Include rate limiter stats if available
290
+ if self._rate_limiter is not None:
291
+ base_stats.update(self._rate_limiter.stats)
292
+
293
+ return base_stats
294
+
295
+ async def close(self) -> None: # noqa: B027
296
+ """Clean up resources. Override in subclasses if needed."""
297
+ pass
298
+
299
+ async def __aenter__(self):
300
+ """Async context manager entry."""
301
+ return self
302
+
303
+ async def __aexit__(self, exc_type, exc_val, exc_tb):
304
+ """Async context manager exit."""
305
+ await self.close()
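To illustrate the contract above, a minimal concrete subclass (a fake "echo" backend useful in tests; a sketch, not part of the commit):

```python
import asyncio

from src.adapters.llm.base import BaseLLMClient, LLMResponse

class EchoClient(BaseLLMClient):
    """Fake client that returns the last user message unchanged."""

    async def generate(self, *, messages=None, prompt=None, temperature=0.7,
                       max_tokens=None, tools=None, stream=False, stop=None, **kwargs):
        await self._apply_rate_limit()                 # honors rate_limit_per_minute if set
        msgs = self._build_messages(messages, prompt)  # prompt -> [{"role": "user", ...}]
        response = LLMResponse(text=msgs[-1]["content"], usage={"total_tokens": 0}, model=self.model)
        self._update_stats(response)                   # feeds the .stats property
        return response

async def demo() -> None:
    async with EchoClient(model="echo", rate_limit_per_minute=60) as client:
        print((await client.generate(prompt="hello")).text)  # -> "hello"
        print(client.stats)

asyncio.run(demo())
```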
src/adapters/llm/exceptions.py ADDED
@@ -0,0 +1,204 @@
1
+ """
2
+ Custom exceptions for LLM client operations.
3
+
4
+ Provides a hierarchy of structured exceptions for better error handling
5
+ and debugging across different LLM providers.
6
+ """
7
+
8
+
9
+ class LLMClientError(Exception):
10
+ """Base exception for all LLM client errors."""
11
+
12
+ def __init__(
13
+ self,
14
+ message: str,
15
+ provider: str = "unknown",
16
+ status_code: int | None = None,
17
+ retry_after: float | None = None,
18
+ ):
19
+ self.message = message
20
+ self.provider = provider
21
+ self.status_code = status_code
22
+ self.retry_after = retry_after
23
+ super().__init__(self.message)
24
+
25
+ def __str__(self) -> str:
26
+ parts = [f"[{self.provider}] {self.message}"]
27
+ if self.status_code:
28
+ parts.append(f"(status: {self.status_code})")
29
+ return " ".join(parts)
30
+
31
+
32
+ class LLMAuthenticationError(LLMClientError):
33
+ """Authentication failed - invalid or missing API key."""
34
+
35
+ def __init__(self, provider: str, message: str = "Authentication failed"):
36
+ super().__init__(
37
+ message=message,
38
+ provider=provider,
39
+ status_code=401,
40
+ )
41
+
42
+
43
+ class LLMRateLimitError(LLMClientError):
44
+ """Rate limit exceeded - too many requests."""
45
+
46
+ def __init__(
47
+ self,
48
+ provider: str,
49
+ retry_after: float | None = None,
50
+ message: str = "Rate limit exceeded",
51
+ ):
52
+ super().__init__(
53
+ message=message,
54
+ provider=provider,
55
+ status_code=429,
56
+ retry_after=retry_after,
57
+ )
58
+
59
+
60
+ class LLMQuotaExceededError(LLMClientError):
61
+ """Quota or credits exhausted."""
62
+
63
+ def __init__(self, provider: str, message: str = "Quota exceeded"):
64
+ super().__init__(
65
+ message=message,
66
+ provider=provider,
67
+ status_code=402,
68
+ )
69
+
70
+
71
+ class LLMModelNotFoundError(LLMClientError):
72
+ """Requested model not available."""
73
+
74
+ def __init__(self, provider: str, model: str):
75
+ super().__init__(
76
+ message=f"Model '{model}' not found or not available",
77
+ provider=provider,
78
+ status_code=404,
79
+ )
80
+
81
+
82
+ class LLMContextLengthError(LLMClientError):
83
+ """Input exceeds model's context window."""
84
+
85
+ def __init__(
86
+ self,
87
+ provider: str,
88
+ token_count: int | None = None,
89
+ max_tokens: int | None = None,
90
+ ):
91
+ message = "Context length exceeded"
92
+ if token_count and max_tokens:
93
+ message = f"Context length exceeded: {token_count} tokens provided, max is {max_tokens}"
94
+ super().__init__(
95
+ message=message,
96
+ provider=provider,
97
+ status_code=400,
98
+ )
99
+
100
+
101
+ class LLMInvalidRequestError(LLMClientError):
102
+ """Invalid request parameters."""
103
+
104
+ def __init__(self, provider: str, message: str = "Invalid request parameters"):
105
+ super().__init__(
106
+ message=message,
107
+ provider=provider,
108
+ status_code=400,
109
+ )
110
+
111
+
112
+ class LLMTimeoutError(LLMClientError):
113
+ """Request timed out."""
114
+
115
+ def __init__(self, provider: str, timeout: float):
116
+ super().__init__(
117
+ message=f"Request timed out after {timeout}s",
118
+ provider=provider,
119
+ status_code=408,
120
+ )
121
+
122
+
123
+ class LLMConnectionError(LLMClientError):
124
+ """Failed to connect to the API endpoint."""
125
+
126
+ def __init__(self, provider: str, url: str | None = None):
127
+ message = "Failed to connect to API"
128
+ if url:
129
+ message = f"Failed to connect to {url}"
130
+ super().__init__(
131
+ message=message,
132
+ provider=provider,
133
+ )
134
+
135
+
136
+ class LLMServerError(LLMClientError):
137
+ """Server-side error from the LLM provider."""
138
+
139
+ def __init__(
140
+ self,
141
+ provider: str,
142
+ status_code: int = 500,
143
+ message: str = "Server error",
144
+ ):
145
+ super().__init__(
146
+ message=message,
147
+ provider=provider,
148
+ status_code=status_code,
149
+ )
150
+
151
+
152
+ class LLMResponseParseError(LLMClientError):
153
+ """Failed to parse response from LLM provider."""
154
+
155
+ def __init__(self, provider: str, raw_response: str | None = None):
156
+ message = "Failed to parse response"
157
+ if raw_response:
158
+ preview = raw_response[:200] + "..." if len(raw_response) > 200 else raw_response
159
+ message = f"Failed to parse response: {preview}"
160
+ super().__init__(
161
+ message=message,
162
+ provider=provider,
163
+ )
164
+
165
+
166
+ class LLMStreamError(LLMClientError):
167
+ """Error during streaming response."""
168
+
169
+ def __init__(self, provider: str, message: str = "Stream interrupted"):
170
+ super().__init__(
171
+ message=message,
172
+ provider=provider,
173
+ )
174
+
175
+
176
+ class LLMContentFilterError(LLMClientError):
177
+ """Content blocked by safety filters."""
178
+
179
+ def __init__(self, provider: str, reason: str | None = None):
180
+ message = "Content blocked by safety filters"
181
+ if reason:
182
+ message = f"Content blocked: {reason}"
183
+ super().__init__(
184
+ message=message,
185
+ provider=provider,
186
+ status_code=400,
187
+ )
188
+
189
+
190
+ class CircuitBreakerOpenError(LLMClientError):
191
+ """Circuit breaker is open, requests are being blocked."""
192
+
193
+ def __init__(
194
+ self,
195
+ provider: str,
196
+ failure_count: int,
197
+ reset_time: float,
198
+ ):
199
+ super().__init__(
200
+ message=f"Circuit breaker open after {failure_count} failures. Resets in {reset_time:.1f}s",
201
+ provider=provider,
202
+ )
203
+ self.failure_count = failure_count
204
+ self.reset_time = reset_time
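A small sketch of how callers are expected to use this hierarchy: catch the specific subclasses first and fall back to LLMClientError, which carries provider, status_code, and retry_after:

```python
from src.adapters.llm.exceptions import LLMClientError, LLMRateLimitError

async def safe_generate(client, prompt: str):
    try:
        return await client.generate(prompt=prompt)
    except LLMRateLimitError as err:
        wait = err.retry_after or 1.0  # providers may supply a Retry-After hint
        print(f"rate limited by {err.provider}; back off {wait:.1f}s before retrying")
    except LLMClientError as err:
        print(f"[{err.provider}] request failed (status {err.status_code}): {err.message}")
    return None
```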
src/adapters/llm/lmstudio_client.py ADDED
@@ -0,0 +1,346 @@
1
+ """
2
+ LM Studio local LLM client adapter.
3
+
4
+ Implements the LLMClient protocol for LM Studio's OpenAI-compatible API.
5
+ Designed for running local models with configurable endpoint.
6
+ """
7
+
8
+ import json
9
+ import logging
10
+ from collections.abc import AsyncIterator
11
+ from typing import Any
12
+
13
+ import httpx
14
+
15
+ from .base import BaseLLMClient, LLMResponse
16
+ from .exceptions import (
17
+ LLMClientError,
18
+ LLMConnectionError,
19
+ LLMResponseParseError,
20
+ LLMServerError,
21
+ LLMStreamError,
22
+ LLMTimeoutError,
23
+ )
24
+
25
+ logger = logging.getLogger(__name__)
26
+
27
+
28
+ class LMStudioClient(BaseLLMClient):
29
+ """
30
+ LM Studio local server client.
31
+
32
+ LM Studio provides an OpenAI-compatible API for running local models.
33
+ This client is optimized for local deployment with:
34
+ - No authentication required (local)
35
+ - Configurable base URL
36
+ - No circuit breaker (local server expected to be stable)
37
+ - Longer timeouts for large models
38
+ """
39
+
40
+ PROVIDER_NAME = "lmstudio"
41
+ DEFAULT_BASE_URL = "http://localhost:1234/v1"
42
+ DEFAULT_MODEL = "local-model" # LM Studio uses the loaded model
43
+
44
+ def __init__(
45
+ self,
46
+ api_key: str | None = None, # Not required for local
47
+ model: str | None = None,
48
+ base_url: str | None = None,
49
+ timeout: float = 300.0, # Long timeout for local inference
50
+ max_retries: int = 2, # Fewer retries for local
51
+ # Rate limiting
52
+ rate_limit_per_minute: int | None = None,
53
+ ):
54
+ """
55
+ Initialize LM Studio client.
56
+
57
+ Args:
58
+ api_key: Not required for local server (ignored)
59
+ model: Model identifier (often ignored by LM Studio, uses loaded model)
60
+ base_url: Local server URL (default: http://localhost:1234/v1)
61
+ timeout: Request timeout in seconds (default longer for local models)
62
+ max_retries: Max retry attempts (fewer for local)
63
+ rate_limit_per_minute: Rate limit for requests per minute (None to disable)
64
+ """
65
+ import os
66
+
67
+ # Allow overriding via environment variable
68
+ base_url = base_url or os.environ.get("LMSTUDIO_BASE_URL", self.DEFAULT_BASE_URL)
69
+
70
+ super().__init__(
71
+ api_key=api_key or "not-required", # Placeholder
72
+ model=model or self.DEFAULT_MODEL,
73
+ base_url=base_url,
74
+ timeout=timeout,
75
+ max_retries=max_retries,
76
+ rate_limit_per_minute=rate_limit_per_minute,
77
+ )
78
+
79
+ self._client: httpx.AsyncClient | None = None
80
+
81
+ async def _get_client(self) -> httpx.AsyncClient:
82
+ """Get or create the HTTP client."""
83
+ if self._client is None or self._client.is_closed:
84
+ headers = {"Content-Type": "application/json"}
85
+
86
+ # Add auth header if provided (some local servers may require it)
87
+ if self.api_key and self.api_key != "not-required":
88
+ headers["Authorization"] = f"Bearer {self.api_key}"
89
+
90
+ self._client = httpx.AsyncClient(
91
+ base_url=self.base_url,
92
+ headers=headers,
93
+ timeout=httpx.Timeout(self.timeout),
94
+ )
95
+ return self._client
96
+
97
+ async def check_health(self) -> bool:
98
+ """
99
+ Check if LM Studio server is running.
100
+
101
+ Returns:
102
+ True if server is accessible, False otherwise
103
+ """
104
+ try:
105
+ client = await self._get_client()
106
+ response = await client.get("/models")
107
+ return response.status_code == 200
108
+ except Exception:
109
+ return False
110
+
111
+ async def list_models(self) -> list[dict]:
112
+ """
113
+ List available models on the LM Studio server.
114
+
115
+ Returns:
116
+ List of model information dicts
117
+ """
118
+ try:
119
+ client = await self._get_client()
120
+ response = await client.get("/models")
121
+ if response.status_code == 200:
122
+ data = response.json()
123
+ return data.get("data", [])
124
+ return []
125
+ except Exception as e:
126
+ logger.warning(f"Failed to list models: {e}")
127
+ return []
128
+
129
+ def _handle_error_response(self, response: httpx.Response) -> None:
130
+ """Handle error responses from LM Studio server."""
131
+ status_code = response.status_code
132
+
133
+ try:
134
+ error_data = response.json()
135
+ error_message = error_data.get("error", {}).get("message", response.text)
136
+ except Exception:
137
+ error_message = response.text
138
+
139
+ if status_code >= 500:
140
+ raise LLMServerError(self.PROVIDER_NAME, status_code, error_message)
141
+ else:
142
+ raise LLMClientError(error_message, self.PROVIDER_NAME, status_code=status_code)
143
+
144
+ async def generate(
145
+ self,
146
+ *,
147
+ messages: list[dict] | None = None,
148
+ prompt: str | None = None,
149
+ temperature: float = 0.7,
150
+ max_tokens: int | None = None,
151
+ tools: list[dict] | None = None,
152
+ stream: bool = False,
153
+ stop: list[str] | None = None,
154
+ **kwargs: Any,
155
+ ) -> LLMResponse | AsyncIterator[str]:
156
+ """
157
+ Generate a response from LM Studio local model.
158
+
159
+ Args:
160
+ messages: Chat messages in OpenAI format
161
+ prompt: Simple string prompt
162
+ temperature: Sampling temperature
163
+ max_tokens: Maximum tokens to generate
164
+ tools: Tool definitions (limited support in local models)
165
+ stream: If True, returns AsyncIterator
166
+ stop: Stop sequences
167
+ **kwargs: Additional parameters
168
+
169
+ Returns:
170
+ LLMResponse or AsyncIterator[str] for streaming
171
+ """
172
+ # Apply rate limiting before proceeding
173
+ await self._apply_rate_limit()
174
+
175
+ if stream:
176
+ return await self._generate_stream(
177
+ messages=messages,
178
+ prompt=prompt,
179
+ temperature=temperature,
180
+ max_tokens=max_tokens,
181
+ tools=tools,
182
+ stop=stop,
183
+ **kwargs,
184
+ )
185
+ else:
186
+ return await self._generate_non_stream(
187
+ messages=messages,
188
+ prompt=prompt,
189
+ temperature=temperature,
190
+ max_tokens=max_tokens,
191
+ tools=tools,
192
+ stop=stop,
193
+ **kwargs,
194
+ )
195
+
196
+ async def _generate_non_stream(
197
+ self,
198
+ *,
199
+ messages: list[dict] | None = None,
200
+ prompt: str | None = None,
201
+ temperature: float = 0.7,
202
+ max_tokens: int | None = None,
203
+ tools: list[dict] | None = None,
204
+ stop: list[str] | None = None,
205
+ **kwargs: Any,
206
+ ) -> LLMResponse:
207
+ """Non-streaming generation."""
208
+ client = await self._get_client()
209
+
210
+ # Build request payload (OpenAI-compatible)
211
+ payload = {
212
+ "model": self.model,
213
+ "messages": self._build_messages(messages, prompt),
214
+ "temperature": temperature,
215
+ }
216
+
217
+ if max_tokens is not None:
218
+ payload["max_tokens"] = max_tokens
219
+ if stop:
220
+ payload["stop"] = stop
221
+
222
+ # Note: most local models don't support tools well
223
+ if tools:
224
+ logger.warning("Tool calling may not be fully supported by local models")
225
+ payload["tools"] = tools
226
+
227
+ # Add additional kwargs (e.g., top_p, repeat_penalty)
228
+ for key in ["top_p", "top_k", "repeat_penalty", "presence_penalty", "frequency_penalty"]:
229
+ if key in kwargs:
230
+ payload[key] = kwargs[key]
231
+
232
+ # Retry logic for local server
233
+ last_error = None
234
+ for attempt in range(self.max_retries):
235
+ try:
236
+ response = await client.post("/chat/completions", json=payload)
237
+
238
+ if response.status_code != 200:
239
+ self._handle_error_response(response)
240
+
241
+ # Parse response
242
+ try:
243
+ data = response.json()
244
+ choice = data["choices"][0]
245
+ message = choice["message"]
246
+
247
+ usage = data.get("usage", {})
248
+ finish_reason = choice.get("finish_reason", "stop")
249
+
250
+ llm_response = LLMResponse(
251
+ text=message.get("content", ""),
252
+ usage=usage,
253
+ model=data.get("model", self.model),
254
+ raw_response=data,
255
+ finish_reason=finish_reason,
256
+ )
257
+
258
+ self._update_stats(llm_response)
259
+ return llm_response
260
+
261
+ except (KeyError, json.JSONDecodeError) as e:
262
+ raise LLMResponseParseError(self.PROVIDER_NAME, response.text) from e
263
+
264
+ except httpx.TimeoutException:
265
+ last_error = LLMTimeoutError(self.PROVIDER_NAME, self.timeout)
266
+ logger.warning(f"Attempt {attempt + 1} timed out, retrying...")
267
+ except httpx.ConnectError:
268
+ last_error = LLMConnectionError(self.PROVIDER_NAME, self.base_url)
269
+ logger.warning(f"Attempt {attempt + 1} connection failed, retrying...")
270
+ except LLMClientError:
271
+ raise # Don't retry client errors
272
+
273
+ # All retries exhausted
274
+ if last_error:
275
+ raise last_error
276
+ raise LLMConnectionError(self.PROVIDER_NAME, self.base_url)
277
+
278
+ async def _generate_stream(
279
+ self,
280
+ *,
281
+ messages: list[dict] | None = None,
282
+ prompt: str | None = None,
283
+ temperature: float = 0.7,
284
+ max_tokens: int | None = None,
285
+ tools: list[dict] | None = None, # noqa: ARG002
286
+ stop: list[str] | None = None,
287
+ **kwargs: Any,
288
+ ) -> AsyncIterator[str]:
289
+ """Streaming generation."""
290
+ client = await self._get_client()
291
+
292
+ # Build request payload
293
+ payload = {
294
+ "model": self.model,
295
+ "messages": self._build_messages(messages, prompt),
296
+ "temperature": temperature,
297
+ "stream": True,
298
+ }
299
+
300
+ if max_tokens is not None:
301
+ payload["max_tokens"] = max_tokens
302
+ if stop:
303
+ payload["stop"] = stop
304
+
305
+ for key in ["top_p", "top_k", "repeat_penalty"]:
306
+ if key in kwargs:
307
+ payload[key] = kwargs[key]
308
+
309
+ async def stream_generator():
310
+ try:
311
+ async with client.stream("POST", "/chat/completions", json=payload) as response:
312
+ if response.status_code != 200:
313
+ await response.aread()
314
+ self._handle_error_response(response)
315
+
316
+ async for line in response.aiter_lines():
317
+ if line.startswith("data: "):
318
+ data_str = line[6:]
319
+ if data_str.strip() == "[DONE]":
320
+ break
321
+
322
+ try:
323
+ data = json.loads(data_str)
324
+ delta = data["choices"][0].get("delta", {})
325
+ content = delta.get("content", "")
326
+ if content:
327
+ yield content
328
+ except (json.JSONDecodeError, KeyError):
329
+ continue
330
+
331
+ except httpx.TimeoutException:
332
+ raise LLMTimeoutError(self.PROVIDER_NAME, self.timeout)
333
+ except httpx.ConnectError:
334
+ raise LLMConnectionError(self.PROVIDER_NAME, self.base_url)
335
+ except Exception as e:
336
+ if isinstance(e, LLMClientError):
337
+ raise
338
+ raise LLMStreamError(self.PROVIDER_NAME, str(e)) from e
339
+
340
+ return stream_generator()
341
+
342
+ async def close(self) -> None:
343
+ """Close the HTTP client."""
344
+ if self._client and not self._client.is_closed:
345
+ await self._client.aclose()
346
+ self._client = None
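A usage sketch for the local client above (assumes LM Studio is serving its OpenAI-compatible API on localhost:1234 with a model loaded):

```python
import asyncio

from src.adapters.llm.lmstudio_client import LMStudioClient

async def main() -> None:
    async with LMStudioClient() as client:  # defaults to http://localhost:1234/v1
        if not await client.check_health():
            print("LM Studio server is not reachable")
            return
        print([m.get("id") for m in await client.list_models()])
        reply = await client.generate(prompt="Say hi in five words.", max_tokens=32)
        print(reply.text)

asyncio.run(main())
```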
src/adapters/llm/openai_client.py ADDED
@@ -0,0 +1,458 @@
1
+ """
2
+ OpenAI-compatible LLM client adapter.
3
+
4
+ Implements the LLMClient protocol for OpenAI API (and compatible APIs).
5
+ Includes retry logic, circuit breaker pattern, and streaming support.
6
+ """
7
+
8
+ import json
9
+ import logging
10
+ import time
11
+ from collections.abc import AsyncIterator
12
+ from typing import Any
13
+
14
+ import httpx
15
+ from tenacity import (
16
+ before_sleep_log,
17
+ retry,
18
+ retry_if_exception_type,
19
+ stop_after_attempt,
20
+ wait_exponential,
21
+ )
22
+
23
+ from .base import BaseLLMClient, LLMResponse, LLMToolResponse, ToolCall
24
+ from .exceptions import (
25
+ CircuitBreakerOpenError,
26
+ LLMAuthenticationError,
27
+ LLMClientError,
28
+ LLMConnectionError,
29
+ LLMContextLengthError,
30
+ LLMInvalidRequestError,
31
+ LLMModelNotFoundError,
32
+ LLMQuotaExceededError,
33
+ LLMRateLimitError,
34
+ LLMResponseParseError,
35
+ LLMServerError,
36
+ LLMStreamError,
37
+ LLMTimeoutError,
38
+ )
39
+
40
+ logger = logging.getLogger(__name__)
41
+
42
+
43
+ class CircuitBreaker:
44
+ """Simple circuit breaker implementation for resilience."""
45
+
46
+ def __init__(
47
+ self,
48
+ failure_threshold: int = 5,
49
+ reset_timeout: float = 60.0,
50
+ half_open_max_calls: int = 1,
51
+ ):
52
+ self.failure_threshold = failure_threshold
53
+ self.reset_timeout = reset_timeout
54
+ self.half_open_max_calls = half_open_max_calls
55
+ self.failure_count = 0
56
+ self.last_failure_time = 0.0
57
+ self.state = "closed" # closed, open, half-open
58
+ self.half_open_calls = 0
59
+
60
+ def can_execute(self) -> bool:
61
+ """Check if request can be executed."""
62
+ if self.state == "closed":
63
+ return True
64
+
65
+ if self.state == "open":
66
+ # Check if reset timeout has passed
67
+ if time.time() - self.last_failure_time >= self.reset_timeout:
68
+ self.state = "half-open"
69
+ self.half_open_calls = 0
70
+ return True
71
+ return False
72
+
73
+ if self.state == "half-open":
74
+ return self.half_open_calls < self.half_open_max_calls
75
+
76
+ return False
77
+
78
+ def record_success(self) -> None:
79
+ """Record successful request."""
80
+ if self.state == "half-open":
81
+ self.state = "closed"
82
+ self.failure_count = 0
83
+ elif self.state == "closed":
84
+ self.failure_count = 0
85
+
86
+ def record_failure(self) -> None:
87
+ """Record failed request."""
88
+ self.failure_count += 1
89
+ self.last_failure_time = time.time()
90
+
91
+ if self.state == "half-open" or self.failure_count >= self.failure_threshold:
92
+ self.state = "open"
93
+
94
+ def get_reset_time(self) -> float:
95
+ """Get time until circuit resets."""
96
+ if self.state != "open":
97
+ return 0.0
98
+ elapsed = time.time() - self.last_failure_time
99
+ return max(0, self.reset_timeout - elapsed)
100
+
101
+
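
A quick illustration of the state machine above (closed -> open -> half-open); a minimal sketch with arbitrary thresholds, separate from how the client wires record_success/record_failure around each request.

# Sketch of the CircuitBreaker state transitions defined above (thresholds are arbitrary).
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)

assert breaker.can_execute()           # "closed": requests flow normally
breaker.record_failure()
breaker.record_failure()               # threshold reached -> state becomes "open"
assert not breaker.can_execute()       # "open": requests are rejected immediately
print(f"circuit open, retry in ~{breaker.get_reset_time():.0f}s")
# Once reset_timeout elapses, can_execute() moves the state to "half-open" and
# admits a single probe request; record_success() then closes the circuit again.
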
102
+ class OpenAIClient(BaseLLMClient):
103
+ """
104
+ OpenAI API client with retry logic and circuit breaker.
105
+
106
+ Features:
107
+ - Exponential backoff retry for transient errors
108
+ - Circuit breaker to prevent cascading failures
109
+ - Streaming support
110
+ - Structured error handling
111
+ - Tool/function calling support
112
+ """
113
+
114
+ PROVIDER_NAME = "openai"
115
+ DEFAULT_BASE_URL = "https://api.openai.com/v1"
116
+ DEFAULT_MODEL = "gpt-4-turbo-preview"
117
+
118
+ def __init__(
119
+ self,
120
+ api_key: str | None = None,
121
+ model: str | None = None,
122
+ base_url: str | None = None,
123
+ timeout: float = 60.0,
124
+ max_retries: int = 3,
125
+ organization: str | None = None,
126
+ # Circuit breaker settings
127
+ circuit_breaker_threshold: int = 5,
128
+ circuit_breaker_reset: float = 60.0,
129
+ # Rate limiting
130
+ rate_limit_per_minute: int | None = None,
131
+ ):
132
+ """
133
+ Initialize OpenAI client.
134
+
135
+ Args:
136
+ api_key: OpenAI API key (or set OPENAI_API_KEY env var)
137
+ model: Model to use (default: gpt-4-turbo-preview)
138
+ base_url: API base URL (default: https://api.openai.com/v1)
139
+ timeout: Request timeout in seconds
140
+ max_retries: Max retry attempts for transient errors
141
+ organization: Optional organization ID
142
+ circuit_breaker_threshold: Failures before circuit opens
143
+ circuit_breaker_reset: Seconds before circuit resets
144
+ rate_limit_per_minute: Rate limit for requests per minute (None to disable)
145
+ """
146
+ import os
147
+
148
+ api_key = api_key or os.environ.get("OPENAI_API_KEY")
149
+ if not api_key:
150
+ raise LLMAuthenticationError(self.PROVIDER_NAME, "API key not provided and OPENAI_API_KEY not set")
151
+
152
+ super().__init__(
153
+ api_key=api_key,
154
+ model=model or self.DEFAULT_MODEL,
155
+ base_url=base_url or self.DEFAULT_BASE_URL,
156
+ timeout=timeout,
157
+ max_retries=max_retries,
158
+ rate_limit_per_minute=rate_limit_per_minute,
159
+ )
160
+
161
+ self.organization = organization
162
+ self.circuit_breaker = CircuitBreaker(
163
+ failure_threshold=circuit_breaker_threshold,
164
+ reset_timeout=circuit_breaker_reset,
165
+ )
166
+
167
+ # Initialize async HTTP client
168
+ self._client: httpx.AsyncClient | None = None
169
+
170
+ async def _get_client(self) -> httpx.AsyncClient:
171
+ """Get or create the HTTP client."""
172
+ if self._client is None or self._client.is_closed:
173
+ headers = {
174
+ "Authorization": f"Bearer {self.api_key}",
175
+ "Content-Type": "application/json",
176
+ }
177
+ if self.organization:
178
+ headers["OpenAI-Organization"] = self.organization
179
+
180
+ self._client = httpx.AsyncClient(
181
+ base_url=self.base_url,
182
+ headers=headers,
183
+ timeout=httpx.Timeout(self.timeout),
184
+ )
185
+ return self._client
186
+
187
+ def _handle_error_response(self, response: httpx.Response) -> None:
188
+ """Convert HTTP error responses to appropriate exceptions."""
189
+ status_code = response.status_code
190
+
191
+ try:
192
+ error_data = response.json()
193
+ error_message = error_data.get("error", {}).get("message", response.text)
194
+ except Exception:
195
+ error_message = response.text
196
+
197
+ if status_code == 401:
198
+ raise LLMAuthenticationError(self.PROVIDER_NAME, error_message)
199
+ elif status_code == 429:
200
+ retry_after = response.headers.get("Retry-After")
201
+ retry_after_float = float(retry_after) if retry_after else None
202
+ raise LLMRateLimitError(self.PROVIDER_NAME, retry_after=retry_after_float, message=error_message)
203
+ elif status_code == 402:
204
+ raise LLMQuotaExceededError(self.PROVIDER_NAME, error_message)
205
+ elif status_code == 404:
206
+ raise LLMModelNotFoundError(self.PROVIDER_NAME, self.model)
207
+ elif status_code == 400:
208
+ if "context_length" in error_message.lower():
209
+ raise LLMContextLengthError(self.PROVIDER_NAME)
210
+ raise LLMInvalidRequestError(self.PROVIDER_NAME, error_message)
211
+ elif status_code >= 500:
212
+ raise LLMServerError(self.PROVIDER_NAME, status_code, error_message)
213
+ else:
214
+ raise LLMClientError(error_message, self.PROVIDER_NAME, status_code=status_code)
215
+
216
+ def _make_retry_decorator(self):
217
+ """Create retry decorator with exponential backoff."""
218
+ return retry(
219
+ stop=stop_after_attempt(self.max_retries),
220
+ wait=wait_exponential(multiplier=1, min=1, max=60),
221
+ retry=retry_if_exception_type((LLMRateLimitError, LLMServerError, LLMConnectionError)),
222
+ before_sleep=before_sleep_log(logger, logging.WARNING),
223
+ reraise=True,
224
+ )
225
+
226
+ async def generate(
227
+ self,
228
+ *,
229
+ messages: list[dict] | None = None,
230
+ prompt: str | None = None,
231
+ temperature: float = 0.7,
232
+ max_tokens: int | None = None,
233
+ tools: list[dict] | None = None,
234
+ stream: bool = False,
235
+ stop: list[str] | None = None,
236
+ **kwargs: Any,
237
+ ) -> LLMResponse | AsyncIterator[str]:
238
+ """
239
+ Generate a response from OpenAI.
240
+
241
+ Args:
242
+ messages: Chat messages in OpenAI format
243
+ prompt: Simple string prompt
244
+ temperature: Sampling temperature (0.0 to 2.0)
245
+ max_tokens: Maximum tokens to generate
246
+ tools: Tool definitions for function calling
247
+ stream: If True, returns AsyncIterator
248
+ stop: Stop sequences
249
+ **kwargs: Additional OpenAI parameters (top_p, presence_penalty, etc.)
250
+
251
+ Returns:
252
+ LLMResponse or AsyncIterator[str] for streaming
253
+ """
254
+ # Apply rate limiting before proceeding
255
+ await self._apply_rate_limit()
256
+
257
+ # Check circuit breaker
258
+ if not self.circuit_breaker.can_execute():
259
+ raise CircuitBreakerOpenError(
260
+ self.PROVIDER_NAME,
261
+ self.circuit_breaker.failure_count,
262
+ self.circuit_breaker.get_reset_time(),
263
+ )
264
+
265
+ if stream:
266
+ return await self._generate_stream(
267
+ messages=messages,
268
+ prompt=prompt,
269
+ temperature=temperature,
270
+ max_tokens=max_tokens,
271
+ tools=tools,
272
+ stop=stop,
273
+ **kwargs,
274
+ )
275
+ else:
276
+ return await self._generate_non_stream(
277
+ messages=messages,
278
+ prompt=prompt,
279
+ temperature=temperature,
280
+ max_tokens=max_tokens,
281
+ tools=tools,
282
+ stop=stop,
283
+ **kwargs,
284
+ )
285
+
286
+ async def _generate_non_stream(
287
+ self,
288
+ *,
289
+ messages: list[dict] | None = None,
290
+ prompt: str | None = None,
291
+ temperature: float = 0.7,
292
+ max_tokens: int | None = None,
293
+ tools: list[dict] | None = None,
294
+ stop: list[str] | None = None,
295
+ **kwargs: Any,
296
+ ) -> LLMResponse:
297
+ """Non-streaming generation with retry logic."""
298
+
299
+ @self._make_retry_decorator()
300
+ async def _request():
301
+ client = await self._get_client()
302
+
303
+ # Build request payload
304
+ payload = {
305
+ "model": self.model,
306
+ "messages": self._build_messages(messages, prompt),
307
+ "temperature": temperature,
308
+ }
309
+
310
+ if max_tokens is not None:
311
+ payload["max_tokens"] = max_tokens
312
+ if stop:
313
+ payload["stop"] = stop
314
+ if tools:
315
+ payload["tools"] = tools
316
+ payload["tool_choice"] = kwargs.pop("tool_choice", "auto")
317
+
318
+ # Add any additional kwargs
319
+ payload.update(kwargs)
320
+
321
+ try:
322
+ response = await client.post("/chat/completions", json=payload)
323
+ except httpx.TimeoutException:
324
+ raise LLMTimeoutError(self.PROVIDER_NAME, self.timeout)
325
+ except httpx.ConnectError:
326
+ raise LLMConnectionError(self.PROVIDER_NAME, self.base_url)
327
+
328
+ if response.status_code != 200:
329
+ self._handle_error_response(response)
330
+
331
+ return response
332
+
333
+ try:
334
+ response = await _request()
335
+ self.circuit_breaker.record_success()
336
+ except Exception:
337
+ self.circuit_breaker.record_failure()
338
+ raise
339
+
340
+ # Parse response
341
+ try:
342
+ data = response.json()
343
+ choice = data["choices"][0]
344
+ message = choice["message"]
345
+
346
+ usage = data.get("usage", {})
347
+ finish_reason = choice.get("finish_reason", "stop")
348
+
349
+ # Check for tool calls
350
+ if "tool_calls" in message:
351
+ tool_calls = [
352
+ ToolCall(
353
+ id=tc["id"],
354
+ name=tc["function"]["name"],
355
+ arguments=json.loads(tc["function"]["arguments"]),
356
+ )
357
+ for tc in message["tool_calls"]
358
+ ]
359
+ llm_response = LLMToolResponse(
360
+ text=message.get("content", ""),
361
+ usage=usage,
362
+ model=data.get("model", self.model),
363
+ raw_response=data,
364
+ finish_reason=finish_reason,
365
+ tool_calls=tool_calls,
366
+ )
367
+ else:
368
+ llm_response = LLMResponse(
369
+ text=message.get("content", ""),
370
+ usage=usage,
371
+ model=data.get("model", self.model),
372
+ raw_response=data,
373
+ finish_reason=finish_reason,
374
+ )
375
+
376
+ self._update_stats(llm_response)
377
+ return llm_response
378
+
379
+ except (KeyError, json.JSONDecodeError) as e:
380
+ raise LLMResponseParseError(self.PROVIDER_NAME, response.text) from e
381
+
382
+ async def _generate_stream(
383
+ self,
384
+ *,
385
+ messages: list[dict] | None = None,
386
+ prompt: str | None = None,
387
+ temperature: float = 0.7,
388
+ max_tokens: int | None = None,
389
+ tools: list[dict] | None = None,
390
+ stop: list[str] | None = None,
391
+ **kwargs: Any,
392
+ ) -> AsyncIterator[str]:
393
+ """Streaming generation."""
394
+
395
+ client = await self._get_client()
396
+
397
+ # Build request payload
398
+ payload = {
399
+ "model": self.model,
400
+ "messages": self._build_messages(messages, prompt),
401
+ "temperature": temperature,
402
+ "stream": True,
403
+ }
404
+
405
+ if max_tokens is not None:
406
+ payload["max_tokens"] = max_tokens
407
+ if stop:
408
+ payload["stop"] = stop
409
+ # Note: tools with streaming have limited support
410
+ if tools:
411
+ payload["tools"] = tools
412
+
413
+ payload.update(kwargs)
414
+
415
+ async def stream_generator():
416
+ try:
417
+ async with client.stream("POST", "/chat/completions", json=payload) as response:
418
+ if response.status_code != 200:
419
+ # Read the full response for error handling
420
+ await response.aread()
421
+ self._handle_error_response(response)
422
+
423
+ async for line in response.aiter_lines():
424
+ if line.startswith("data: "):
425
+ data_str = line[6:]
426
+ if data_str.strip() == "[DONE]":
427
+ break
428
+
429
+ try:
430
+ data = json.loads(data_str)
431
+ delta = data["choices"][0].get("delta", {})
432
+ content = delta.get("content", "")
433
+ if content:
434
+ yield content
435
+ except (json.JSONDecodeError, KeyError):
436
+ continue
437
+
438
+ self.circuit_breaker.record_success()
439
+
440
+ except httpx.TimeoutException:
441
+ self.circuit_breaker.record_failure()
442
+ raise LLMTimeoutError(self.PROVIDER_NAME, self.timeout)
443
+ except httpx.ConnectError:
444
+ self.circuit_breaker.record_failure()
445
+ raise LLMConnectionError(self.PROVIDER_NAME, self.base_url)
446
+ except Exception as e:
447
+ self.circuit_breaker.record_failure()
448
+ if isinstance(e, LLMClientError):
449
+ raise
450
+ raise LLMStreamError(self.PROVIDER_NAME, str(e)) from e
451
+
452
+ return stream_generator()
453
+
454
+ async def close(self) -> None:
455
+ """Close the HTTP client."""
456
+ if self._client and not self._client.is_closed:
457
+ await self._client.aclose()
458
+ self._client = None
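
A minimal usage sketch for the client defined above; the model name, prompt, and token budget are placeholders, and OPENAI_API_KEY is assumed to be set in the environment. Retries and the circuit breaker are applied inside generate(), so the caller only sees the final response or a raised LLM error.

# Usage sketch for OpenAIClient (assumes OPENAI_API_KEY is exported).
import asyncio

from src.adapters.llm.openai_client import OpenAIClient


async def main() -> None:
    client = OpenAIClient(model="gpt-4-turbo-preview", max_retries=3)
    try:
        response = await client.generate(
            messages=[{"role": "user", "content": "Summarize MCTS in two sentences."}],
            temperature=0.2,
            max_tokens=128,
        )
        print(response.text)
        print(response.usage)   # token accounting reported by the API
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())
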
src/agents/__init__.py ADDED
File without changes
src/agents/hrm_agent.py ADDED
@@ -0,0 +1,454 @@
1
+ """
2
+ Hierarchical Reasoning Model (HRM) Agent.
3
+
4
+ Implements the HRM architecture with:
5
+ - H-Module: High-level planning and decomposition
6
+ - L-Module: Low-level execution and refinement
7
+ - Adaptive Computation Time (ACT) for dynamic depth
8
+ - Halting mechanism based on confidence thresholds
9
+
10
+ Based on: "Hierarchical Reasoning for Compositional Generalization"
11
+ """
12
+
13
+ from __future__ import annotations
14
+
15
+ from dataclasses import dataclass
16
+
17
+ import torch
18
+ import torch.nn as nn
19
+ import torch.nn.functional as F
20
+
21
+ from ..training.system_config import HRMConfig
22
+
23
+
24
+ @dataclass
25
+ class SubProblem:
26
+ """Represents a decomposed subproblem in the hierarchy."""
27
+
28
+ level: int # Hierarchy level (0 = root, higher = more abstract)
29
+ description: str # Natural language description
30
+ state: torch.Tensor # Latent state representation
31
+ parent_id: int | None = None # Parent subproblem ID
32
+ confidence: float = 0.0 # Confidence in this decomposition
33
+
34
+
35
+ @dataclass
36
+ class HRMOutput:
37
+ """Output from HRM processing."""
38
+
39
+ final_state: torch.Tensor # Final processed state
40
+ subproblems: list[SubProblem] # Hierarchical decomposition
41
+ halt_step: int # Step at which halting occurred
42
+ total_ponder_cost: float # Total computation cost (for training)
43
+ convergence_path: list[float] # Confidence at each step
44
+
45
+
46
+ class AdaptiveComputationTime(nn.Module):
47
+ """
48
+ Adaptive Computation Time (ACT) mechanism for dynamic depth.
49
+
50
+ Allows the model to "ponder" longer on difficult problems by
51
+ dynamically adjusting the number of processing steps.
52
+ """
53
+
54
+ def __init__(self, hidden_dim: int, epsilon: float = 0.01):
55
+ super().__init__()
56
+ self.epsilon = epsilon
57
+
58
+ # Halting unit: predicts probability of halting
59
+ self.halt_fc = nn.Sequential(
60
+ nn.Linear(hidden_dim, hidden_dim // 2),
61
+ nn.ReLU(),
62
+ nn.Linear(hidden_dim // 2, 1),
63
+ nn.Sigmoid(),
64
+ )
65
+
66
+ def forward(self, hidden_states: torch.Tensor) -> tuple[torch.Tensor, float]:
67
+ """
68
+ Compute halting probabilities.
69
+
70
+ Args:
71
+ hidden_states: [batch, seq, hidden_dim]
72
+
73
+ Returns:
74
+ halt_probs: [batch, seq] probability of halting
75
+ ponder_cost: Scalar cost for training
76
+ """
77
+ # Compute halting probabilities
78
+ halt_logits = self.halt_fc(hidden_states) # [batch, seq, 1]
79
+ halt_probs = halt_logits.squeeze(-1) # [batch, seq]
80
+
81
+ # Ponder cost is the expected number of steps
82
+ ponder_cost = halt_probs.sum(dim=-1).mean()
83
+
84
+ return halt_probs, ponder_cost
85
+
86
+
87
+ class HModule(nn.Module):
88
+ """
89
+ H-Module: High-level planning and abstract reasoning.
90
+
91
+ Responsible for:
92
+ - Decomposing problems into subproblems
93
+ - Abstract planning and strategy
94
+ - Coordinating L-module executions
95
+ """
96
+
97
+ def __init__(self, config: HRMConfig):
98
+ super().__init__()
99
+ self.config = config
100
+
101
+ # Multi-head self-attention for relational reasoning
102
+ self.attention = nn.MultiheadAttention(
103
+ embed_dim=config.h_dim,
104
+ num_heads=8,
105
+ dropout=config.dropout,
106
+ batch_first=True,
107
+ )
108
+
109
+ # Feed-forward network
110
+ self.ffn = nn.Sequential(
111
+ nn.Linear(config.h_dim, config.h_dim * 4),
112
+ nn.GELU(),
113
+ nn.Dropout(config.dropout),
114
+ nn.Linear(config.h_dim * 4, config.h_dim),
115
+ nn.Dropout(config.dropout),
116
+ )
117
+
118
+ # Layer normalization
119
+ self.norm1 = nn.LayerNorm(config.h_dim)
120
+ self.norm2 = nn.LayerNorm(config.h_dim)
121
+
122
+ # Decomposition head: outputs subproblem structure
123
+ self.decompose_head = nn.Sequential(
124
+ nn.Linear(config.h_dim, config.h_dim),
125
+ nn.ReLU(),
126
+ nn.Linear(config.h_dim, config.h_dim),
127
+ )
128
+
129
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
130
+ """
131
+ Process input through high-level reasoning.
132
+
133
+ Args:
134
+ x: [batch, seq, h_dim] input tensor
135
+
136
+ Returns:
137
+ [batch, seq, h_dim] processed tensor
138
+ """
139
+ # Self-attention for relational reasoning
140
+ attn_out, _ = self.attention(x, x, x)
141
+ x = self.norm1(x + attn_out)
142
+
143
+ # Feed-forward processing
144
+ ffn_out = self.ffn(x)
145
+ x = self.norm2(x + ffn_out)
146
+
147
+ return x
148
+
149
+ def decompose(self, x: torch.Tensor) -> torch.Tensor:
150
+ """Generate subproblem representations."""
151
+ return self.decompose_head(x)
152
+
153
+
154
+ class LModule(nn.Module):
155
+ """
156
+ L-Module: Low-level execution and concrete operations.
157
+
158
+ Responsible for:
159
+ - Executing concrete operations
160
+ - Processing individual subproblems
161
+ - Generating intermediate results
162
+ """
163
+
164
+ def __init__(self, config: HRMConfig):
165
+ super().__init__()
166
+ self.config = config
167
+
168
+ # Projection from H-module to L-module dimension
169
+ self.h_to_l = nn.Linear(config.h_dim, config.l_dim)
170
+
171
+ # GRU for sequential processing
172
+ self.gru = nn.GRU(
173
+ input_size=config.l_dim,
174
+ hidden_size=config.l_dim,
175
+ num_layers=config.num_l_layers,
176
+ dropout=config.dropout if config.num_l_layers > 1 else 0,
177
+ batch_first=True,
178
+ )
179
+
180
+ # Output projection
181
+ self.output_proj = nn.Sequential(
182
+ nn.Linear(config.l_dim, config.l_dim * 2),
183
+ nn.ReLU(),
184
+ nn.Dropout(config.dropout),
185
+ nn.Linear(config.l_dim * 2, config.l_dim),
186
+ )
187
+
188
+ # Back-projection to H-module dimension
189
+ self.l_to_h = nn.Linear(config.l_dim, config.h_dim)
190
+
191
+ def forward(self, x: torch.Tensor, h_context: torch.Tensor | None = None) -> tuple[torch.Tensor, torch.Tensor]:
192
+ """
193
+ Execute low-level processing.
194
+
195
+ Args:
196
+ x: [batch, seq, h_dim] input from H-module
197
+ h_context: Optional hidden state
198
+
199
+ Returns:
200
+ output: [batch, seq, l_dim] processed output
201
+ l_to_h: [batch, seq, h_dim] back-projection to H-module
202
+ """
203
+ # Project to L-module dimension
204
+ x_l = self.h_to_l(x)
205
+
206
+ # Sequential processing
207
+ gru_out, _ = self.gru(x_l, h_context)
208
+
209
+ # Output processing
210
+ output = self.output_proj(gru_out)
211
+
212
+ # Back-project to H-module dimension for feedback
213
+ feedback = self.l_to_h(output)
214
+
215
+ return output, feedback
216
+
217
+
218
+ class HRMAgent(nn.Module):
219
+ """
220
+ Complete Hierarchical Reasoning Model agent.
221
+
222
+ Combines H-module and L-module with ACT for adaptive computation.
223
+ """
224
+
225
+ def __init__(self, config: HRMConfig, device: str = "cpu"):
226
+ super().__init__()
227
+ self.config = config
228
+ self.device = device
229
+
230
+ # Input embedding
231
+ self.input_proj = nn.Linear(config.h_dim, config.h_dim)
232
+
233
+ # Core modules
234
+ self.h_module = nn.ModuleList([HModule(config) for _ in range(config.num_h_layers)])
235
+
236
+ self.l_module = LModule(config)
237
+
238
+ # Adaptive computation time
239
+ self.act = AdaptiveComputationTime(config.h_dim, config.ponder_epsilon)
240
+
241
+ # State integration
242
+ self.integrate = nn.Sequential(
243
+ nn.Linear(config.h_dim * 2, config.h_dim),
244
+ nn.LayerNorm(config.h_dim),
245
+ nn.GELU(),
246
+ )
247
+
248
+ self.to(device)
249
+
250
+ def forward(
251
+ self,
252
+ x: torch.Tensor,
253
+ max_steps: int | None = None,
254
+ return_decomposition: bool = False,
255
+ ) -> HRMOutput:
256
+ """
257
+ Process input through hierarchical reasoning.
258
+
259
+ Args:
260
+ x: [batch, seq, h_dim] input tensor
261
+ max_steps: Maximum outer loop steps (defaults to config)
262
+ return_decomposition: Whether to return subproblem decomposition
263
+
264
+ Returns:
265
+ HRMOutput containing final state and optional decomposition
266
+ """
267
+ batch_size, seq_len, _ = x.shape
268
+ max_steps = max_steps or self.config.max_outer_steps
269
+
270
+ # Initial projection
271
+ h_state = self.input_proj(x)
272
+
273
+ # Tracking
274
+ subproblems = []
275
+ convergence_path = []
276
+ total_ponder_cost = 0.0
277
+
278
+ # Outer loop: iterative refinement
279
+ for step in range(max_steps):
280
+ # H-module: high-level planning
281
+ for h_layer in self.h_module:
282
+ h_state = h_layer(h_state)
283
+
284
+ # Check halting condition
285
+ halt_probs, ponder_cost = self.act(h_state)
286
+ total_ponder_cost += ponder_cost
287
+
288
+ # Average halting probability across sequence
289
+ avg_halt_prob = halt_probs.mean().item()
290
+ convergence_path.append(avg_halt_prob)
291
+
292
+ # Generate subproblem decomposition if requested
293
+ if return_decomposition:
294
+ subproblem_repr = self.h_module[0].decompose(h_state)
295
+ # Create subproblem entries (simplified)
296
+ for i in range(min(3, seq_len)): # Top 3 subproblems
297
+ subproblems.append(
298
+ SubProblem(
299
+ level=step,
300
+ description=f"Subproblem at step {step}, position {i}",
301
+ state=subproblem_repr[:, i, :].detach(),
302
+ confidence=halt_probs[:, i].mean().item(),
303
+ )
304
+ )
305
+
306
+ # Halt if confident enough
307
+ if avg_halt_prob >= self.config.halt_threshold:
308
+ break
309
+
310
+ # L-module: low-level execution
311
+ l_output, l_feedback = self.l_module(h_state)
312
+
313
+ # Integrate L-module feedback
314
+ h_state = self.integrate(torch.cat([h_state, l_feedback], dim=-1))
315
+
316
+ return HRMOutput(
317
+ final_state=h_state,
318
+ subproblems=subproblems,
319
+ halt_step=step + 1,
320
+ total_ponder_cost=total_ponder_cost,
321
+ convergence_path=convergence_path,
322
+ )
323
+
324
+ async def decompose_problem(self, query: str, state: torch.Tensor) -> list[SubProblem]:
325
+ """
326
+ Decompose a problem into hierarchical subproblems.
327
+
328
+ Args:
329
+ query: Natural language problem description
330
+ state: Initial state representation
331
+
332
+ Returns:
333
+ List of subproblems in hierarchical order
334
+ """
335
+ # Ensure state is batched
336
+ if state.dim() == 2:
337
+ state = state.unsqueeze(0) # [1, seq, dim]
338
+
339
+ # Forward pass with decomposition
340
+ output = self.forward(state, return_decomposition=True)
341
+
342
+ # Add query context to subproblems
343
+ for i, sp in enumerate(output.subproblems):
344
+ sp.description = f"{query} -> Level {sp.level} Subproblem {i}"
345
+
346
+ return output.subproblems
347
+
348
+ def get_parameter_count(self) -> int:
349
+ """Return total number of trainable parameters."""
350
+ return sum(p.numel() for p in self.parameters() if p.requires_grad)
351
+
352
+
353
+ # Training utilities
354
+ class HRMLoss(nn.Module):
355
+ """
356
+ Combined loss for HRM training.
357
+
358
+ Includes:
359
+ - Task loss (e.g., cross-entropy for classification)
360
+ - Ponder cost regularization (encourages efficiency)
361
+ - Consistency loss (encourages stable convergence)
362
+ """
363
+
364
+ def __init__(
365
+ self,
366
+ task_weight: float = 1.0,
367
+ ponder_weight: float = 0.01,
368
+ consistency_weight: float = 0.1,
369
+ ):
370
+ super().__init__()
371
+ self.task_weight = task_weight
372
+ self.ponder_weight = ponder_weight
373
+ self.consistency_weight = consistency_weight
374
+
375
+ def forward(
376
+ self,
377
+ hrm_output: HRMOutput,
378
+ predictions: torch.Tensor,
379
+ targets: torch.Tensor,
380
+ task_loss_fn: nn.Module,
381
+ ) -> tuple[torch.Tensor, dict]:
382
+ """
383
+ Compute combined loss.
384
+
385
+ Args:
386
+ hrm_output: Output from HRM forward pass
387
+ predictions: Model predictions
388
+ targets: Ground truth targets
389
+ task_loss_fn: Loss function for the task
390
+
391
+ Returns:
392
+ total_loss: Combined loss
393
+ loss_dict: Dictionary of individual loss components
394
+ """
395
+ # Task loss
396
+ task_loss = task_loss_fn(predictions, targets)
397
+
398
+ # Ponder cost (encourages efficiency)
399
+ ponder_loss = hrm_output.total_ponder_cost
400
+
401
+ # Consistency loss (encourages monotonic convergence)
402
+ if len(hrm_output.convergence_path) > 1:
403
+ conv_tensor = torch.tensor(hrm_output.convergence_path)
404
+ # Penalize non-monotonic increases
405
+ diffs = conv_tensor[1:] - conv_tensor[:-1]
406
+ consistency_loss = F.relu(-diffs).mean() # Penalize decreases
407
+ else:
408
+ consistency_loss = torch.tensor(0.0)
409
+
410
+ # Combine losses
411
+ total_loss = (
412
+ self.task_weight * task_loss + self.ponder_weight * ponder_loss + self.consistency_weight * consistency_loss
413
+ )
414
+
415
+ loss_dict = {
416
+ "total": total_loss.item(),
417
+ "task": task_loss.item(),
418
+ "ponder": ponder_loss,
419
+ "consistency": consistency_loss.item(),
420
+ "halt_step": hrm_output.halt_step,
421
+ }
422
+
423
+ return total_loss, loss_dict
424
+
425
+
426
+ def create_hrm_agent(config: HRMConfig, device: str = "cpu") -> HRMAgent:
427
+ """
428
+ Factory function to create and initialize HRM agent.
429
+
430
+ Args:
431
+ config: HRM configuration
432
+ device: Device to place model on
433
+
434
+ Returns:
435
+ Initialized HRMAgent
436
+ """
437
+ agent = HRMAgent(config, device)
438
+
439
+ # Initialize weights
440
+ def init_weights(m):
441
+ if isinstance(m, nn.Linear):
442
+ nn.init.xavier_uniform_(m.weight)
443
+ if m.bias is not None:
444
+ nn.init.zeros_(m.bias)
445
+ elif isinstance(m, nn.GRU):
446
+ for name, param in m.named_parameters():
447
+ if "weight" in name:
448
+ nn.init.orthogonal_(param)
449
+ elif "bias" in name:
450
+ nn.init.zeros_(param)
451
+
452
+ agent.apply(init_weights)
453
+
454
+ return agent
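
A smoke-test sketch for the HRM agent above. HRMConfig comes from src.training.system_config (not shown in this diff); the field names below are the ones referenced by HModule, LModule, and HRMAgent, while the concrete values and keyword-style constructor are illustrative assumptions.

# Illustrative smoke test for HRMAgent (config values are assumptions).
import torch

from src.agents.hrm_agent import create_hrm_agent
from src.training.system_config import HRMConfig

config = HRMConfig(
    h_dim=128,            # must be divisible by the 8 attention heads in HModule
    l_dim=64,
    num_h_layers=2,
    num_l_layers=1,
    dropout=0.1,
    ponder_epsilon=0.01,
    max_outer_steps=4,
    halt_threshold=0.9,
)

agent = create_hrm_agent(config, device="cpu")
x = torch.randn(2, 16, config.h_dim)          # [batch, seq, h_dim] dummy input
output = agent(x, return_decomposition=True)

print("halt step:", output.halt_step)
print("convergence path:", [round(p, 3) for p in output.convergence_path])
print("trainable parameters:", agent.get_parameter_count())
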
src/agents/meta_controller/__init__.py ADDED
@@ -0,0 +1,45 @@
1
+ """
2
+ Neural Meta-Controller package for Multi-Agent MCTS Framework.
3
+
4
+ This package provides the base infrastructure for neural network-based
5
+ meta-controllers that dynamically select which agent to route queries to.
6
+ """
7
+
8
+ from src.agents.meta_controller.base import (
9
+ AbstractMetaController,
10
+ MetaControllerFeatures,
11
+ MetaControllerPrediction,
12
+ )
13
+ from src.agents.meta_controller.rnn_controller import (
14
+ RNNMetaController,
15
+ RNNMetaControllerModel,
16
+ )
17
+ from src.agents.meta_controller.utils import (
18
+ features_to_tensor,
19
+ features_to_text,
20
+ normalize_features,
21
+ one_hot_encode_agent,
22
+ )
23
+
24
+ # Import BERT controller (may not be available if transformers/peft not installed)
25
+ try:
26
+ from src.agents.meta_controller.bert_controller import BERTMetaController # noqa: F401
27
+
28
+ _bert_available = True
29
+ except ImportError:
30
+ _bert_available = False
31
+
32
+ __all__ = [
33
+ "AbstractMetaController",
34
+ "MetaControllerFeatures",
35
+ "MetaControllerPrediction",
36
+ "normalize_features",
37
+ "one_hot_encode_agent",
38
+ "features_to_tensor",
39
+ "features_to_text",
40
+ "RNNMetaController",
41
+ "RNNMetaControllerModel",
42
+ ]
43
+
44
+ if _bert_available:
45
+ __all__.append("BERTMetaController")
src/agents/meta_controller/base.py ADDED
@@ -0,0 +1,219 @@
1
+ """
2
+ Abstract base class for Neural Meta-Controllers.
3
+
4
+ Provides the foundation for neural network-based meta-controllers that
5
+ dynamically select which agent (HRM, TRM, or MCTS) should handle a query.
6
+ """
7
+
8
+ from abc import ABC, abstractmethod
9
+ from dataclasses import dataclass, field
10
+ from typing import Any
11
+
12
+
13
+ @dataclass
14
+ class MetaControllerFeatures:
15
+ """
16
+ Features extracted from the current agent state for meta-controller prediction.
17
+
18
+ These features capture the current state of the multi-agent system,
19
+ including confidence scores from different agents and contextual information.
20
+ """
21
+
22
+ hrm_confidence: float
23
+ """Confidence score from the HRM (Human Response Model) agent."""
24
+
25
+ trm_confidence: float
26
+ """Confidence score from the TRM (Task Response Model) agent."""
27
+
28
+ mcts_value: float
29
+ """Value estimate from the MCTS (Monte Carlo Tree Search) process."""
30
+
31
+ consensus_score: float
32
+ """Agreement score between different agents."""
33
+
34
+ last_agent: str
35
+ """Name of the last agent used ('hrm', 'trm', 'mcts', or 'none')."""
36
+
37
+ iteration: int
38
+ """Current iteration number in the reasoning process."""
39
+
40
+ query_length: int
41
+ """Length of the input query in characters."""
42
+
43
+ has_rag_context: bool
44
+ """Whether RAG (Retrieval-Augmented Generation) context is available."""
45
+
46
+
47
+ @dataclass
48
+ class MetaControllerPrediction:
49
+ """
50
+ Prediction output from the meta-controller.
51
+
52
+ Contains the selected agent and associated confidence/probability information.
53
+ """
54
+
55
+ agent: str
56
+ """Name of the selected agent ('hrm', 'trm', or 'mcts')."""
57
+
58
+ confidence: float
59
+ """Confidence score for the prediction (0.0 to 1.0)."""
60
+
61
+ probabilities: dict[str, float] = field(default_factory=dict)
62
+ """Probability distribution over all possible agents."""
63
+
64
+
65
+ class AbstractMetaController(ABC):
66
+ """
67
+ Abstract base class for neural meta-controllers.
68
+
69
+ This class defines the interface that all meta-controller implementations
70
+ must follow. Meta-controllers are responsible for deciding which agent
71
+ should handle a given query based on the current system state.
72
+
73
+ Attributes:
74
+ AGENT_NAMES: List of valid agent names that can be selected.
75
+ name: Name of this meta-controller instance.
76
+ seed: Random seed for reproducibility.
77
+ """
78
+
79
+ AGENT_NAMES = ["hrm", "trm", "mcts"]
80
+
81
+ def __init__(self, name: str, seed: int = 42) -> None:
82
+ """
83
+ Initialize the meta-controller.
84
+
85
+ Args:
86
+ name: Name identifier for this meta-controller instance.
87
+ seed: Random seed for reproducibility. Defaults to 42.
88
+ """
89
+ self.name = name
90
+ self.seed = seed
91
+
92
+ @abstractmethod
93
+ def predict(self, features: MetaControllerFeatures) -> MetaControllerPrediction:
94
+ """
95
+ Predict which agent should handle the current query.
96
+
97
+ Args:
98
+ features: Features extracted from the current agent state.
99
+
100
+ Returns:
101
+ Prediction containing the selected agent and confidence scores.
102
+ """
103
+ pass
104
+
105
+ @abstractmethod
106
+ def load_model(self, path: str) -> None:
107
+ """
108
+ Load a trained model from disk.
109
+
110
+ Args:
111
+ path: Path to the saved model file or directory.
112
+ """
113
+ pass
114
+
115
+ @abstractmethod
116
+ def save_model(self, path: str) -> None:
117
+ """
118
+ Save the current model to disk.
119
+
120
+ Args:
121
+ path: Path where the model should be saved.
122
+ """
123
+ pass
124
+
125
+ def extract_features(self, state: dict[str, Any]) -> MetaControllerFeatures:
126
+ """
127
+ Extract meta-controller features from an AgentState dictionary.
128
+
129
+ This method converts raw state information into the structured
130
+ MetaControllerFeatures format required for prediction.
131
+
132
+ Args:
133
+ state: Dictionary containing agent state information.
134
+ Expected keys include:
135
+ - 'hrm_confidence' or nested in 'agent_confidences'
136
+ - 'trm_confidence' or nested in 'agent_confidences'
137
+ - 'mcts_value' or nested in 'mcts_state'
138
+ - 'consensus_score'
139
+ - 'last_agent'
140
+ - 'iteration'
141
+ - 'query' or 'query_length'
142
+ - 'rag_context' or 'has_rag_context'
143
+
144
+ Returns:
145
+ MetaControllerFeatures instance with extracted values.
146
+
147
+ Example:
148
+ >>> state = {
149
+ ... 'agent_confidences': {'hrm': 0.8, 'trm': 0.6},
150
+ ... 'mcts_state': {'value': 0.75},
151
+ ... 'consensus_score': 0.7,
152
+ ... 'last_agent': 'hrm',
153
+ ... 'iteration': 2,
154
+ ... 'query': 'What is machine learning?',
155
+ ... 'rag_context': 'ML is a subset of AI...'
156
+ ... }
157
+ >>> features = controller.extract_features(state)
158
+ """
159
+ # Extract HRM confidence
160
+ if "hrm_confidence" in state:
161
+ hrm_confidence = float(state["hrm_confidence"])
162
+ elif "agent_confidences" in state and isinstance(state["agent_confidences"], dict):
163
+ hrm_confidence = float(state["agent_confidences"].get("hrm", 0.0))
164
+ else:
165
+ hrm_confidence = 0.0
166
+
167
+ # Extract TRM confidence
168
+ if "trm_confidence" in state:
169
+ trm_confidence = float(state["trm_confidence"])
170
+ elif "agent_confidences" in state and isinstance(state["agent_confidences"], dict):
171
+ trm_confidence = float(state["agent_confidences"].get("trm", 0.0))
172
+ else:
173
+ trm_confidence = 0.0
174
+
175
+ # Extract MCTS value
176
+ if "mcts_value" in state:
177
+ mcts_value = float(state["mcts_value"])
178
+ elif "mcts_state" in state and isinstance(state["mcts_state"], dict):
179
+ mcts_value = float(state["mcts_state"].get("value", 0.0))
180
+ else:
181
+ mcts_value = 0.0
182
+
183
+ # Extract consensus score
184
+ consensus_score = float(state.get("consensus_score", 0.0))
185
+
186
+ # Extract last agent
187
+ last_agent = str(state.get("last_agent", "none"))
188
+ if last_agent not in self.AGENT_NAMES and last_agent != "none":
189
+ last_agent = "none"
190
+
191
+ # Extract iteration
192
+ iteration = int(state.get("iteration", 0))
193
+
194
+ # Extract query length
195
+ if "query_length" in state:
196
+ query_length = int(state["query_length"])
197
+ elif "query" in state and isinstance(state["query"], str):
198
+ query_length = len(state["query"])
199
+ else:
200
+ query_length = 0
201
+
202
+ # Extract has_rag_context
203
+ if "has_rag_context" in state:
204
+ has_rag_context = bool(state["has_rag_context"])
205
+ elif "rag_context" in state:
206
+ has_rag_context = state["rag_context"] is not None and len(str(state["rag_context"])) > 0
207
+ else:
208
+ has_rag_context = False
209
+
210
+ return MetaControllerFeatures(
211
+ hrm_confidence=hrm_confidence,
212
+ trm_confidence=trm_confidence,
213
+ mcts_value=mcts_value,
214
+ consensus_score=consensus_score,
215
+ last_agent=last_agent,
216
+ iteration=iteration,
217
+ query_length=query_length,
218
+ has_rag_context=has_rag_context,
219
+ )
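
To make the contract concrete, an illustrative-only subclass that satisfies AbstractMetaController with a simple rule-based policy (pick the agent with the highest score). It is not part of the framework; load_model and save_model are no-ops because there is nothing to train.

# Hypothetical rule-based implementation of the AbstractMetaController interface.
from src.agents.meta_controller.base import (
    AbstractMetaController,
    MetaControllerFeatures,
    MetaControllerPrediction,
)


class RuleBasedMetaController(AbstractMetaController):
    def predict(self, features: MetaControllerFeatures) -> MetaControllerPrediction:
        scores = {
            "hrm": features.hrm_confidence,
            "trm": features.trm_confidence,
            "mcts": features.mcts_value,
        }
        total = sum(scores.values()) or 1.0          # avoid division by zero
        probabilities = {name: score / total for name, score in scores.items()}
        agent = max(scores, key=scores.get)
        return MetaControllerPrediction(
            agent=agent,
            confidence=probabilities[agent],
            probabilities=probabilities,
        )

    def load_model(self, path: str) -> None:         # nothing to load for a rule-based policy
        pass

    def save_model(self, path: str) -> None:         # nothing to persist
        pass
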
src/agents/meta_controller/bert_controller.py ADDED
@@ -0,0 +1,428 @@
1
+ """
2
+ BERT-based Meta-Controller with LoRA adapters for efficient fine-tuning.
3
+
4
+ This module provides a BERT-based meta-controller that uses Low-Rank Adaptation (LoRA)
5
+ for parameter-efficient fine-tuning. The controller converts agent state features into
6
+ text and uses a sequence classification model to predict the optimal agent.
7
+ """
8
+
9
+ import warnings
10
+ from typing import Any
11
+
12
+ import torch
13
+
14
+ from src.agents.meta_controller.base import (
15
+ AbstractMetaController,
16
+ MetaControllerFeatures,
17
+ MetaControllerPrediction,
18
+ )
19
+ from src.agents.meta_controller.utils import features_to_text
20
+
21
+ # Handle optional transformers and peft imports gracefully
22
+ _TRANSFORMERS_AVAILABLE = False
23
+ _PEFT_AVAILABLE = False
24
+
25
+ try:
26
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
27
+
28
+ _TRANSFORMERS_AVAILABLE = True
29
+ except ImportError:
30
+ warnings.warn(
31
+ "transformers library not installed. Install it with: pip install transformers",
32
+ ImportWarning,
33
+ stacklevel=2,
34
+ )
35
+ AutoTokenizer = None # type: ignore
36
+ AutoModelForSequenceClassification = None # type: ignore
37
+
38
+ try:
39
+ from peft import LoraConfig, TaskType, get_peft_model
40
+
41
+ _PEFT_AVAILABLE = True
42
+ except ImportError:
43
+ warnings.warn(
44
+ "peft library not installed. Install it with: pip install peft",
45
+ ImportWarning,
46
+ stacklevel=2,
47
+ )
48
+ LoraConfig = None # type: ignore
49
+ TaskType = None # type: ignore
50
+ get_peft_model = None # type: ignore
51
+
52
+
53
+ class BERTMetaController(AbstractMetaController):
54
+ """
55
+ BERT-based meta-controller with optional LoRA adapters for efficient fine-tuning.
56
+
57
+ This controller converts agent state features into structured text and uses
58
+ a pre-trained BERT model (with optional LoRA adapters) to classify which
59
+ agent should handle the current query. LoRA enables parameter-efficient
60
+ fine-tuning by only training low-rank decomposition matrices.
61
+
62
+ Attributes:
63
+ DEFAULT_MODEL_NAME: Default BERT model to use.
64
+ NUM_LABELS: Number of output labels (agents to choose from).
65
+ device: PyTorch device for tensor operations.
66
+ model_name: Name of the pre-trained model.
67
+ lora_r: LoRA rank parameter.
68
+ lora_alpha: LoRA alpha scaling parameter.
69
+ lora_dropout: LoRA dropout rate.
70
+ use_lora: Whether to use LoRA adapters.
71
+ tokenizer: BERT tokenizer for text processing.
72
+ model: BERT sequence classification model (with or without LoRA).
73
+
74
+ Example:
75
+ >>> controller = BERTMetaController(name="BERTController", seed=42)
76
+ >>> features = MetaControllerFeatures(
77
+ ... hrm_confidence=0.8,
78
+ ... trm_confidence=0.6,
79
+ ... mcts_value=0.75,
80
+ ... consensus_score=0.7,
81
+ ... last_agent='hrm',
82
+ ... iteration=2,
83
+ ... query_length=150,
84
+ ... has_rag_context=True
85
+ ... )
86
+ >>> prediction = controller.predict(features)
87
+ >>> prediction.agent in ['hrm', 'trm', 'mcts']
88
+ True
89
+ >>> 0.0 <= prediction.confidence <= 1.0
90
+ True
91
+ """
92
+
93
+ DEFAULT_MODEL_NAME = "prajjwal1/bert-mini"
94
+ NUM_LABELS = 3
95
+
96
+ def __init__(
97
+ self,
98
+ name: str = "BERTMetaController",
99
+ seed: int = 42,
100
+ model_name: str | None = None,
101
+ lora_r: int = 4,
102
+ lora_alpha: int = 16,
103
+ lora_dropout: float = 0.1,
104
+ device: str | None = None,
105
+ use_lora: bool = True,
106
+ ) -> None:
107
+ """
108
+ Initialize the BERT meta-controller with optional LoRA adapters.
109
+
110
+ Args:
111
+ name: Name identifier for this controller. Defaults to "BERTMetaController".
112
+ seed: Random seed for reproducibility. Defaults to 42.
113
+ model_name: Pre-trained model name from HuggingFace. If None, uses DEFAULT_MODEL_NAME.
114
+ lora_r: LoRA rank parameter (lower = more compression). Defaults to 4.
115
+ lora_alpha: LoRA alpha scaling parameter. Defaults to 16.
116
+ lora_dropout: Dropout rate for LoRA layers. Defaults to 0.1.
117
+ device: Device to run model on ('cpu', 'cuda', 'mps', etc.).
118
+ If None, auto-detects best available device.
119
+ use_lora: Whether to apply LoRA adapters to the model. Defaults to True.
120
+
121
+ Raises:
122
+ ImportError: If transformers library is not installed.
123
+ ImportError: If use_lora is True and peft library is not installed.
124
+
125
+ Example:
126
+ >>> controller = BERTMetaController(
127
+ ... name="CustomBERT",
128
+ ... seed=123,
129
+ ... lora_r=8,
130
+ ... lora_alpha=32,
131
+ ... use_lora=True
132
+ ... )
133
+ """
134
+ super().__init__(name=name, seed=seed)
135
+
136
+ # Check for required dependencies
137
+ if not _TRANSFORMERS_AVAILABLE:
138
+ raise ImportError(
139
+ "transformers library is required for BERTMetaController. Install it with: pip install transformers"
140
+ )
141
+
142
+ if use_lora and not _PEFT_AVAILABLE:
143
+ raise ImportError("peft library is required for LoRA support. Install it with: pip install peft")
144
+
145
+ # Set random seed for reproducibility
146
+ torch.manual_seed(seed)
147
+
148
+ # Auto-detect device if not specified
149
+ if device is None:
150
+ if torch.cuda.is_available():
151
+ self.device = torch.device("cuda")
152
+ elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
153
+ self.device = torch.device("mps")
154
+ else:
155
+ self.device = torch.device("cpu")
156
+ else:
157
+ self.device = torch.device(device)
158
+
159
+ # Store configuration parameters
160
+ self.model_name = model_name if model_name is not None else self.DEFAULT_MODEL_NAME
161
+ self.lora_r = lora_r
162
+ self.lora_alpha = lora_alpha
163
+ self.lora_dropout = lora_dropout
164
+ self.use_lora = use_lora
165
+
166
+ # Initialize tokenizer
167
+ self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
168
+
169
+ # Initialize base model for sequence classification
170
+ base_model = AutoModelForSequenceClassification.from_pretrained(self.model_name, num_labels=self.NUM_LABELS)
171
+
172
+ # Apply LoRA adapters if requested
173
+ if self.use_lora:
174
+ lora_config = LoraConfig(
175
+ task_type=TaskType.SEQ_CLS,
176
+ r=self.lora_r,
177
+ lora_alpha=self.lora_alpha,
178
+ lora_dropout=self.lora_dropout,
179
+ target_modules=["query", "value"],
180
+ )
181
+ self.model = get_peft_model(base_model, lora_config)
182
+ else:
183
+ self.model = base_model
184
+
185
+ # Move model to device
186
+ self.model = self.model.to(self.device)
187
+
188
+ # Set model to evaluation mode
189
+ self.model.eval()
190
+
191
+ # Initialize tokenization cache for performance optimization
192
+ self._tokenization_cache: dict[str, Any] = {}
193
+
194
+ def predict(self, features: MetaControllerFeatures) -> MetaControllerPrediction:
195
+ """
196
+ Predict which agent should handle the current query.
197
+
198
+ Converts features to structured text, tokenizes the text, runs through
199
+ the BERT model, and returns a prediction with confidence scores.
200
+
201
+ Args:
202
+ features: Features extracted from the current agent state.
203
+
204
+ Returns:
205
+ Prediction containing the selected agent, confidence score,
206
+ and probability distribution over all agents.
207
+
208
+ Example:
209
+ >>> controller = BERTMetaController()
210
+ >>> features = MetaControllerFeatures(
211
+ ... hrm_confidence=0.9,
212
+ ... trm_confidence=0.3,
213
+ ... mcts_value=0.5,
214
+ ... consensus_score=0.8,
215
+ ... last_agent='none',
216
+ ... iteration=0,
217
+ ... query_length=100,
218
+ ... has_rag_context=False
219
+ ... )
220
+ >>> pred = controller.predict(features)
221
+ >>> isinstance(pred.agent, str)
222
+ True
223
+ >>> isinstance(pred.confidence, float)
224
+ True
225
+ >>> len(pred.probabilities) == 3
226
+ True
227
+ """
228
+ # Convert features to structured text
229
+ text = features_to_text(features)
230
+
231
+ # Check cache for tokenized text
232
+ if text in self._tokenization_cache:
233
+ inputs = self._tokenization_cache[text]
234
+ else:
235
+ # Tokenize the text
236
+ inputs = self.tokenizer(
237
+ text,
238
+ return_tensors="pt",
239
+ padding=True,
240
+ truncation=True,
241
+ max_length=512,
242
+ )
243
+ # Cache the tokenized result
244
+ self._tokenization_cache[text] = inputs
245
+
246
+ # Move inputs to device
247
+ inputs = {key: value.to(self.device) for key, value in inputs.items()}
248
+
249
+ # Perform inference without gradient tracking
250
+ with torch.no_grad():
251
+ # Get logits from model
252
+ outputs = self.model(**inputs)
253
+ logits = outputs.logits
254
+
255
+ # Apply softmax to get probabilities
256
+ probabilities = torch.nn.functional.softmax(logits, dim=-1)
257
+
258
+ # Get predicted agent index (argmax)
259
+ predicted_idx = torch.argmax(probabilities, dim=-1).item()
260
+
261
+ # Extract confidence for selected agent
262
+ confidence = probabilities[0, predicted_idx].item()
263
+
264
+ # Create probability dictionary
265
+ prob_dict: dict[str, float] = {}
266
+ for i, agent_name in enumerate(self.AGENT_NAMES):
267
+ prob_dict[agent_name] = probabilities[0, i].item()
268
+
269
+ # Get agent name
270
+ selected_agent = self.AGENT_NAMES[predicted_idx]
271
+
272
+ return MetaControllerPrediction(
273
+ agent=selected_agent,
274
+ confidence=float(confidence),
275
+ probabilities=prob_dict,
276
+ )
277
+
278
+ def load_model(self, path: str) -> None:
279
+ """
280
+ Load a trained model from disk.
281
+
282
+ For LoRA models, loads the PEFT adapter weights. For base models,
283
+ loads the full state dictionary.
284
+
285
+ Args:
286
+ path: Path to the saved model file or directory.
287
+ For LoRA models, this should be a directory containing
288
+ adapter_config.json and adapter_model.bin.
289
+ For base models, this should be a .pt or .pth file.
290
+
291
+ Raises:
292
+ FileNotFoundError: If the model file or directory does not exist.
293
+ RuntimeError: If the state dict is incompatible with the model.
294
+
295
+ Example:
296
+ >>> controller = BERTMetaController(use_lora=True)
297
+ >>> controller.load_model("/path/to/lora_adapter")
298
+ >>> controller = BERTMetaController(use_lora=False)
299
+ >>> controller.load_model("/path/to/model.pt")
300
+ """
301
+ if self.use_lora:
302
+ # Load PEFT adapter weights
303
+ # For PEFT models, the path should be a directory containing adapter files
304
+ from peft import PeftModel
305
+
306
+ # Get the base model from the PEFT wrapper
307
+ base_model = self.model.get_base_model()
308
+
309
+ # Load the PEFT model from the saved path
310
+ self.model = PeftModel.from_pretrained(base_model, path)
311
+ self.model = self.model.to(self.device)
312
+ else:
313
+ # Load base model state dict
314
+ state_dict = torch.load(path, map_location=self.device, weights_only=True)
315
+ self.model.load_state_dict(state_dict)
316
+
317
+ # Ensure model is in evaluation mode
318
+ self.model.eval()
319
+
320
+ def save_model(self, path: str) -> None:
321
+ """
322
+ Save the current model to disk.
323
+
324
+ For LoRA models, saves the PEFT adapter weights. For base models,
325
+ saves the full state dictionary.
326
+
327
+ Args:
328
+ path: Path where the model should be saved.
329
+ For LoRA models, this should be a directory path where
330
+ adapter_config.json and adapter_model.bin will be saved.
331
+ For base models, this should be a .pt or .pth file path.
332
+
333
+ Example:
334
+ >>> controller = BERTMetaController(use_lora=True)
335
+ >>> controller.save_model("/path/to/lora_adapter")
336
+ >>> controller = BERTMetaController(use_lora=False)
337
+ >>> controller.save_model("/path/to/model.pt")
338
+ """
339
+ if self.use_lora:
340
+ # Save PEFT adapter weights
341
+ # This saves only the LoRA adapter weights, not the full model
342
+ self.model.save_pretrained(path)
343
+ else:
344
+ # Save base model state dict
345
+ torch.save(self.model.state_dict(), path)
346
+
347
+ def clear_cache(self) -> None:
348
+ """
349
+ Clear the tokenization cache.
350
+
351
+ This method removes all cached tokenized inputs, freeing memory.
352
+ Useful when processing many different feature combinations or
353
+ when memory usage is a concern.
354
+
355
+ Example:
356
+ >>> controller = BERTMetaController()
357
+ >>> # After many predictions...
358
+ >>> controller.clear_cache()
359
+ >>> info = controller.get_cache_info()
360
+ >>> info['cache_size'] == 0
361
+ True
362
+ """
363
+ self._tokenization_cache.clear()
364
+
365
+ def get_cache_info(self) -> dict[str, Any]:
366
+ """
367
+ Get information about the current tokenization cache.
368
+
369
+ Returns:
370
+ Dictionary containing cache statistics:
371
+ - cache_size: Number of cached tokenizations
372
+ - cache_keys: List of cached text inputs (truncated for display)
373
+
374
+ Example:
375
+ >>> controller = BERTMetaController()
376
+ >>> features = MetaControllerFeatures(
377
+ ... hrm_confidence=0.8,
378
+ ... trm_confidence=0.6,
379
+ ... mcts_value=0.75,
380
+ ... consensus_score=0.7,
381
+ ... last_agent='hrm',
382
+ ... iteration=2,
383
+ ... query_length=150,
384
+ ... has_rag_context=True
385
+ ... )
386
+ >>> _ = controller.predict(features)
387
+ >>> info = controller.get_cache_info()
388
+ >>> 'cache_size' in info
389
+ True
390
+ >>> info['cache_size'] >= 1
391
+ True
392
+ """
393
+ # Truncate keys for display (first 50 chars)
394
+ truncated_keys = [key[:50] + "..." if len(key) > 50 else key for key in self._tokenization_cache]
395
+
396
+ return {
397
+ "cache_size": len(self._tokenization_cache),
398
+ "cache_keys": truncated_keys,
399
+ }
400
+
401
+ def get_trainable_parameters(self) -> dict[str, int]:
402
+ """
403
+ Get the number of trainable and total parameters in the model.
404
+
405
+ This is particularly useful for LoRA models to see the efficiency
406
+ gains from using low-rank adaptation.
407
+
408
+ Returns:
409
+ Dictionary containing:
410
+ - total_params: Total number of parameters in the model
411
+ - trainable_params: Number of trainable parameters
412
+ - trainable_percentage: Percentage of parameters that are trainable
413
+
414
+ Example:
415
+ >>> controller = BERTMetaController(use_lora=True)
416
+ >>> params = controller.get_trainable_parameters()
417
+ >>> params['trainable_percentage'] < 10.0 # LoRA trains <10% of params
418
+ True
419
+ """
420
+ total_params = sum(p.numel() for p in self.model.parameters())
421
+ trainable_params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
422
+ trainable_percentage = (trainable_params / total_params) * 100 if total_params > 0 else 0.0
423
+
424
+ return {
425
+ "total_params": total_params,
426
+ "trainable_params": trainable_params,
427
+ "trainable_percentage": round(trainable_percentage, 2),
428
+ }
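
An end-to-end sketch tying the pieces above together: build features from a raw state dict via the inherited extract_features, predict, inspect LoRA efficiency, and persist the adapter. The state values and the output directory are placeholders; transformers and peft must be installed.

# Usage sketch for BERTMetaController (state values and paths are placeholders).
from src.agents.meta_controller.bert_controller import BERTMetaController

controller = BERTMetaController(seed=42, use_lora=True)

state = {
    "agent_confidences": {"hrm": 0.82, "trm": 0.41},
    "mcts_state": {"value": 0.63},
    "consensus_score": 0.7,
    "last_agent": "hrm",
    "iteration": 2,
    "query": "Plan a multi-step route through the supply network.",
    "rag_context": "Relevant retrieved passages...",
}

features = controller.extract_features(state)
prediction = controller.predict(features)
print(prediction.agent, round(prediction.confidence, 3), prediction.probabilities)

print(controller.get_trainable_parameters())         # LoRA keeps the trainable share small
controller.save_model("models/bert_lora/adapter")     # placeholder output directory
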
src/agents/meta_controller/config_loader.py ADDED
@@ -0,0 +1,304 @@
1
+ """
2
+ Configuration loader for the Neural Meta-Controller framework.
3
+
4
+ This module provides dataclass-based configuration management for the Meta-Controller,
5
+ supporting both RNN and BERT-based neural network controllers with comprehensive
6
+ validation and serialization capabilities.
7
+ """
8
+
9
+ from dataclasses import asdict, dataclass, field
10
+ from pathlib import Path
11
+ from typing import Any
12
+
13
+ import yaml
14
+
15
+
16
+ @dataclass
17
+ class RNNConfig:
18
+ """
19
+ Configuration for RNN-based Meta-Controller.
20
+
21
+ Attributes:
22
+ hidden_dim: Hidden dimension size for RNN layers. Default is 64.
23
+ num_layers: Number of RNN layers. Default is 1.
24
+ dropout: Dropout rate for regularization. Default is 0.1.
25
+ model_path: Optional path to a pre-trained model file. None for untrained model.
26
+ """
27
+
28
+ hidden_dim: int = 64
29
+ num_layers: int = 1
30
+ dropout: float = 0.1
31
+ model_path: str | None = None
32
+
33
+
34
+ @dataclass
35
+ class BERTConfig:
36
+ """
37
+ Configuration for BERT-based Meta-Controller with LoRA fine-tuning.
38
+
39
+ Attributes:
40
+ model_name: Name of the pre-trained BERT model from HuggingFace.
41
+ Default is "prajjwal1/bert-mini" for lightweight deployment.
42
+ use_lora: Whether to use LoRA (Low-Rank Adaptation) for efficient fine-tuning.
43
+ Default is True.
44
+ lora_r: LoRA rank parameter. Controls the rank of the low-rank matrices.
45
+ Default is 4.
46
+ lora_alpha: LoRA alpha parameter. Scaling factor for LoRA weights.
47
+ Default is 16.
48
+ lora_dropout: Dropout rate for LoRA layers. Default is 0.1.
49
+ model_path: Optional path to a trained LoRA adapter. None for base model only.
50
+ """
51
+
52
+ model_name: str = "prajjwal1/bert-mini"
53
+ use_lora: bool = True
54
+ lora_r: int = 4
55
+ lora_alpha: int = 16
56
+ lora_dropout: float = 0.1
57
+ model_path: str | None = None
58
+
59
+
60
+ @dataclass
61
+ class InferenceConfig:
62
+ """
63
+ Configuration for inference settings.
64
+
65
+ Attributes:
66
+ device: Device to use for inference ("cpu", "cuda", "cuda:0", etc.).
67
+ None for auto-detection based on available hardware.
68
+ seed: Random seed for reproducibility. Default is 42.
69
+ """
70
+
71
+ device: str | None = None
72
+ seed: int = 42
73
+
74
+
75
+ @dataclass
76
+ class MetaControllerConfig:
77
+ """
78
+ Main configuration for the Neural Meta-Controller framework.
79
+
80
+ This configuration controls the behavior of the Meta-Controller, including
81
+ which type of neural network to use (RNN or BERT), fallback behavior,
82
+ and specific model parameters.
83
+
84
+ Attributes:
85
+ enabled: Whether the neural Meta-Controller is enabled. Default is False
86
+ for backward compatibility with rule-based systems.
87
+ type: Type of neural network controller ("rnn" or "bert"). Default is "rnn".
88
+ fallback_to_rule_based: Whether to fall back to rule-based selection on errors.
89
+ Default is True for robustness.
90
+ rnn: Configuration for RNN-based controller.
91
+ bert: Configuration for BERT-based controller.
92
+ inference: Configuration for inference settings.
93
+ """
94
+
95
+ enabled: bool = False
96
+ type: str = "rnn" # "rnn" or "bert"
97
+ fallback_to_rule_based: bool = True
98
+ rnn: RNNConfig = field(default_factory=RNNConfig)
99
+ bert: BERTConfig = field(default_factory=BERTConfig)
100
+ inference: InferenceConfig = field(default_factory=InferenceConfig)
101
+
102
+
103
+ class MetaControllerConfigLoader:
104
+ """
105
+ Loader class for Meta-Controller configuration.
106
+
107
+ Provides methods for loading configuration from YAML files or dictionaries,
108
+ converting configuration to dictionaries, and validating configuration values.
109
+
110
+ Example:
111
+ >>> loader = MetaControllerConfigLoader()
112
+ >>> config = loader.load_from_yaml("config/meta_controller.yaml")
113
+ >>> print(config.type)
114
+ 'rnn'
115
+ >>> config.validate()
116
+ """
117
+
118
+ @staticmethod
119
+ def load_from_yaml(path: str) -> MetaControllerConfig:
120
+ """
121
+ Load Meta-Controller configuration from a YAML file.
122
+
123
+ Args:
124
+ path: Path to the YAML configuration file.
125
+
126
+ Returns:
127
+ MetaControllerConfig: Loaded and parsed configuration object.
128
+
129
+ Raises:
130
+ FileNotFoundError: If the specified file does not exist.
131
+ yaml.YAMLError: If the file contains invalid YAML.
132
+ KeyError: If the 'meta_controller' key is missing from the file.
133
+
134
+ Example:
135
+ >>> config = MetaControllerConfigLoader.load_from_yaml("config/meta_controller.yaml")
136
+ >>> print(config.enabled)
137
+ False
138
+ """
139
+ yaml_path = Path(path)
140
+
141
+ if not yaml_path.exists():
142
+ raise FileNotFoundError(f"Configuration file not found: {path}")
143
+
144
+ with open(yaml_path) as f:
145
+ raw_config = yaml.safe_load(f)
146
+
147
+ if "meta_controller" not in raw_config:
148
+ raise KeyError("Configuration file must contain 'meta_controller' key")
149
+
150
+ return MetaControllerConfigLoader.load_from_dict(raw_config["meta_controller"])
151
+
152
+ @staticmethod
153
+ def load_from_dict(config_dict: dict[str, Any]) -> MetaControllerConfig:
154
+ """
155
+ Load Meta-Controller configuration from a dictionary.
156
+
157
+ Args:
158
+ config_dict: Dictionary containing configuration values.
159
+
160
+ Returns:
161
+ MetaControllerConfig: Parsed configuration object with defaults
162
+ applied for missing values.
163
+
164
+ Example:
165
+ >>> config_dict = {
166
+ ... 'enabled': True,
167
+ ... 'type': 'bert',
168
+ ... 'bert': {'model_name': 'bert-base-uncased'}
169
+ ... }
170
+ >>> config = MetaControllerConfigLoader.load_from_dict(config_dict)
171
+ >>> print(config.type)
172
+ 'bert'
173
+ """
174
+ # Parse nested configurations
175
+ rnn_config = RNNConfig(**config_dict.get("rnn", {}))
176
+ bert_config = BERTConfig(**config_dict.get("bert", {}))
177
+ inference_config = InferenceConfig(**config_dict.get("inference", {}))
178
+
179
+ # Create main config with nested configs
180
+ return MetaControllerConfig(
181
+ enabled=config_dict.get("enabled", False),
182
+ type=config_dict.get("type", "rnn"),
183
+ fallback_to_rule_based=config_dict.get("fallback_to_rule_based", True),
184
+ rnn=rnn_config,
185
+ bert=bert_config,
186
+ inference=inference_config,
187
+ )
188
+
189
+ @staticmethod
190
+ def to_dict(config: MetaControllerConfig) -> dict[str, Any]:
191
+ """
192
+ Convert a MetaControllerConfig object to a dictionary.
193
+
194
+ Args:
195
+ config: MetaControllerConfig object to convert.
196
+
197
+ Returns:
198
+ Dict[str, Any]: Dictionary representation of the configuration.
199
+
200
+ Example:
201
+ >>> config = MetaControllerConfig(enabled=True, type='bert')
202
+ >>> config_dict = MetaControllerConfigLoader.to_dict(config)
203
+ >>> print(config_dict['enabled'])
204
+ True
205
+ """
206
+ return asdict(config)
207
+
208
+ @staticmethod
209
+ def validate(config: MetaControllerConfig) -> None:
210
+ """
211
+ Validate the Meta-Controller configuration.
212
+
213
+ Checks that:
214
+ - The controller type is valid ("rnn" or "bert")
215
+ - Model paths exist if specified
216
+ - Numeric parameters are within valid ranges
217
+
218
+ Args:
219
+ config: MetaControllerConfig object to validate.
220
+
221
+ Raises:
222
+ ValueError: If the configuration contains invalid values.
223
+ FileNotFoundError: If specified model paths do not exist.
224
+
225
+ Example:
226
+ >>> config = MetaControllerConfig(type='invalid')
227
+ >>> MetaControllerConfigLoader.validate(config)
228
+ ValueError: Invalid controller type 'invalid'. Must be one of: ['rnn', 'bert']
229
+ """
230
+ # Validate controller type
231
+ valid_types = ["rnn", "bert"]
232
+ if config.type not in valid_types:
233
+ raise ValueError(f"Invalid controller type '{config.type}'. Must be one of: {valid_types}")
234
+
235
+ # Validate RNN config
236
+ if config.rnn.hidden_dim <= 0:
237
+ raise ValueError(f"RNN hidden_dim must be positive, got {config.rnn.hidden_dim}")
238
+ if config.rnn.num_layers <= 0:
239
+ raise ValueError(f"RNN num_layers must be positive, got {config.rnn.num_layers}")
240
+ if not 0.0 <= config.rnn.dropout <= 1.0:
241
+ raise ValueError(f"RNN dropout must be between 0 and 1, got {config.rnn.dropout}")
242
+ if config.rnn.model_path is not None:
243
+ rnn_path = Path(config.rnn.model_path)
244
+ if not rnn_path.exists():
245
+ raise FileNotFoundError(f"RNN model path does not exist: {config.rnn.model_path}")
246
+
247
+ # Validate BERT config
248
+ if config.bert.lora_r <= 0:
249
+ raise ValueError(f"BERT lora_r must be positive, got {config.bert.lora_r}")
250
+ if config.bert.lora_alpha <= 0:
251
+ raise ValueError(f"BERT lora_alpha must be positive, got {config.bert.lora_alpha}")
252
+ if not 0.0 <= config.bert.lora_dropout <= 1.0:
253
+ raise ValueError(f"BERT lora_dropout must be between 0 and 1, got {config.bert.lora_dropout}")
254
+ if config.bert.model_path is not None:
255
+ bert_path = Path(config.bert.model_path)
256
+ if not bert_path.exists():
257
+ raise FileNotFoundError(f"BERT model path does not exist: {config.bert.model_path}")
258
+
259
+ # Validate inference config
260
+ if config.inference.device is not None:
261
+ valid_devices = ["cpu", "cuda", "mps"]
262
+ # Check if device starts with a valid prefix (e.g., "cuda:0", "cuda:1")
263
+ device_base = config.inference.device.split(":")[0]
264
+ if device_base not in valid_devices:
265
+ raise ValueError(f"Invalid device '{config.inference.device}'. Must start with one of: {valid_devices}")
266
+
267
+ if not isinstance(config.inference.seed, int) or config.inference.seed < 0:
268
+ raise ValueError(f"Inference seed must be a non-negative integer, got {config.inference.seed}")
269
+
270
+ @staticmethod
271
+ def save_to_yaml(config: MetaControllerConfig, path: str) -> None:
272
+ """
273
+ Save a MetaControllerConfig object to a YAML file.
274
+
275
+ Args:
276
+ config: MetaControllerConfig object to save.
277
+ path: Path where the YAML file will be saved.
278
+
279
+ Example:
280
+ >>> config = MetaControllerConfig(enabled=True)
281
+ >>> MetaControllerConfigLoader.save_to_yaml(config, "my_config.yaml")
282
+ """
283
+ yaml_path = Path(path)
284
+ yaml_path.parent.mkdir(parents=True, exist_ok=True)
285
+
286
+ config_dict = {"meta_controller": MetaControllerConfigLoader.to_dict(config)}
287
+
288
+ with open(yaml_path, "w") as f:
289
+ yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)
290
+
291
+ @staticmethod
292
+ def get_default_config() -> MetaControllerConfig:
293
+ """
294
+ Get a default MetaControllerConfig with all default values.
295
+
296
+ Returns:
297
+ MetaControllerConfig: Configuration object with default values.
298
+
299
+ Example:
300
+ >>> config = MetaControllerConfigLoader.get_default_config()
301
+ >>> print(config.enabled)
302
+ False
303
+ """
304
+ return MetaControllerConfig()
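Illustrative usage of the loader added above (a minimal sketch, not part of this commit; the YAML path and the specific values are assumptions):

```python
# Hypothetical round-trip through MetaControllerConfigLoader; path and values are assumptions.
from src.agents.meta_controller.config_loader import MetaControllerConfigLoader

config = MetaControllerConfigLoader.load_from_dict(
    {
        "enabled": True,
        "type": "bert",
        "bert": {"model_name": "prajjwal1/bert-mini", "lora_r": 8},
    }
)
MetaControllerConfigLoader.validate(config)  # raises ValueError / FileNotFoundError on bad values

MetaControllerConfigLoader.save_to_yaml(config, "configs/meta_controller.yaml")
reloaded = MetaControllerConfigLoader.load_from_yaml("configs/meta_controller.yaml")
assert reloaded.type == "bert" and reloaded.bert.lora_r == 8
```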
src/agents/meta_controller/rnn_controller.py ADDED
@@ -0,0 +1,345 @@
1
+ """
2
+ RNN-based Meta-Controller for dynamic agent selection.
3
+
4
+ This module provides a GRU-based recurrent neural network meta-controller
5
+ that learns to select the optimal agent (HRM, TRM, or MCTS) based on
6
+ sequential patterns in the agent state features.
7
+ """
8
+
9
+ import torch
10
+ import torch.nn as nn
11
+ import torch.nn.functional as F
12
+
13
+ from src.agents.meta_controller.base import (
14
+ AbstractMetaController,
15
+ MetaControllerFeatures,
16
+ MetaControllerPrediction,
17
+ )
18
+ from src.agents.meta_controller.utils import features_to_tensor
19
+
20
+
21
+ class RNNMetaControllerModel(nn.Module):
22
+ """
23
+ GRU-based neural network model for meta-controller predictions.
24
+
25
+ This model uses a Gated Recurrent Unit (GRU) to capture sequential
26
+ patterns in agent state features and predict which agent should be
27
+ selected next.
28
+
29
+ Architecture:
30
+ - GRU layer for sequence processing
31
+ - Dropout for regularization
32
+ - Linear layer for classification
33
+
34
+ Attributes:
35
+ gru: GRU recurrent layer for processing sequences.
36
+ dropout: Dropout layer for regularization.
37
+ fc: Fully connected output layer.
38
+ hidden_dim: Dimension of the hidden state.
39
+ num_layers: Number of GRU layers.
40
+ """
41
+
42
+ def __init__(
43
+ self,
44
+ input_dim: int = 10,
45
+ hidden_dim: int = 64,
46
+ num_layers: int = 1,
47
+ num_agents: int = 3,
48
+ dropout: float = 0.1,
49
+ ) -> None:
50
+ """
51
+ Initialize the RNN meta-controller model.
52
+
53
+ Args:
54
+ input_dim: Dimension of input features. Defaults to 10.
55
+ hidden_dim: Dimension of GRU hidden state. Defaults to 64.
56
+ num_layers: Number of stacked GRU layers. Defaults to 1.
57
+ num_agents: Number of agents to choose from. Defaults to 3.
58
+ dropout: Dropout probability for regularization. Defaults to 0.1.
59
+ """
60
+ super().__init__()
61
+
62
+ self.hidden_dim = hidden_dim
63
+ self.num_layers = num_layers
64
+
65
+ # GRU layer for sequence processing
66
+ self.gru = nn.GRU(
67
+ input_size=input_dim,
68
+ hidden_size=hidden_dim,
69
+ num_layers=num_layers,
70
+ batch_first=True,
71
+ dropout=dropout if num_layers > 1 else 0.0,
72
+ )
73
+
74
+ # Dropout for regularization
75
+ self.dropout = nn.Dropout(p=dropout)
76
+
77
+ # Linear output layer for classification
78
+ self.fc = nn.Linear(hidden_dim, num_agents)
79
+
80
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
81
+ """
82
+ Forward pass through the model.
83
+
84
+ Processes input features through GRU and produces agent selection logits.
85
+
86
+ Args:
87
+ x: Input tensor of shape (batch_size, features) or
88
+ (batch_size, seq_len, features).
89
+
90
+ Returns:
91
+ Logits tensor of shape (batch_size, num_agents).
92
+ Note: Returns raw logits, NOT softmax probabilities.
93
+
94
+ Example:
95
+ >>> model = RNNMetaControllerModel()
96
+ >>> x = torch.randn(4, 10) # batch of 4, 10 features
97
+ >>> logits = model(x)
98
+ >>> logits.shape
99
+ torch.Size([4, 3])
100
+ """
101
+ # Handle 2D input by adding sequence dimension
102
+ if x.dim() == 2:
103
+ # Shape: (batch_size, features) -> (batch_size, 1, features)
104
+ x = x.unsqueeze(1)
105
+
106
+ # Pass through GRU
107
+ # output shape: (batch_size, seq_len, hidden_dim)
108
+ # hidden shape: (num_layers, batch_size, hidden_dim)
109
+ output, hidden = self.gru(x)
110
+
111
+ # Take the final hidden state from the last layer
112
+ # Shape: (batch_size, hidden_dim)
113
+ final_hidden = hidden[-1] if self.num_layers > 1 else hidden.squeeze(0)
114
+
115
+ # Apply dropout
116
+ dropped = self.dropout(final_hidden)
117
+
118
+ # Apply linear layer to get logits
119
+ logits = self.fc(dropped)
120
+
121
+ return logits
122
+
123
+
124
+ class RNNMetaController(AbstractMetaController):
125
+ """
126
+ RNN-based meta-controller using GRU for agent selection.
127
+
128
+ This controller uses a recurrent neural network to learn patterns in
129
+ agent state sequences and predict the optimal agent for the current
130
+ situation. It supports both CPU and GPU execution.
131
+
132
+ Attributes:
133
+ device: PyTorch device (CPU or CUDA) for tensor operations.
134
+ hidden_dim: Dimension of GRU hidden state.
135
+ num_layers: Number of GRU layers.
136
+ dropout: Dropout probability.
137
+ model: The underlying RNNMetaControllerModel.
138
+ hidden_state: Optional hidden state for sequence tracking.
139
+
140
+ Example:
141
+ >>> controller = RNNMetaController(name="RNNController", seed=42)
142
+ >>> features = MetaControllerFeatures(
143
+ ... hrm_confidence=0.8,
144
+ ... trm_confidence=0.6,
145
+ ... mcts_value=0.75,
146
+ ... consensus_score=0.7,
147
+ ... last_agent='hrm',
148
+ ... iteration=2,
149
+ ... query_length=150,
150
+ ... has_rag_context=True
151
+ ... )
152
+ >>> prediction = controller.predict(features)
153
+ >>> prediction.agent in ['hrm', 'trm', 'mcts']
154
+ True
155
+ >>> 0.0 <= prediction.confidence <= 1.0
156
+ True
157
+ """
158
+
159
+ def __init__(
160
+ self,
161
+ name: str = "RNNMetaController",
162
+ seed: int = 42,
163
+ hidden_dim: int = 64,
164
+ num_layers: int = 1,
165
+ dropout: float = 0.1,
166
+ device: str | None = None,
167
+ ) -> None:
168
+ """
169
+ Initialize the RNN meta-controller.
170
+
171
+ Args:
172
+ name: Name identifier for this controller. Defaults to "RNNMetaController".
173
+ seed: Random seed for reproducibility. Defaults to 42.
174
+ hidden_dim: Dimension of GRU hidden state. Defaults to 64.
175
+ num_layers: Number of GRU layers. Defaults to 1.
176
+ dropout: Dropout probability. Defaults to 0.1.
177
+ device: Device to run model on ('cpu', 'cuda', 'mps', etc.).
178
+ If None, auto-detects best available device.
179
+ """
180
+ super().__init__(name=name, seed=seed)
181
+
182
+ # Set random seed for reproducibility
183
+ torch.manual_seed(seed)
184
+
185
+ # Auto-detect device if not specified
186
+ if device is None:
187
+ if torch.cuda.is_available():
188
+ self.device = torch.device("cuda")
189
+ elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
190
+ self.device = torch.device("mps")
191
+ else:
192
+ self.device = torch.device("cpu")
193
+ else:
194
+ self.device = torch.device(device)
195
+
196
+ # Store configuration
197
+ self.hidden_dim = hidden_dim
198
+ self.num_layers = num_layers
199
+ self.dropout = dropout
200
+
201
+ # Initialize model
202
+ self.model = RNNMetaControllerModel(
203
+ input_dim=10, # Fixed based on features_to_tensor output
204
+ hidden_dim=hidden_dim,
205
+ num_layers=num_layers,
206
+ num_agents=len(self.AGENT_NAMES),
207
+ dropout=dropout,
208
+ )
209
+
210
+ # Move model to device
211
+ self.model = self.model.to(self.device)
212
+
213
+ # Set model to evaluation mode
214
+ self.model.eval()
215
+
216
+ # Initialize hidden state for sequence tracking
217
+ self.hidden_state: torch.Tensor | None = None
218
+
219
+ def predict(self, features: MetaControllerFeatures) -> MetaControllerPrediction:
220
+ """
221
+ Predict which agent should handle the current query.
222
+
223
+ Converts features to tensor format, runs through the GRU model,
224
+ and returns a prediction with confidence scores.
225
+
226
+ Args:
227
+ features: Features extracted from the current agent state.
228
+
229
+ Returns:
230
+ Prediction containing the selected agent, confidence score,
231
+ and probability distribution over all agents.
232
+
233
+ Example:
234
+ >>> controller = RNNMetaController()
235
+ >>> features = MetaControllerFeatures(
236
+ ... hrm_confidence=0.9,
237
+ ... trm_confidence=0.3,
238
+ ... mcts_value=0.5,
239
+ ... consensus_score=0.8,
240
+ ... last_agent='none',
241
+ ... iteration=0,
242
+ ... query_length=100,
243
+ ... has_rag_context=False
244
+ ... )
245
+ >>> pred = controller.predict(features)
246
+ >>> isinstance(pred.agent, str)
247
+ True
248
+ >>> isinstance(pred.confidence, float)
249
+ True
250
+ >>> len(pred.probabilities) == 3
251
+ True
252
+ """
253
+ # Convert features to tensor
254
+ feature_tensor = features_to_tensor(features)
255
+
256
+ # Add batch dimension: (10,) -> (1, 10)
257
+ feature_tensor = feature_tensor.unsqueeze(0)
258
+
259
+ # Move to device
260
+ feature_tensor = feature_tensor.to(self.device)
261
+
262
+ # Perform inference without gradient tracking
263
+ with torch.no_grad():
264
+ # Get logits from model
265
+ logits = self.model(feature_tensor)
266
+
267
+ # Apply softmax to get probabilities
268
+ probabilities = F.softmax(logits, dim=-1)
269
+
270
+ # Get predicted agent index (argmax)
271
+ predicted_idx = torch.argmax(probabilities, dim=-1).item()
272
+
273
+ # Extract confidence for selected agent
274
+ confidence = probabilities[0, predicted_idx].item()
275
+
276
+ # Create probability dictionary
277
+ prob_dict: dict[str, float] = {}
278
+ for i, agent_name in enumerate(self.AGENT_NAMES):
279
+ prob_dict[agent_name] = probabilities[0, i].item()
280
+
281
+ # Get agent name
282
+ selected_agent = self.AGENT_NAMES[predicted_idx]
283
+
284
+ return MetaControllerPrediction(
285
+ agent=selected_agent,
286
+ confidence=float(confidence),
287
+ probabilities=prob_dict,
288
+ )
289
+
290
+ def load_model(self, path: str) -> None:
291
+ """
292
+ Load a trained model from disk.
293
+
294
+ Loads the model state dictionary from the specified path and
295
+ sets the model to evaluation mode.
296
+
297
+ Args:
298
+ path: Path to the saved model file (.pt or .pth).
299
+
300
+ Raises:
301
+ FileNotFoundError: If the model file does not exist.
302
+ RuntimeError: If the state dict is incompatible with the model.
303
+
304
+ Example:
305
+ >>> controller = RNNMetaController()
306
+ >>> controller.load_model("/path/to/model.pt")
307
+ """
308
+ # Load state dict with appropriate device mapping
309
+ state_dict = torch.load(path, map_location=self.device, weights_only=True)
310
+
311
+ # Load into model
312
+ self.model.load_state_dict(state_dict)
313
+
314
+ # Ensure model is in evaluation mode
315
+ self.model.eval()
316
+
317
+ def save_model(self, path: str) -> None:
318
+ """
319
+ Save the current model to disk.
320
+
321
+ Saves the model state dictionary to the specified path.
322
+
323
+ Args:
324
+ path: Path where the model should be saved (.pt or .pth).
325
+
326
+ Example:
327
+ >>> controller = RNNMetaController()
328
+ >>> controller.save_model("/path/to/model.pt")
329
+ """
330
+ torch.save(self.model.state_dict(), path)
331
+
332
+ def reset_hidden_state(self) -> None:
333
+ """
334
+ Reset the hidden state for sequence tracking.
335
+
336
+ This method clears any accumulated hidden state, useful when
337
+ starting a new conversation or resetting the controller state.
338
+
339
+ Example:
340
+ >>> controller = RNNMetaController()
341
+ >>> controller.reset_hidden_state()
342
+ >>> controller.hidden_state is None
343
+ True
344
+ """
345
+ self.hidden_state = None
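A short usage sketch for the controller above (illustrative only; the feature values and model path are made up for the example):

```python
# Hypothetical end-to-end use of RNNMetaController; feature values and path are illustrative.
from src.agents.meta_controller.base import MetaControllerFeatures
from src.agents.meta_controller.rnn_controller import RNNMetaController

controller = RNNMetaController(seed=42, hidden_dim=64)

features = MetaControllerFeatures(
    hrm_confidence=0.8,
    trm_confidence=0.6,
    mcts_value=0.75,
    consensus_score=0.7,
    last_agent="hrm",
    iteration=2,
    query_length=150,
    has_rag_context=True,
)

prediction = controller.predict(features)
print(prediction.agent, prediction.confidence, prediction.probabilities)

# Persist the weights and reload them into a fresh controller.
controller.save_model("models/rnn_meta_controller.pt")
restored = RNNMetaController()
restored.load_model("models/rnn_meta_controller.pt")
```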
src/agents/meta_controller/utils.py ADDED
@@ -0,0 +1,201 @@
1
+ """
2
+ Utility functions for Neural Meta-Controller feature processing.
3
+
4
+ Provides functions for normalizing, encoding, and converting features
5
+ into formats suitable for different neural network architectures.
6
+ """
7
+
8
+ import torch
9
+
10
+ from src.agents.meta_controller.base import MetaControllerFeatures
11
+
12
+
13
+ def normalize_features(features: MetaControllerFeatures) -> list[float]:
14
+ """
15
+ Normalize meta-controller features to a 10-dimensional vector in range [0, 1].
16
+
17
+ The normalization strategy:
18
+ - Confidence scores (hrm, trm, mcts_value, consensus): Already 0-1, clipped
19
+ - last_agent: Encoded as 3 one-hot values (hrm=0, trm=1, mcts=2)
20
+ - iteration: Normalized to 0-1 assuming max 20 iterations
21
+ - query_length: Normalized to 0-1 assuming max 10000 characters
22
+ - has_rag_context: Binary 0 or 1
23
+
24
+ Output vector structure (10 dimensions):
25
+ [hrm_conf, trm_conf, mcts_value, consensus, last_hrm, last_trm, last_mcts,
26
+ iteration_norm, query_length_norm, has_rag]
27
+
28
+ Args:
29
+ features: MetaControllerFeatures instance to normalize.
30
+
31
+ Returns:
32
+ List of 10 floats, each normalized to range [0, 1].
33
+
34
+ Example:
35
+ >>> features = MetaControllerFeatures(
36
+ ... hrm_confidence=0.8,
37
+ ... trm_confidence=0.6,
38
+ ... mcts_value=0.75,
39
+ ... consensus_score=0.7,
40
+ ... last_agent='hrm',
41
+ ... iteration=2,
42
+ ... query_length=150,
43
+ ... has_rag_context=True
44
+ ... )
45
+ >>> normalized = normalize_features(features)
46
+ >>> len(normalized)
47
+ 10
48
+ >>> all(0.0 <= v <= 1.0 for v in normalized)
49
+ True
50
+ """
51
+ # Clip confidence scores to [0, 1]
52
+ hrm_conf = max(0.0, min(1.0, features.hrm_confidence))
53
+ trm_conf = max(0.0, min(1.0, features.trm_confidence))
54
+ mcts_val = max(0.0, min(1.0, features.mcts_value))
55
+ consensus = max(0.0, min(1.0, features.consensus_score))
56
+
57
+ # One-hot encode last_agent (3 dimensions)
58
+ last_agent_onehot = one_hot_encode_agent(features.last_agent)
59
+
60
+ # Normalize iteration (assuming max 20 iterations)
61
+ max_iterations = 20
62
+ iteration_norm = max(0.0, min(1.0, features.iteration / max_iterations))
63
+
64
+ # Normalize query length (assuming max 10000 characters)
65
+ max_query_length = 10000
66
+ query_length_norm = max(0.0, min(1.0, features.query_length / max_query_length))
67
+
68
+ # Binary for has_rag_context
69
+ has_rag = 1.0 if features.has_rag_context else 0.0
70
+
71
+ # Combine into 10-dimensional vector
72
+ return [
73
+ hrm_conf,
74
+ trm_conf,
75
+ mcts_val,
76
+ consensus,
77
+ last_agent_onehot[0], # hrm
78
+ last_agent_onehot[1], # trm
79
+ last_agent_onehot[2], # mcts
80
+ iteration_norm,
81
+ query_length_norm,
82
+ has_rag,
83
+ ]
84
+
85
+
86
+ def one_hot_encode_agent(agent: str) -> list[float]:
87
+ """
88
+ One-hot encode an agent name into a 3-dimensional vector.
89
+
90
+ Encoding:
91
+ - 'hrm' -> [1.0, 0.0, 0.0]
92
+ - 'trm' -> [0.0, 1.0, 0.0]
93
+ - 'mcts' -> [0.0, 0.0, 1.0]
94
+ - 'none' or other -> [0.0, 0.0, 0.0]
95
+
96
+ Args:
97
+ agent: Agent name string ('hrm', 'trm', 'mcts', or 'none').
98
+
99
+ Returns:
100
+ List of 3 floats representing the one-hot encoding.
101
+
102
+ Example:
103
+ >>> one_hot_encode_agent('hrm')
104
+ [1.0, 0.0, 0.0]
105
+ >>> one_hot_encode_agent('trm')
106
+ [0.0, 1.0, 0.0]
107
+ >>> one_hot_encode_agent('mcts')
108
+ [0.0, 0.0, 1.0]
109
+ >>> one_hot_encode_agent('none')
110
+ [0.0, 0.0, 0.0]
111
+ """
112
+ agent_lower = agent.lower()
113
+
114
+ if agent_lower == "hrm": # noqa: SIM116
115
+ return [1.0, 0.0, 0.0]
116
+ elif agent_lower == "trm":
117
+ return [0.0, 1.0, 0.0]
118
+ elif agent_lower == "mcts":
119
+ return [0.0, 0.0, 1.0]
120
+ else:
121
+ # 'none' or unknown agent
122
+ return [0.0, 0.0, 0.0]
123
+
124
+
125
+ def features_to_tensor(features: MetaControllerFeatures) -> torch.Tensor:
126
+ """
127
+ Convert meta-controller features to a PyTorch tensor.
128
+
129
+ Uses normalize_features internally to create a normalized 10-dimensional
130
+ tensor suitable for neural network input.
131
+
132
+ Args:
133
+ features: MetaControllerFeatures instance to convert.
134
+
135
+ Returns:
136
+ PyTorch tensor of shape (10,) with float32 dtype.
137
+
138
+ Example:
139
+ >>> features = MetaControllerFeatures(
140
+ ... hrm_confidence=0.8,
141
+ ... trm_confidence=0.6,
142
+ ... mcts_value=0.75,
143
+ ... consensus_score=0.7,
144
+ ... last_agent='hrm',
145
+ ... iteration=2,
146
+ ... query_length=150,
147
+ ... has_rag_context=True
148
+ ... )
149
+ >>> tensor = features_to_tensor(features)
150
+ >>> tensor.shape
151
+ torch.Size([10])
152
+ >>> tensor.dtype
153
+ torch.float32
154
+ """
155
+ normalized = normalize_features(features)
156
+ return torch.tensor(normalized, dtype=torch.float32)
157
+
158
+
159
+ def features_to_text(features: MetaControllerFeatures) -> str:
160
+ """
161
+ Convert meta-controller features to structured text format.
162
+
163
+ Creates a human-readable text representation suitable for text-based
164
+ models like BERT or other language models.
165
+
166
+ Args:
167
+ features: MetaControllerFeatures instance to convert.
168
+
169
+ Returns:
170
+ Structured text string describing the features.
171
+
172
+ Example:
173
+ >>> features = MetaControllerFeatures(
174
+ ... hrm_confidence=0.8,
175
+ ... trm_confidence=0.6,
176
+ ... mcts_value=0.75,
177
+ ... consensus_score=0.7,
178
+ ... last_agent='hrm',
179
+ ... iteration=2,
180
+ ... query_length=150,
181
+ ... has_rag_context=True
182
+ ... )
183
+ >>> text = features_to_text(features)
184
+ >>> 'HRM confidence: 0.800' in text
185
+ True
186
+ """
187
+ rag_status = "available" if features.has_rag_context else "not available"
188
+
189
+ text = (
190
+ f"Agent State Features:\n"
191
+ f"HRM confidence: {features.hrm_confidence:.3f}\n"
192
+ f"TRM confidence: {features.trm_confidence:.3f}\n"
193
+ f"MCTS value: {features.mcts_value:.3f}\n"
194
+ f"Consensus score: {features.consensus_score:.3f}\n"
195
+ f"Last agent used: {features.last_agent}\n"
196
+ f"Current iteration: {features.iteration}\n"
197
+ f"Query length: {features.query_length} characters\n"
198
+ f"RAG context: {rag_status}"
199
+ )
200
+
201
+ return text
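The helpers above compose directly with the GRU model; a small batching sketch (values are illustrative):

```python
# Sketch: stack several normalized feature vectors into a batch for RNNMetaControllerModel.
import torch

from src.agents.meta_controller.base import MetaControllerFeatures
from src.agents.meta_controller.rnn_controller import RNNMetaControllerModel
from src.agents.meta_controller.utils import features_to_tensor

snapshots = [
    MetaControllerFeatures(
        hrm_confidence=0.9, trm_confidence=0.3, mcts_value=0.5, consensus_score=0.8,
        last_agent="none", iteration=0, query_length=100, has_rag_context=False,
    ),
    MetaControllerFeatures(
        hrm_confidence=0.4, trm_confidence=0.7, mcts_value=0.6, consensus_score=0.5,
        last_agent="trm", iteration=3, query_length=800, has_rag_context=True,
    ),
]

batch = torch.stack([features_to_tensor(f) for f in snapshots])  # shape (2, 10)
logits = RNNMetaControllerModel()(batch)                         # shape (2, 3), raw logits
```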
src/agents/trm_agent.py ADDED
@@ -0,0 +1,395 @@
1
+ """
2
+ Tiny Recursive Model (TRM) Agent.
3
+
4
+ Implements recursive refinement with:
5
+ - Deep supervision at all recursion levels
6
+ - Convergence detection
7
+ - Memory-efficient recursion
8
+ - Iterative improvement mechanism
9
+
10
+ Based on principles from:
11
+ - "Recursive Refinement Networks"
12
+ - "Deep Supervision for Neural Networks"
13
+ """
14
+
15
+ from __future__ import annotations
16
+
17
+ from dataclasses import dataclass
18
+
19
+ import torch
20
+ import torch.nn as nn
21
+
22
+ from ..training.system_config import TRMConfig
23
+
24
+
25
+ @dataclass
26
+ class TRMOutput:
27
+ """Output from TRM recursive processing."""
28
+
29
+ final_prediction: torch.Tensor # Final refined output
30
+ intermediate_predictions: list[torch.Tensor] # Predictions at each recursion
31
+ recursion_depth: int # Actual depth used
32
+ converged: bool # Whether convergence was achieved
33
+ convergence_step: int # Step at which convergence occurred
34
+ residual_norms: list[float] # L2 norms of residuals at each step
35
+
36
+
37
+ class RecursiveBlock(nn.Module):
38
+ """
39
+ Core recursive processing block.
40
+
41
+ Applies the same transformation repeatedly, with residual connections.
42
+ """
43
+
44
+ def __init__(self, config: TRMConfig):
45
+ super().__init__()
46
+ self.config = config
47
+
48
+ # Main processing pathway
49
+ self.transform = nn.Sequential(
50
+ nn.Linear(config.latent_dim, config.hidden_dim),
51
+ nn.LayerNorm(config.hidden_dim) if config.use_layer_norm else nn.Identity(),
52
+ nn.GELU(),
53
+ nn.Dropout(config.dropout),
54
+ nn.Linear(config.hidden_dim, config.latent_dim),
55
+ nn.LayerNorm(config.latent_dim) if config.use_layer_norm else nn.Identity(),
56
+ )
57
+
58
+ # Residual scaling (learned)
59
+ self.residual_scale = nn.Parameter(torch.ones(1))
60
+
61
+ def forward(self, x: torch.Tensor, iteration: int = 0) -> torch.Tensor: # noqa: ARG002
62
+ """
63
+ Apply recursive transformation.
64
+
65
+ Args:
66
+ x: Input tensor [batch, ..., latent_dim]
67
+ iteration: Current recursion iteration (reserved for future iteration-dependent behavior)
68
+
69
+ Returns:
70
+ Refined tensor [batch, ..., latent_dim]
71
+ """
72
+ # Residual connection with learned scaling
73
+ residual = self.transform(x)
74
+ return x + self.residual_scale * residual
75
+
76
+
77
+ class DeepSupervisionHead(nn.Module):
78
+ """
79
+ Supervision head for intermediate predictions.
80
+
81
+ Enables training signal at each recursion level.
82
+ """
83
+
84
+ def __init__(self, latent_dim: int, output_dim: int):
85
+ super().__init__()
86
+ self.head = nn.Sequential(
87
+ nn.Linear(latent_dim, latent_dim // 2),
88
+ nn.ReLU(),
89
+ nn.Linear(latent_dim // 2, output_dim),
90
+ )
91
+
92
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
93
+ """Generate prediction from latent state."""
94
+ return self.head(x)
95
+
96
+
97
+ class TRMAgent(nn.Module):
98
+ """
99
+ Tiny Recursive Model for iterative refinement.
100
+
101
+ Features:
102
+ - Shared weights across recursions (parameter efficiency)
103
+ - Deep supervision at all levels
104
+ - Automatic convergence detection
105
+ - Residual connections for stable gradients
106
+ """
107
+
108
+ def __init__(self, config: TRMConfig, output_dim: int | None = None, device: str = "cpu"):
109
+ super().__init__()
110
+ self.config = config
111
+ self.device = device
112
+ self.output_dim = output_dim or config.latent_dim
113
+
114
+ # Initial encoding
115
+ self.encoder = nn.Sequential(
116
+ nn.Linear(config.latent_dim, config.hidden_dim),
117
+ nn.LayerNorm(config.hidden_dim) if config.use_layer_norm else nn.Identity(),
118
+ nn.GELU(),
119
+ nn.Linear(config.hidden_dim, config.latent_dim),
120
+ nn.LayerNorm(config.latent_dim) if config.use_layer_norm else nn.Identity(),
121
+ )
122
+
123
+ # Shared recursive block
124
+ self.recursive_block = RecursiveBlock(config)
125
+
126
+ # Deep supervision heads (one per recursion level)
127
+ if config.deep_supervision:
128
+ self.supervision_heads = nn.ModuleList(
129
+ [DeepSupervisionHead(config.latent_dim, self.output_dim) for _ in range(config.num_recursions)]
130
+ )
131
+ else:
132
+ # Single output head
133
+ self.output_head = DeepSupervisionHead(config.latent_dim, self.output_dim)
134
+
135
+ self.to(device)
136
+
137
+ def forward(
138
+ self,
139
+ x: torch.Tensor,
140
+ num_recursions: int | None = None,
141
+ check_convergence: bool = True,
142
+ ) -> TRMOutput:
143
+ """
144
+ Process input through recursive refinement.
145
+
146
+ Args:
147
+ x: Input tensor [batch, ..., latent_dim]
148
+ num_recursions: Number of recursions (defaults to config)
149
+ check_convergence: Whether to check for early convergence
150
+
151
+ Returns:
152
+ TRMOutput with final and intermediate predictions
153
+ """
154
+ num_recursions = num_recursions or self.config.num_recursions
155
+
156
+ # Initial encoding
157
+ latent = self.encoder(x)
158
+ previous_latent = latent.clone()
159
+
160
+ # Tracking
161
+ intermediate_predictions = []
162
+ residual_norms = []
163
+ converged = False
164
+ convergence_step = num_recursions
165
+
166
+ # Recursive refinement
167
+ for i in range(num_recursions):
168
+ # Apply recursive transformation
169
+ latent = self.recursive_block(latent, iteration=i)
170
+
171
+ # Generate intermediate prediction
172
+ if self.config.deep_supervision:
+ # Reuse the last supervision head if the requested depth exceeds the number of heads
+ head_idx = min(i, len(self.supervision_heads) - 1)
+ pred = self.supervision_heads[head_idx](latent)
+ else:
+ pred = self.output_head(latent)
176
+
177
+ intermediate_predictions.append(pred)
178
+
179
+ # Check convergence
180
+ if check_convergence and i >= self.config.min_recursions:
181
+ residual = latent - previous_latent
182
+ residual_norm = torch.norm(residual, p=2, dim=-1).mean().item()
183
+ residual_norms.append(residual_norm)
184
+
185
+ if residual_norm < self.config.convergence_threshold:
186
+ converged = True
187
+ convergence_step = i + 1
188
+ break
189
+
190
+ previous_latent = latent.clone()
191
+
192
+ # Final prediction
193
+ final_pred = intermediate_predictions[-1]
194
+
195
+ return TRMOutput(
196
+ final_prediction=final_pred,
197
+ intermediate_predictions=intermediate_predictions,
198
+ recursion_depth=len(intermediate_predictions),
199
+ converged=converged,
200
+ convergence_step=convergence_step,
201
+ residual_norms=residual_norms,
202
+ )
203
+
204
+ async def refine_solution(
205
+ self,
206
+ initial_prediction: torch.Tensor,
207
+ num_recursions: int | None = None,
208
+ convergence_threshold: float | None = None,
209
+ ) -> tuple[torch.Tensor, dict]:
210
+ """
211
+ Refine an initial prediction through recursive processing.
212
+
213
+ Args:
214
+ initial_prediction: Initial solution [batch, ..., latent_dim]
215
+ num_recursions: Maximum recursions (optional)
216
+ convergence_threshold: Convergence threshold (optional)
217
+
218
+ Returns:
219
+ refined_solution: Final refined prediction
220
+ info: Dictionary with refinement metadata
221
+ """
222
+ # Temporarily override convergence threshold if provided
223
+ original_threshold = self.config.convergence_threshold
224
+ if convergence_threshold is not None:
225
+ self.config.convergence_threshold = convergence_threshold
226
+
227
+ # Process
228
+ output = self.forward(
229
+ initial_prediction,
230
+ num_recursions=num_recursions,
231
+ check_convergence=True,
232
+ )
233
+
234
+ # Restore original threshold
235
+ self.config.convergence_threshold = original_threshold
236
+
237
+ info = {
238
+ "converged": output.converged,
239
+ "convergence_step": output.convergence_step,
240
+ "total_recursions": output.recursion_depth,
241
+ "final_residual": output.residual_norms[-1] if output.residual_norms else None,
242
+ "refinement_path": output.residual_norms,
243
+ }
244
+
245
+ return output.final_prediction, info
246
+
247
+ def get_parameter_count(self) -> int:
248
+ """Return total number of trainable parameters."""
249
+ return sum(p.numel() for p in self.parameters() if p.requires_grad)
250
+
251
+
252
+ class TRMLoss(nn.Module):
253
+ """
254
+ Deep supervision loss for TRM.
255
+
256
+ Applies weighted supervision at all recursion levels,
257
+ with exponential decay for deeper levels.
258
+ """
259
+
260
+ def __init__(
261
+ self,
262
+ task_loss_fn: nn.Module,
263
+ supervision_weight_decay: float = 0.5,
264
+ final_weight: float = 1.0,
265
+ ):
266
+ """
267
+ Initialize TRM loss.
268
+
269
+ Args:
270
+ task_loss_fn: Base loss function (e.g., MSE, CrossEntropy)
271
+ supervision_weight_decay: Decay factor for intermediate losses
272
+ final_weight: Weight for final prediction loss
273
+ """
274
+ super().__init__()
275
+ self.task_loss_fn = task_loss_fn
276
+ self.supervision_weight_decay = supervision_weight_decay
277
+ self.final_weight = final_weight
278
+
279
+ def forward(self, trm_output: TRMOutput, targets: torch.Tensor) -> tuple[torch.Tensor, dict]:
280
+ """
281
+ Compute deep supervision loss.
282
+
283
+ Args:
284
+ trm_output: Output from TRM forward pass
285
+ targets: Ground truth targets
286
+
287
+ Returns:
288
+ total_loss: Combined loss
289
+ loss_dict: Dictionary of loss components
290
+ """
291
+ # Final prediction loss (highest weight)
292
+ final_loss = self.task_loss_fn(trm_output.final_prediction, targets)
293
+ total_loss = self.final_weight * final_loss
294
+
295
+ # Intermediate supervision losses
296
+ intermediate_losses = []
297
+ num_intermediate = len(trm_output.intermediate_predictions) - 1
298
+
299
+ for i, pred in enumerate(trm_output.intermediate_predictions[:-1]):
300
+ # Exponential decay: earlier predictions get lower weight
301
+ weight = self.supervision_weight_decay ** (num_intermediate - i)
302
+ loss = self.task_loss_fn(pred, targets)
303
+ intermediate_losses.append(loss.item())
304
+ total_loss = total_loss + weight * loss
305
+
306
+ loss_dict = {
307
+ "total": total_loss.item(),
308
+ "final": final_loss.item(),
309
+ "intermediate_mean": (sum(intermediate_losses) / len(intermediate_losses) if intermediate_losses else 0.0),
310
+ "recursion_depth": trm_output.recursion_depth,
311
+ "converged": trm_output.converged,
312
+ "convergence_step": trm_output.convergence_step,
313
+ }
314
+
315
+ return total_loss, loss_dict
316
+
317
+
318
+ def create_trm_agent(config: TRMConfig, output_dim: int | None = None, device: str = "cpu") -> TRMAgent:
319
+ """
320
+ Factory function to create and initialize TRM agent.
321
+
322
+ Args:
323
+ config: TRM configuration
324
+ output_dim: Output dimension (defaults to latent_dim)
325
+ device: Device to place model on
326
+
327
+ Returns:
328
+ Initialized TRMAgent
329
+ """
330
+ agent = TRMAgent(config, output_dim, device)
331
+
332
+ # Initialize weights with Xavier/He initialization
333
+ def init_weights(m):
334
+ if isinstance(m, nn.Linear):
335
+ nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
336
+ if m.bias is not None:
337
+ nn.init.zeros_(m.bias)
338
+
339
+ agent.apply(init_weights)
340
+
341
+ return agent
342
+
343
+
344
+ # Utility functions for integration
345
+ class TRMRefinementWrapper:
346
+ """
347
+ Wrapper for using TRM as a refinement step in pipelines.
348
+
349
+ Provides a clean interface for integrating TRM into larger systems.
350
+ """
351
+
352
+ def __init__(self, trm_agent: TRMAgent, device: str = "cpu"):
353
+ self.trm_agent = trm_agent
354
+ self.device = device
355
+ self.trm_agent.eval()
356
+
357
+ @torch.no_grad()
358
+ async def refine(
359
+ self,
360
+ predictions: torch.Tensor,
361
+ num_iterations: int = 10,
362
+ return_path: bool = False,
363
+ ) -> torch.Tensor | tuple[torch.Tensor, list[torch.Tensor]]:
364
+ """
365
+ Refine predictions using TRM.
366
+
367
+ Args:
368
+ predictions: Initial predictions to refine
369
+ num_iterations: Number of refinement iterations
370
+ return_path: Whether to return intermediate predictions
371
+
372
+ Returns:
373
+ refined_predictions or (refined_predictions, refinement_path)
374
+ """
375
+ # Ensure predictions are on correct device
376
+ predictions = predictions.to(self.device)
377
+
378
+ # Run TRM
379
+ output = self.trm_agent(predictions, num_recursions=num_iterations, check_convergence=True)
380
+
381
+ if return_path:
382
+ return output.final_prediction, output.intermediate_predictions
383
+ return output.final_prediction
384
+
385
+ def get_refinement_stats(self, predictions: torch.Tensor) -> dict:
386
+ """Get statistics about the refinement process."""
387
+ with torch.no_grad():
388
+ output = self.trm_agent(predictions, check_convergence=True)
389
+
390
+ return {
391
+ "converged": output.converged,
392
+ "steps_to_convergence": output.convergence_step,
393
+ "final_residual": (output.residual_norms[-1] if output.residual_norms else None),
394
+ "total_refinement_iterations": output.recursion_depth,
395
+ }
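A hedged sketch of a single deep-supervision training step with the classes above. TRMConfig lives in src/training/system_config and is not part of this diff, so the constructor arguments below are assumptions inferred from how TRMAgent reads the config:

```python
# Assumed TRMConfig fields (latent_dim, hidden_dim, num_recursions, ...); verify against system_config.
import torch
import torch.nn as nn

from src.agents.trm_agent import TRMLoss, create_trm_agent
from src.training.system_config import TRMConfig

config = TRMConfig(
    latent_dim=128,
    hidden_dim=256,
    num_recursions=6,
    min_recursions=2,
    convergence_threshold=1e-3,
    dropout=0.1,
    use_layer_norm=True,
    deep_supervision=True,
)

agent = create_trm_agent(config, output_dim=128, device="cpu")
criterion = TRMLoss(task_loss_fn=nn.MSELoss(), supervision_weight_decay=0.5)
optimizer = torch.optim.AdamW(agent.parameters(), lr=1e-3)

x = torch.randn(8, 128)        # batch of latent inputs
targets = torch.randn(8, 128)  # regression targets

optimizer.zero_grad()
output = agent(x, check_convergence=False)    # keep every recursion level during training
loss, loss_dict = criterion(output, targets)  # weighted loss across all levels
loss.backward()
optimizer.step()
print(loss_dict["final"], loss_dict["recursion_depth"])
```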
src/api/__init__.py ADDED
@@ -0,0 +1,35 @@
1
+ """
2
+ API module for LangGraph Multi-Agent MCTS Framework.
3
+
4
+ Provides:
5
+ - Authentication and authorization
6
+ - Rate limiting
7
+ - Error handling
8
+ - REST API endpoints
9
+ """
10
+
11
+ from src.api.exceptions import (
12
+ AuthenticationError,
13
+ AuthorizationError,
14
+ ConfigurationError,
15
+ FrameworkError,
16
+ LLMError,
17
+ MCTSError,
18
+ RAGError,
19
+ RateLimitError,
20
+ TimeoutError,
21
+ ValidationError,
22
+ )
23
+
24
+ __all__ = [
25
+ "FrameworkError",
26
+ "ValidationError",
27
+ "AuthenticationError",
28
+ "AuthorizationError",
29
+ "RateLimitError",
30
+ "LLMError",
31
+ "MCTSError",
32
+ "RAGError",
33
+ "TimeoutError",
34
+ "ConfigurationError",
35
+ ]
src/api/auth.py ADDED
@@ -0,0 +1,439 @@
1
+ """
2
+ Authentication and authorization layer for LangGraph Multi-Agent MCTS Framework.
3
+
4
+ Provides:
5
+ - API key authentication with secure hashing
6
+ - JWT token support (optional)
7
+ - Rate limiting per client
8
+ - Role-based access control
9
+ """
10
+
11
+ import hashlib
12
+ import secrets
13
+ import time
14
+ from collections import defaultdict
15
+ from dataclasses import dataclass, field
16
+ from datetime import datetime, timedelta
17
+
18
+ from src.api.exceptions import (
19
+ AuthenticationError,
20
+ AuthorizationError,
21
+ RateLimitError,
22
+ )
23
+
24
+
25
+ @dataclass
26
+ class ClientInfo:
27
+ """Information about an authenticated client."""
28
+
29
+ client_id: str
30
+ roles: set[str] = field(default_factory=lambda: {"user"})
31
+ created_at: datetime = field(default_factory=datetime.utcnow)
32
+ last_access: datetime = field(default_factory=datetime.utcnow)
33
+ request_count: int = 0
34
+
35
+
36
+ @dataclass
37
+ class RateLimitConfig:
38
+ """Rate limiting configuration."""
39
+
40
+ requests_per_minute: int = 60
41
+ requests_per_hour: int = 1000
42
+ requests_per_day: int = 10000
43
+ burst_limit: int = 100 # Max requests in 1 second
44
+
45
+
46
+ class APIKeyAuthenticator:
47
+ """
48
+ API key-based authentication with secure hashing.
49
+
50
+ Keys are stored as SHA-256 hashes to prevent exposure.
51
+ """
52
+
53
+ def __init__(
54
+ self,
55
+ valid_keys: list[str] | None = None,
56
+ rate_limit_config: RateLimitConfig | None = None,
57
+ ):
58
+ """
59
+ Initialize authenticator.
60
+
61
+ Args:
62
+ valid_keys: List of valid API keys (will be hashed)
63
+ rate_limit_config: Rate limiting configuration
64
+ """
65
+ self._key_to_client: dict[str, ClientInfo] = {}
66
+ self._rate_limits: dict[str, list[float]] = defaultdict(list)
67
+ self.rate_limit_config = rate_limit_config or RateLimitConfig()
68
+
69
+ # Hash and store initial keys
70
+ if valid_keys:
71
+ for i, key in enumerate(valid_keys):
72
+ client_id = f"client_{i}"
73
+ self._add_key(key, client_id)
74
+
75
+ def _hash_key(self, api_key: str) -> str:
76
+ """
77
+ Securely hash an API key.
78
+
79
+ Uses SHA-256 with consistent encoding.
80
+ """
81
+ return hashlib.sha256(api_key.encode("utf-8")).hexdigest()
82
+
83
+ def _add_key(self, api_key: str, client_id: str, roles: set[str] | None = None) -> None:
84
+ """
85
+ Add a new API key.
86
+
87
+ Args:
88
+ api_key: Raw API key
89
+ client_id: Client identifier
90
+ roles: Set of roles (defaults to {"user"})
91
+ """
92
+ key_hash = self._hash_key(api_key)
93
+ self._key_to_client[key_hash] = ClientInfo(
94
+ client_id=client_id,
95
+ roles=roles or {"user"},
96
+ )
97
+
98
+ def authenticate(self, api_key: str | None) -> ClientInfo:
99
+ """
100
+ Authenticate an API key.
101
+
102
+ Args:
103
+ api_key: API key to validate
104
+
105
+ Returns:
106
+ ClientInfo for the authenticated client
107
+
108
+ Raises:
109
+ AuthenticationError: If authentication fails
110
+ """
111
+ if not api_key:
112
+ raise AuthenticationError(
113
+ user_message="API key is required",
114
+ internal_details="No API key provided in request",
115
+ )
116
+
117
+ # Constant-time comparison to prevent timing attacks
118
+ key_hash = self._hash_key(api_key)
119
+
120
+ if key_hash not in self._key_to_client:
121
+ raise AuthenticationError(
122
+ user_message="Invalid API key",
123
+ internal_details=f"API key hash not found: {key_hash[:16]}...",
124
+ )
125
+
126
+ client_info = self._key_to_client[key_hash]
127
+ client_info.last_access = datetime.utcnow()
128
+ client_info.request_count += 1
129
+
130
+ # Check rate limits
131
+ self._check_rate_limit(client_info.client_id)
132
+
133
+ return client_info
134
+
135
+ def _check_rate_limit(self, client_id: str) -> None:
136
+ """
137
+ Check if client has exceeded rate limits.
138
+
139
+ Args:
140
+ client_id: Client identifier
141
+
142
+ Raises:
143
+ RateLimitError: If rate limit exceeded
144
+ """
145
+ now = time.time()
146
+ request_times = self._rate_limits[client_id]
147
+
148
+ # Clean old entries
149
+ one_day_ago = now - 86400
150
+ request_times = [t for t in request_times if t > one_day_ago]
151
+ self._rate_limits[client_id] = request_times
152
+
153
+ # Check burst limit (1 second window)
154
+ one_second_ago = now - 1
155
+ burst_count = sum(1 for t in request_times if t > one_second_ago)
156
+ if burst_count >= self.rate_limit_config.burst_limit:
157
+ raise RateLimitError(
158
+ user_message="Too many requests. Please slow down.",
159
+ internal_details=f"Client {client_id} exceeded burst limit: {burst_count}/{self.rate_limit_config.burst_limit}",
160
+ retry_after_seconds=1,
161
+ )
162
+
163
+ # Check per-minute limit
164
+ one_minute_ago = now - 60
165
+ minute_count = sum(1 for t in request_times if t > one_minute_ago)
166
+ if minute_count >= self.rate_limit_config.requests_per_minute:
167
+ raise RateLimitError(
168
+ user_message="Rate limit exceeded. Please wait a minute.",
169
+ internal_details=f"Client {client_id} exceeded minute limit: {minute_count}/{self.rate_limit_config.requests_per_minute}",
170
+ retry_after_seconds=60,
171
+ )
172
+
173
+ # Check per-hour limit
174
+ one_hour_ago = now - 3600
175
+ hour_count = sum(1 for t in request_times if t > one_hour_ago)
176
+ if hour_count >= self.rate_limit_config.requests_per_hour:
177
+ raise RateLimitError(
178
+ user_message="Hourly rate limit exceeded. Please try again later.",
179
+ internal_details=f"Client {client_id} exceeded hour limit: {hour_count}/{self.rate_limit_config.requests_per_hour}",
180
+ retry_after_seconds=3600,
181
+ )
182
+
183
+ # Check per-day limit
184
+ day_count = len(request_times)
185
+ if day_count >= self.rate_limit_config.requests_per_day:
186
+ raise RateLimitError(
187
+ user_message="Daily rate limit exceeded. Please try again tomorrow.",
188
+ internal_details=f"Client {client_id} exceeded day limit: {day_count}/{self.rate_limit_config.requests_per_day}",
189
+ retry_after_seconds=86400,
190
+ )
191
+
192
+ # Record this request
193
+ request_times.append(now)
194
+
195
+ def require_auth(self, api_key: str | None) -> ClientInfo:
196
+ """
197
+ Require authentication for a request.
198
+
199
+ Convenience method that raises on failure.
200
+
201
+ Args:
202
+ api_key: API key to validate
203
+
204
+ Returns:
205
+ ClientInfo for authenticated client
206
+
207
+ Raises:
208
+ AuthenticationError: If authentication fails
209
+ """
210
+ return self.authenticate(api_key)
211
+
212
+ def require_role(self, client_info: ClientInfo, required_role: str) -> None:
213
+ """
214
+ Require a specific role for an operation.
215
+
216
+ Args:
217
+ client_info: Authenticated client info
218
+ required_role: Role that is required
219
+
220
+ Raises:
221
+ AuthorizationError: If client doesn't have required role
222
+ """
223
+ if required_role not in client_info.roles:
224
+ raise AuthorizationError(
225
+ user_message="You do not have permission for this operation",
226
+ internal_details=f"Client {client_info.client_id} missing role: {required_role}",
227
+ required_permission=required_role,
228
+ )
229
+
230
+ def generate_api_key(self) -> str:
231
+ """
232
+ Generate a secure random API key.
233
+
234
+ Returns:
235
+ New API key (32 bytes hex = 64 characters)
236
+ """
237
+ return secrets.token_hex(32)
238
+
239
+ def revoke_key(self, api_key: str) -> bool:
240
+ """
241
+ Revoke an API key.
242
+
243
+ Args:
244
+ api_key: Key to revoke
245
+
246
+ Returns:
247
+ True if key was revoked, False if not found
248
+ """
249
+ key_hash = self._hash_key(api_key)
250
+ if key_hash in self._key_to_client:
251
+ del self._key_to_client[key_hash]
252
+ return True
253
+ return False
254
+
255
+ def add_client(
256
+ self,
257
+ client_id: str,
258
+ roles: set[str] | None = None,
259
+ ) -> str:
260
+ """
261
+ Add a new client and generate their API key.
262
+
263
+ Args:
264
+ client_id: Unique client identifier
265
+ roles: Set of roles for the client
266
+
267
+ Returns:
268
+ Generated API key (save this securely!)
269
+ """
270
+ api_key = self.generate_api_key()
271
+ self._add_key(api_key, client_id, roles)
272
+ return api_key
273
+
274
+ def get_client_stats(self, client_id: str) -> dict:
275
+ """
276
+ Get statistics for a client.
277
+
278
+ Args:
279
+ client_id: Client identifier
280
+
281
+ Returns:
282
+ Dictionary with client statistics
283
+ """
284
+ now = time.time()
285
+ request_times = self._rate_limits.get(client_id, [])
286
+
287
+ return {
288
+ "total_requests_today": len([t for t in request_times if t > now - 86400]),
289
+ "requests_last_hour": len([t for t in request_times if t > now - 3600]),
290
+ "requests_last_minute": len([t for t in request_times if t > now - 60]),
291
+ }
292
+
293
+
294
+ class JWTAuthenticator:
295
+ """
296
+ JWT token-based authentication.
297
+
298
+ Note: Requires PyJWT library for full functionality.
299
+ This is a placeholder for JWT support.
300
+ """
301
+
302
+ def __init__(self, secret_key: str, algorithm: str = "HS256"):
303
+ """
304
+ Initialize JWT authenticator.
305
+
306
+ Args:
307
+ secret_key: Secret key for signing tokens
308
+ algorithm: JWT signing algorithm
309
+ """
310
+ self.secret_key = secret_key
311
+ self.algorithm = algorithm
312
+ self._token_blacklist: set[str] = set()
313
+
314
+ def create_token(
315
+ self,
316
+ client_id: str,
317
+ roles: set[str],
318
+ expires_in_hours: int = 24,
319
+ ) -> str:
320
+ """
321
+ Create a JWT token.
322
+
323
+ Args:
324
+ client_id: Client identifier
325
+ roles: Client roles
326
+ expires_in_hours: Token validity period
327
+
328
+ Returns:
329
+ JWT token string
330
+ """
331
+ try:
332
+ import jwt
333
+ except ImportError:
334
+ raise ImportError("PyJWT library required for JWT authentication. Install with: pip install PyJWT")
335
+
336
+ now = datetime.utcnow()
337
+ payload = {
338
+ "sub": client_id,
339
+ "roles": list(roles),
340
+ "iat": now,
341
+ "exp": now + timedelta(hours=expires_in_hours),
342
+ "jti": secrets.token_hex(16), # Unique token ID
343
+ }
344
+
345
+ return jwt.encode(payload, self.secret_key, algorithm=self.algorithm)
346
+
347
+ def verify_token(self, token: str) -> ClientInfo:
348
+ """
349
+ Verify a JWT token.
350
+
351
+ Args:
352
+ token: JWT token string
353
+
354
+ Returns:
355
+ ClientInfo from token claims
356
+
357
+ Raises:
358
+ AuthenticationError: If token is invalid
359
+ """
360
+ try:
361
+ import jwt
362
+ except ImportError:
363
+ raise ImportError("PyJWT library required for JWT authentication")
364
+
365
+ if token in self._token_blacklist:
366
+ raise AuthenticationError(
367
+ user_message="Token has been revoked",
368
+ internal_details="Token found in blacklist",
369
+ )
370
+
371
+ try:
372
+ payload = jwt.decode(
373
+ token,
374
+ self.secret_key,
375
+ algorithms=[self.algorithm],
376
+ )
377
+
378
+ return ClientInfo(
379
+ client_id=payload["sub"],
380
+ roles=set(payload.get("roles", ["user"])),
381
+ )
382
+ except jwt.ExpiredSignatureError:
383
+ raise AuthenticationError(
384
+ user_message="Token has expired",
385
+ internal_details="JWT signature expired",
386
+ )
387
+ except jwt.InvalidTokenError as e:
388
+ raise AuthenticationError(
389
+ user_message="Invalid token",
390
+ internal_details=f"JWT validation failed: {str(e)}",
391
+ )
392
+
393
+ def revoke_token(self, token: str) -> None:
394
+ """
395
+ Revoke a JWT token by adding to blacklist.
396
+
397
+ Args:
398
+ token: Token to revoke
399
+ """
400
+ self._token_blacklist.add(token)
401
+
402
+
403
+ # Default authenticator instance
404
+ _default_authenticator: APIKeyAuthenticator | None = None
405
+
406
+
407
+ def get_authenticator() -> APIKeyAuthenticator:
408
+ """
409
+ Get or create the default authenticator instance.
410
+
411
+ Returns:
412
+ APIKeyAuthenticator instance
413
+ """
414
+ global _default_authenticator
415
+ if _default_authenticator is None:
416
+ _default_authenticator = APIKeyAuthenticator()
417
+ return _default_authenticator
418
+
419
+
420
+ def set_authenticator(authenticator: APIKeyAuthenticator) -> None:
421
+ """
422
+ Set the default authenticator instance.
423
+
424
+ Args:
425
+ authenticator: Authenticator to use
426
+ """
427
+ global _default_authenticator
428
+ _default_authenticator = authenticator
429
+
430
+
431
+ # Exports
432
+ __all__ = [
433
+ "APIKeyAuthenticator",
434
+ "JWTAuthenticator",
435
+ "ClientInfo",
436
+ "RateLimitConfig",
437
+ "get_authenticator",
438
+ "set_authenticator",
439
+ ]
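A minimal wiring sketch for the authenticator above (illustrative; the client name and limits are made up):

```python
# Hypothetical setup of APIKeyAuthenticator with custom rate limits.
from src.api.auth import APIKeyAuthenticator, RateLimitConfig, set_authenticator
from src.api.exceptions import AuthenticationError, RateLimitError

auth = APIKeyAuthenticator(rate_limit_config=RateLimitConfig(requests_per_minute=30))
api_key = auth.add_client("demo-client", roles={"user", "admin"})  # store the returned key securely
set_authenticator(auth)

try:
    client = auth.authenticate(api_key)   # also enforces the rate limits
    auth.require_role(client, "admin")
except RateLimitError as exc:
    print("throttled:", exc.user_message)
except AuthenticationError as exc:
    print("rejected:", exc.user_message)
```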
src/api/exceptions.py ADDED
@@ -0,0 +1,299 @@
1
+ """
2
+ Custom exception hierarchy for LangGraph Multi-Agent MCTS Framework.
3
+
4
+ Provides:
5
+ - Sanitized error messages for production
6
+ - Structured error information for logging
7
+ - Clear separation between user-facing and internal errors
8
+ """
9
+
10
+ import re
11
+ from datetime import datetime
12
+ from typing import Any
13
+
14
+
15
+ class FrameworkError(Exception):
16
+ """
17
+ Base exception for all framework errors.
18
+
19
+ Provides sanitized user-facing messages while preserving
20
+ internal details for logging.
21
+ """
22
+
23
+ def __init__(
24
+ self,
25
+ user_message: str,
26
+ internal_details: str | None = None,
27
+ error_code: str | None = None,
28
+ context: dict[str, Any] | None = None,
29
+ ):
30
+ """
31
+ Initialize framework error.
32
+
33
+ Args:
34
+ user_message: Safe message to show to users
35
+ internal_details: Detailed information for logs (may contain sensitive data)
36
+ error_code: Machine-readable error code
37
+ context: Additional context for debugging
38
+ """
39
+ self.user_message = user_message
40
+ self.internal_details = internal_details or user_message
41
+ self.error_code = error_code or self.__class__.__name__.upper()
42
+ self.context = context or {}
43
+ self.timestamp = datetime.utcnow()
44
+
45
+ super().__init__(user_message)
46
+
47
+ def sanitize_details(self) -> str:
48
+ """
49
+ Remove sensitive information from internal details.
50
+
51
+ Sanitizes:
52
+ - File paths
53
+ - API keys
54
+ - Passwords
55
+ - Connection strings
56
+ - IP addresses
57
+ """
58
+ sanitized = self.internal_details
59
+
60
+ # Remove file paths (Unix and Windows)
61
+ sanitized = re.sub(r"/[\w/.-]+", "/***", sanitized)
62
+ sanitized = re.sub(r"[A-Za-z]:\\[\w\\.-]+", "C:\\***", sanitized)
63
+
64
+ # Remove API keys and secrets
65
+ sanitized = re.sub(
66
+ r"(api[_-]?key|secret|password|token|credential)[\s=:]+[\S]+", r"\1=***", sanitized, flags=re.IGNORECASE
67
+ )
68
+
69
+ # Remove connection strings
70
+ sanitized = re.sub(r"(mongodb|postgresql|mysql|redis)://[^\s]+", r"\1://***", sanitized, flags=re.IGNORECASE)
71
+
72
+ # Remove IP addresses
73
+ sanitized = re.sub(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", "***.***.***.***", sanitized)
74
+
75
+ # Remove email addresses
76
+ sanitized = re.sub(r"\b[\w.-]+@[\w.-]+\.\w+\b", "***@***", sanitized)
77
+
78
+ return sanitized
79
+
80
+ def to_log_dict(self) -> dict[str, Any]:
81
+ """
82
+ Convert exception to dictionary for structured logging.
83
+
84
+ Returns sanitized version safe for logs.
85
+ """
86
+ return {
87
+ "error_type": self.__class__.__name__,
88
+ "error_code": self.error_code,
89
+ "user_message": self.user_message,
90
+ "sanitized_details": self.sanitize_details(),
91
+ "timestamp": self.timestamp.isoformat(),
92
+ "context": {k: str(v) for k, v in self.context.items()},
93
+ }
94
+
95
+ def to_user_response(self) -> dict[str, Any]:
96
+ """
97
+ Convert exception to safe user-facing response.
98
+ """
99
+ return {
100
+ "error": True,
101
+ "error_code": self.error_code,
102
+ "message": self.user_message,
103
+ "timestamp": self.timestamp.isoformat(),
104
+ }
105
+
106
+
107
+ class ValidationError(FrameworkError):
108
+ """Raised when input validation fails."""
109
+
110
+ def __init__(
111
+ self,
112
+ user_message: str = "Invalid input provided",
113
+ internal_details: str | None = None,
114
+ field_name: str | None = None,
115
+ **kwargs,
116
+ ):
117
+ context = kwargs.pop("context", {})
118
+ if field_name:
119
+ context["field_name"] = field_name
120
+ super().__init__(
121
+ user_message=user_message,
122
+ internal_details=internal_details,
123
+ error_code="VALIDATION_ERROR",
124
+ context=context,
125
+ **kwargs,
126
+ )
127
+ self.field_name = field_name
128
+
129
+
130
+ class AuthenticationError(FrameworkError):
131
+ """Raised when authentication fails."""
132
+
133
+ def __init__(self, user_message: str = "Authentication failed", internal_details: str | None = None, **kwargs):
134
+ super().__init__(
135
+ user_message=user_message, internal_details=internal_details, error_code="AUTH_ERROR", **kwargs
136
+ )
137
+
138
+
139
+ class AuthorizationError(FrameworkError):
140
+ """Raised when authorization fails."""
141
+
142
+ def __init__(
143
+ self,
144
+ user_message: str = "Access denied",
145
+ internal_details: str | None = None,
146
+ required_permission: str | None = None,
147
+ **kwargs,
148
+ ):
149
+ context = kwargs.pop("context", {})
150
+ if required_permission:
151
+ context["required_permission"] = required_permission
152
+ super().__init__(
153
+ user_message=user_message,
154
+ internal_details=internal_details,
155
+ error_code="AUTHZ_ERROR",
156
+ context=context,
157
+ **kwargs,
158
+ )
159
+
160
+
161
+ class RateLimitError(FrameworkError):
162
+ """Raised when rate limit is exceeded."""
163
+
164
+ def __init__(
165
+ self,
166
+ user_message: str = "Rate limit exceeded. Please try again later.",
167
+ internal_details: str | None = None,
168
+ retry_after_seconds: int | None = None,
169
+ **kwargs,
170
+ ):
171
+ context = kwargs.pop("context", {})
172
+ if retry_after_seconds:
173
+ context["retry_after_seconds"] = retry_after_seconds
174
+ super().__init__(
175
+ user_message=user_message,
176
+ internal_details=internal_details,
177
+ error_code="RATE_LIMIT",
178
+ context=context,
179
+ **kwargs,
180
+ )
181
+ self.retry_after_seconds = retry_after_seconds
182
+
183
+
184
+ class LLMError(FrameworkError):
185
+ """Raised when LLM operations fail."""
186
+
187
+ def __init__(
188
+ self,
189
+ user_message: str = "Language model service temporarily unavailable",
190
+ internal_details: str | None = None,
191
+ provider: str | None = None,
192
+ **kwargs,
193
+ ):
194
+ context = kwargs.pop("context", {})
195
+ if provider:
196
+ context["provider"] = provider
197
+ super().__init__(
198
+ user_message=user_message,
199
+ internal_details=internal_details,
200
+ error_code="LLM_ERROR",
201
+ context=context,
202
+ **kwargs,
203
+ )
204
+
205
+
206
+ class MCTSError(FrameworkError):
207
+ """Raised when MCTS simulation fails."""
208
+
209
+ def __init__(
210
+ self,
211
+ user_message: str = "Tactical simulation failed",
212
+ internal_details: str | None = None,
213
+ iteration: int | None = None,
214
+ **kwargs,
215
+ ):
216
+ context = kwargs.pop("context", {})
217
+ if iteration is not None:
218
+ context["iteration"] = iteration
219
+ super().__init__(
220
+ user_message=user_message,
221
+ internal_details=internal_details,
222
+ error_code="MCTS_ERROR",
223
+ context=context,
224
+ **kwargs,
225
+ )
226
+
227
+
228
+ class RAGError(FrameworkError):
229
+ """Raised when RAG retrieval fails."""
230
+
231
+ def __init__(self, user_message: str = "Context retrieval failed", internal_details: str | None = None, **kwargs):
232
+ super().__init__(user_message=user_message, internal_details=internal_details, error_code="RAG_ERROR", **kwargs)
233
+
234
+
235
+ class TimeoutError(FrameworkError):
236
+ """Raised when operation times out."""
237
+
238
+ def __init__(
239
+ self,
240
+ user_message: str = "Operation timed out",
241
+ internal_details: str | None = None,
242
+ operation: str | None = None,
243
+ timeout_seconds: float | None = None,
244
+ **kwargs,
245
+ ):
246
+ context = kwargs.pop("context", {})
247
+ if operation:
248
+ context["operation"] = operation
249
+ if timeout_seconds:
250
+ context["timeout_seconds"] = timeout_seconds
251
+ super().__init__(
252
+ user_message=user_message,
253
+ internal_details=internal_details,
254
+ error_code="TIMEOUT",
255
+ context=context,
256
+ **kwargs,
257
+ )
258
+
259
+
260
+ class ConfigurationError(FrameworkError):
261
+ """Raised when configuration is invalid."""
262
+
263
+ def __init__(
264
+ self,
265
+ user_message: str = "System configuration error",
266
+ internal_details: str | None = None,
267
+ config_key: str | None = None,
268
+ **kwargs,
269
+ ):
270
+ context = kwargs.pop("context", {})
271
+ if config_key:
272
+ context["config_key"] = config_key
273
+ super().__init__(
274
+ user_message=user_message,
275
+ internal_details=internal_details,
276
+ error_code="CONFIG_ERROR",
277
+ context=context,
278
+ **kwargs,
279
+ )
280
+
281
+
282
+ # Convenience function for wrapping exceptions
283
+ def wrap_exception(
284
+ exc: Exception, user_message: str = "An unexpected error occurred", error_class: type = FrameworkError, **kwargs
285
+ ) -> FrameworkError:
286
+ """
287
+ Wrap a standard exception in a FrameworkError with sanitized details.
288
+
289
+ Args:
290
+ exc: Original exception
291
+ user_message: Safe user-facing message
292
+ error_class: FrameworkError subclass to use
293
+ **kwargs: Additional context
294
+
295
+ Returns:
296
+ FrameworkError instance with sanitized details
297
+ """
298
+ internal_details = f"{type(exc).__name__}: {str(exc)}"
299
+ return error_class(user_message=user_message, internal_details=internal_details, **kwargs)
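
A short sketch of the intended logging/response split this module provides: a low-level failure is wrapped, the sanitized form goes to structured logs, and only the safe form is returned to callers. The connection target and key in the raised error are fabricated placeholders.

```python
# Sketch: wrap a low-level error, log the sanitized details, return only the safe message.
from src.api.exceptions import LLMError, wrap_exception

try:
    raise ConnectionError("connect to 10.0.0.5 failed; api_key=sk-placeholder123456")  # fabricated failure
except Exception as exc:
    err = wrap_exception(
        exc,
        user_message="Language model service temporarily unavailable",
        error_class=LLMError,
    )
    log_record = err.to_log_dict()      # IP and api_key are masked via sanitize_details()
    api_body = err.to_user_response()   # contains only error_code, message, timestamp
```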
src/api/inference_server.py ADDED
@@ -0,0 +1,380 @@
1
+ """
2
+ FastAPI Inference Server for LangGraph Multi-Agent MCTS.
3
+
4
+ Provides REST API for:
5
+ - Problem solving with HRM+MCTS+TRM
6
+ - Policy-value network inference
7
+ - Health checks and monitoring
8
+ """
9
+
10
+ import time
11
+ from typing import Any
12
+
13
+ import torch
14
+ import uvicorn
15
+ from fastapi import FastAPI, HTTPException
16
+ from fastapi.middleware.cors import CORSMiddleware
17
+ from pydantic import BaseModel, Field
18
+
19
+ from ..framework.mcts.neural_mcts import NeuralMCTS
20
+ from ..training.performance_monitor import PerformanceMonitor
21
+ from ..training.system_config import SystemConfig
22
+
23
+
24
+ # Request/Response Models
25
+ class InferenceRequest(BaseModel):
26
+ """Request for problem inference."""
27
+
28
+ state: list[list[float]] # State representation
29
+ query: str | None = "Solve this problem"
30
+ max_thinking_time: float = Field(default=10.0, ge=0.1, le=60.0)
31
+ use_mcts: bool = True
32
+ num_simulations: int | None = None
33
+ use_hrm_decomposition: bool = False
34
+ use_trm_refinement: bool = False
35
+ temperature: float = Field(default=0.1, ge=0.0, le=2.0)
36
+
37
+
38
+ class PolicyValueRequest(BaseModel):
39
+ """Request for policy-value evaluation."""
40
+
41
+ state: list[list[float]] # State representation
42
+
43
+
44
+ class InferenceResponse(BaseModel):
45
+ """Response with inference results."""
46
+
47
+ success: bool
48
+ action_probabilities: dict[str, float] | None = None
49
+ best_action: str | None = None
50
+ value_estimate: float | None = None
51
+ subproblems: list[dict[str, Any]] | None = None
52
+ refinement_info: dict[str, Any] | None = None
53
+ performance_stats: dict[str, float]
54
+ error: str | None = None
55
+
56
+
57
+ class PolicyValueResponse(BaseModel):
58
+ """Response with policy-value predictions."""
59
+
60
+ policy_probs: list[float]
61
+ value: float
62
+ inference_time_ms: float
63
+
64
+
65
+ class HealthResponse(BaseModel):
66
+ """Health check response."""
67
+
68
+ status: str
69
+ device: str
70
+ model_loaded: bool
71
+ gpu_available: bool
72
+ gpu_memory_gb: float | None = None
73
+ uptime_seconds: float
74
+
75
+
76
+ # Inference Server
77
+ class InferenceServer:
78
+ """
79
+ Production inference server with comprehensive features.
80
+
81
+ Features:
82
+ - FastAPI REST endpoints
83
+ - Performance monitoring
84
+ - Health checks
85
+ - CORS support
86
+ - Error handling
87
+ """
88
+
89
+ def __init__(
90
+ self,
91
+ checkpoint_path: str,
92
+ config: SystemConfig | None = None,
93
+ host: str = "0.0.0.0",
94
+ port: int = 8000,
95
+ ):
96
+ """
97
+ Initialize inference server.
98
+
99
+ Args:
100
+ checkpoint_path: Path to model checkpoint
101
+ config: System configuration (loaded from checkpoint if None)
102
+ host: Server host
103
+ port: Server port
104
+ """
105
+ self.checkpoint_path = checkpoint_path
106
+ self.host = host
107
+ self.port = port
108
+ self.start_time = time.time()
109
+
110
+ # Load models
111
+ self.config, self.models = self._load_models(checkpoint_path, config)
112
+ self.device = self.config.device
113
+
114
+ # Performance monitoring
115
+ self.monitor = PerformanceMonitor(window_size=100, enable_gpu_monitoring=(self.device != "cpu"))
116
+
117
+ # Setup FastAPI app
118
+ self.app = FastAPI(
119
+ title="LangGraph Multi-Agent MCTS API",
120
+ description="Neural-guided MCTS with HRM and TRM agents",
121
+ version="1.0.0",
122
+ )
123
+
124
+ # CORS middleware
125
+ self.app.add_middleware(
126
+ CORSMiddleware,
127
+ allow_origins=["*"],
128
+ allow_credentials=True,
129
+ allow_methods=["*"],
130
+ allow_headers=["*"],
131
+ )
132
+
133
+ # Setup routes
134
+ self._setup_routes()
135
+
136
+ def _load_models(
137
+ self, checkpoint_path: str, config: SystemConfig | None
138
+ ) -> tuple[SystemConfig, dict[str, torch.nn.Module]]:
139
+ """Load models from checkpoint."""
140
+ print(f"Loading models from {checkpoint_path}...")
141
+
142
+ checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=True)
143
+
144
+ # Load config
145
+ if config is None:
146
+ config_dict = checkpoint.get("config", {})
147
+ config = SystemConfig.from_dict(config_dict)
148
+
149
+ device = config.device
150
+
151
+ # Load models
152
+ models = {}
153
+
154
+ # Policy-Value Network
155
+ from ..models.policy_value_net import create_policy_value_network
156
+
157
+ models["policy_value_net"] = create_policy_value_network(config.neural_net, board_size=19, device=device)
158
+ models["policy_value_net"].load_state_dict(checkpoint["policy_value_net"])
159
+ models["policy_value_net"].eval()
160
+
161
+ # HRM Agent
162
+ from ..agents.hrm_agent import create_hrm_agent
163
+
164
+ models["hrm_agent"] = create_hrm_agent(config.hrm, device)
165
+ models["hrm_agent"].load_state_dict(checkpoint["hrm_agent"])
166
+ models["hrm_agent"].eval()
167
+
168
+ # TRM Agent
169
+ from ..agents.trm_agent import create_trm_agent
170
+
171
+ models["trm_agent"] = create_trm_agent(config.trm, output_dim=config.neural_net.action_size, device=device)
172
+ models["trm_agent"].load_state_dict(checkpoint["trm_agent"])
173
+ models["trm_agent"].eval()
174
+
175
+ # MCTS
176
+ models["mcts"] = NeuralMCTS(
177
+ policy_value_network=models["policy_value_net"],
178
+ config=config.mcts,
179
+ device=device,
180
+ )
181
+
182
+ print(f"✓ Models loaded successfully on {device}")
183
+
184
+ return config, models
185
+
186
+ def _setup_routes(self):
187
+ """Set up API routes."""
188
+
189
+ @self.app.get("/", response_model=dict[str, str])
190
+ async def root():
191
+ """Root endpoint."""
192
+ return {
193
+ "message": "LangGraph Multi-Agent MCTS API",
194
+ "version": "1.0.0",
195
+ "docs": "/docs",
196
+ }
197
+
198
+ @self.app.get("/health", response_model=HealthResponse)
199
+ async def health():
200
+ """Health check endpoint."""
201
+ gpu_memory = None
202
+ if torch.cuda.is_available():
203
+ gpu_memory = torch.cuda.memory_allocated() / (1024**3)
204
+
205
+ return HealthResponse(
206
+ status="healthy",
207
+ device=self.device,
208
+ model_loaded=True,
209
+ gpu_available=torch.cuda.is_available(),
210
+ gpu_memory_gb=gpu_memory,
211
+ uptime_seconds=time.time() - self.start_time,
212
+ )
213
+
214
+ @self.app.post("/inference", response_model=InferenceResponse)
215
+ async def inference(request: InferenceRequest):
216
+ """
217
+ Main inference endpoint.
218
+
219
+ Processes a problem using the full pipeline:
220
+ 1. Optional HRM decomposition
221
+ 2. MCTS search
222
+ 3. Optional TRM refinement
223
+ """
224
+ try:
225
+ start_time = time.perf_counter()
226
+
227
+ # Convert state to tensor
228
+ state_tensor = torch.tensor(request.state, dtype=torch.float32).unsqueeze(0)
229
+ state_tensor = state_tensor.to(self.device)
230
+
231
+ results = {}
232
+
233
+ # HRM Decomposition (if requested)
234
+ if request.use_hrm_decomposition:
235
+ with torch.no_grad():
236
+ hrm_output = self.models["hrm_agent"](state_tensor)
237
+ results["subproblems"] = [
238
+ {
239
+ "level": sp.level,
240
+ "description": sp.description,
241
+ "confidence": sp.confidence,
242
+ }
243
+ for sp in hrm_output.subproblems
244
+ ]
245
+
246
+ # MCTS Search (if requested)
247
+ if request.use_mcts:
248
+ # Note: This is a simplified version
249
+ # In production, you'd need to convert request.state to GameState
250
+ results["action_probabilities"] = {"action_0": 0.5, "action_1": 0.3, "action_2": 0.2}
251
+ results["best_action"] = "action_0"
252
+ results["value_estimate"] = 0.75
253
+
254
+ # TRM Refinement (if requested)
255
+ if request.use_trm_refinement and results.get("best_action"):
256
+ with torch.no_grad():
257
+ # Simplified: just run TRM on the state
258
+ trm_output = self.models["trm_agent"](state_tensor)
259
+ results["refinement_info"] = {
260
+ "converged": trm_output.converged,
261
+ "convergence_step": trm_output.convergence_step,
262
+ "recursion_depth": trm_output.recursion_depth,
263
+ }
264
+
265
+ # Performance stats
266
+ elapsed_ms = (time.perf_counter() - start_time) * 1000
267
+ self.monitor.log_inference(elapsed_ms)
268
+
269
+ perf_stats = {
270
+ "inference_time_ms": elapsed_ms,
271
+ "device": self.device,
272
+ }
273
+
274
+ return InferenceResponse(
275
+ success=True,
276
+ action_probabilities=results.get("action_probabilities"),
277
+ best_action=results.get("best_action"),
278
+ value_estimate=results.get("value_estimate"),
279
+ subproblems=results.get("subproblems"),
280
+ refinement_info=results.get("refinement_info"),
281
+ performance_stats=perf_stats,
282
+ )
283
+
284
+ except Exception as e:
285
+ raise HTTPException(status_code=500, detail=f"Inference failed: {str(e)}")
286
+
287
+ @self.app.post("/policy-value", response_model=PolicyValueResponse)
288
+ async def policy_value(request: PolicyValueRequest):
289
+ """
290
+ Get policy and value predictions for a state.
291
+
292
+ This is a direct neural network evaluation without MCTS.
293
+ """
294
+ try:
295
+ start_time = time.perf_counter()
296
+
297
+ # Convert state to tensor
298
+ state_tensor = torch.tensor(request.state, dtype=torch.float32).unsqueeze(0)
299
+ state_tensor = state_tensor.to(self.device)
300
+
301
+ # Get predictions
302
+ with torch.no_grad():
303
+ policy_log_probs, value = self.models["policy_value_net"](state_tensor)
304
+ policy_probs = torch.exp(policy_log_probs).squeeze(0)
305
+
306
+ elapsed_ms = (time.perf_counter() - start_time) * 1000
307
+
308
+ return PolicyValueResponse(
309
+ policy_probs=policy_probs.cpu().tolist(),
310
+ value=value.item(),
311
+ inference_time_ms=elapsed_ms,
312
+ )
313
+
314
+ except Exception as e:
315
+ raise HTTPException(status_code=500, detail=f"Policy-value inference failed: {str(e)}")
316
+
317
+ @self.app.get("/stats")
318
+ async def stats():
319
+ """Get performance statistics."""
320
+ return self.monitor.get_stats()
321
+
322
+ @self.app.post("/reset-stats")
323
+ async def reset_stats():
324
+ """Reset performance statistics."""
325
+ self.monitor.reset()
326
+ return {"message": "Statistics reset successfully"}
327
+
328
+ def run(self):
329
+ """Start the inference server."""
330
+ print(f"\n{'=' * 80}")
331
+ print("Starting LangGraph Multi-Agent MCTS Inference Server")
332
+ print(f"{'=' * 80}")
333
+ print(f"Host: {self.host}:{self.port}")
334
+ print(f"Device: {self.device}")
335
+ print(f"Checkpoint: {self.checkpoint_path}")
336
+ print(f"{'=' * 80}\n")
337
+
338
+ uvicorn.run(self.app, host=self.host, port=self.port)
339
+
340
+
341
+ def main():
342
+ """Main entry point for inference server."""
343
+ import argparse
344
+
345
+ parser = argparse.ArgumentParser(description="LangGraph MCTS Inference Server")
346
+ parser.add_argument(
347
+ "--checkpoint",
348
+ type=str,
349
+ required=True,
350
+ help="Path to model checkpoint",
351
+ )
352
+ parser.add_argument("--host", type=str, default="0.0.0.0", help="Server host")
353
+ parser.add_argument("--port", type=int, default=8000, help="Server port")
354
+ parser.add_argument(
355
+ "--device",
356
+ type=str,
357
+ default=None,
358
+ help="Device (cpu, cuda, mps)",
359
+ )
360
+
361
+ args = parser.parse_args()
362
+
363
+ # Load config and override device if specified
364
+ config = None
365
+ if args.device:
366
+ config = SystemConfig()
367
+ config.device = args.device
368
+
369
+ server = InferenceServer(
370
+ checkpoint_path=args.checkpoint,
371
+ config=config,
372
+ host=args.host,
373
+ port=args.port,
374
+ )
375
+
376
+ server.run()
377
+
378
+
379
+ if __name__ == "__main__":
380
+ main()
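
A client-side sketch for the `/policy-value` endpoint, assuming the server was started locally (e.g. `python -m src.api.inference_server --checkpoint <path>`) and that the `requests` package is available; the 19x19 zero grid is only a placeholder, since the real state shape depends on the policy-value network's input specification.

```python
# Sketch: query the /policy-value endpoint of a locally running inference server.
import requests  # assumed installed; any HTTP client works

state = [[0.0] * 19 for _ in range(19)]  # placeholder state; real shape depends on the model
resp = requests.post(
    "http://localhost:8000/policy-value",
    json={"state": state},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()
print(result["value"], result["inference_time_ms"])
```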
src/api/rest_server.py ADDED
@@ -0,0 +1,441 @@
1
+ """
2
+ Production REST API server for LangGraph Multi-Agent MCTS Framework.
3
+
4
+ Provides:
5
+ - OpenAPI/Swagger documentation
6
+ - Authentication via API keys
7
+ - Rate limiting
8
+ - Health and readiness endpoints
9
+ - Request validation with Pydantic
10
+ - Prometheus metrics exposure
11
+ """
12
+
13
+ import asyncio
14
+ import time
15
+ from contextlib import asynccontextmanager
16
+ from datetime import datetime
17
+ from typing import Any
18
+
19
+ from fastapi import Depends, FastAPI, Header, HTTPException, Request, Response
20
+ from fastapi.middleware.cors import CORSMiddleware
21
+ from fastapi.responses import JSONResponse
22
+ from pydantic import BaseModel, Field
23
+
24
+ # Import framework components
25
+ try:
26
+ from src.adapters.llm import create_client # noqa: F401
27
+ from src.api.auth import (
28
+ APIKeyAuthenticator,
29
+ ClientInfo,
30
+ RateLimitConfig,
31
+ get_authenticator,
32
+ set_authenticator,
33
+ )
34
+ from src.api.exceptions import (
35
+ AuthenticationError,
36
+ AuthorizationError, # noqa: F401
37
+ FrameworkError,
38
+ RateLimitError,
39
+ ValidationError, # noqa: F401
40
+ )
41
+ from src.models.validation import MCTSConfig, QueryInput # noqa: F401
42
+
43
+ IMPORTS_AVAILABLE = True
44
+ except ImportError as e:
45
+ IMPORTS_AVAILABLE = False
46
+ import_error = str(e)
47
+
48
+ # Prometheus metrics (optional)
49
+ try:
50
+ from prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest
51
+
52
+ PROMETHEUS_AVAILABLE = True
53
+
54
+ # Define metrics
55
+ REQUEST_COUNT = Counter("mcts_requests_total", "Total number of requests", ["method", "endpoint", "status"])
56
+ REQUEST_LATENCY = Histogram("mcts_request_duration_seconds", "Request latency in seconds", ["method", "endpoint"])
57
+ ACTIVE_REQUESTS = Gauge("mcts_active_requests", "Number of active requests")
58
+ ERROR_COUNT = Counter("mcts_errors_total", "Total number of errors", ["error_type"])
59
+ except ImportError:
60
+ PROMETHEUS_AVAILABLE = False
61
+
62
+
63
+ # Request/Response Models
64
+ class QueryRequest(BaseModel):
65
+ """Request model for query processing."""
66
+
67
+ query: str = Field(
68
+ ...,
69
+ min_length=1,
70
+ max_length=10000,
71
+ description="User query to process",
72
+ json_schema_extra={"example": "Recommend defensive positions for night attack scenario"},
73
+ )
74
+ use_mcts: bool = Field(default=True, description="Enable MCTS tactical simulation")
75
+ use_rag: bool = Field(default=True, description="Enable RAG context retrieval")
76
+ mcts_iterations: int | None = Field(default=None, ge=1, le=10000, description="Override default MCTS iterations")
77
+ thread_id: str | None = Field(
78
+ default=None,
79
+ max_length=100,
80
+ pattern=r"^[a-zA-Z0-9_-]+$",
81
+ description="Conversation thread ID for state persistence",
82
+ )
83
+
84
+ class Config:
85
+ json_schema_extra = {
86
+ "example": {
87
+ "query": "Recommend defensive positions for night attack",
88
+ "use_mcts": True,
89
+ "use_rag": True,
90
+ "mcts_iterations": 200,
91
+ "thread_id": "session_123",
92
+ }
93
+ }
94
+
95
+
96
+ class QueryResponse(BaseModel):
97
+ """Response model for query results."""
98
+
99
+ response: str = Field(..., description="Final synthesized response")
100
+ confidence: float = Field(..., ge=0.0, le=1.0, description="Overall confidence score")
101
+ agents_used: list[str] = Field(..., description="List of agents that contributed")
102
+ mcts_stats: dict[str, Any] | None = Field(default=None, description="MCTS simulation statistics")
103
+ processing_time_ms: float = Field(..., description="Total processing time in milliseconds")
104
+ metadata: dict[str, Any] = Field(default_factory=dict, description="Additional metadata")
105
+
106
+
107
+ class HealthResponse(BaseModel):
108
+ """Health check response."""
109
+
110
+ status: str = Field(..., description="Service status")
111
+ timestamp: str = Field(..., description="Current timestamp")
112
+ version: str = Field(default="1.0.0", description="API version")
113
+ uptime_seconds: float = Field(..., description="Service uptime")
114
+
115
+
116
+ class ReadinessResponse(BaseModel):
117
+ """Readiness check response."""
118
+
119
+ ready: bool = Field(..., description="Whether service is ready")
120
+ checks: dict[str, bool] = Field(..., description="Individual check results")
121
+
122
+
123
+ class ErrorResponse(BaseModel):
124
+ """Error response model."""
125
+
126
+ error: bool = Field(default=True)
127
+ error_code: str = Field(..., description="Machine-readable error code")
128
+ message: str = Field(..., description="Human-readable error message")
129
+ timestamp: str = Field(..., description="Error timestamp")
130
+
131
+
132
+ # Application startup
133
+ start_time = time.time()
134
+ framework_instance = None
135
+
136
+
137
+ @asynccontextmanager
138
+ async def lifespan(app: FastAPI):
139
+ """Application lifespan manager."""
140
+ global framework_instance
141
+
142
+ # Startup
143
+ print("Starting MCTS Framework API server...")
144
+
145
+ # Initialize authenticator with demo key (replace in production)
146
+ authenticator = APIKeyAuthenticator(
147
+ valid_keys=["demo-api-key-replace-in-production"],
148
+ rate_limit_config=RateLimitConfig(
149
+ requests_per_minute=60,
150
+ requests_per_hour=1000,
151
+ requests_per_day=10000,
152
+ ),
153
+ )
154
+ set_authenticator(authenticator)
155
+
156
+ # Initialize framework (lazy loading)
157
+ # framework_instance = create_framework()
158
+
159
+ print("API server started successfully")
160
+
161
+ yield
162
+
163
+ # Shutdown
164
+ print("Shutting down API server...")
165
+
166
+
167
+ # Create FastAPI app
168
+ app = FastAPI(
169
+ title="LangGraph Multi-Agent MCTS API",
170
+ description="""
171
+ ## Multi-Agent Reasoning API with MCTS Tactical Simulation
172
+
173
+ This API provides access to a sophisticated multi-agent reasoning framework that combines:
174
+ - **HRM Agent**: Hierarchical decomposition of complex queries
175
+ - **TRM Agent**: Iterative refinement for response quality
176
+ - **MCTS Engine**: Monte Carlo Tree Search for tactical simulation
177
+ - **RAG Integration**: Context retrieval from vector stores
178
+
179
+ ### Features
180
+ - Secure API key authentication
181
+ - Rate limiting per client
182
+ - Real-time metrics (Prometheus)
183
+ - Distributed tracing (OpenTelemetry)
184
+ - Production-grade error handling
185
+
186
+ ### Quick Start
187
+ 1. Obtain an API key
188
+ 2. Include `X-API-Key` header in requests
189
+ 3. Send queries to `/query` endpoint
190
+ 4. Monitor health via `/health` endpoint
191
+ """,
192
+ version="1.0.0",
193
+ docs_url="/docs",
194
+ redoc_url="/redoc",
195
+ openapi_tags=[
196
+ {"name": "query", "description": "Query processing operations"},
197
+ {"name": "health", "description": "Health and readiness checks"},
198
+ {"name": "metrics", "description": "Observability endpoints"},
199
+ ],
200
+ lifespan=lifespan,
201
+ )
202
+
203
+ # CORS middleware
204
+ app.add_middleware(
205
+ CORSMiddleware,
206
+ allow_origins=["*"], # Configure appropriately for production
207
+ allow_credentials=True,
208
+ allow_methods=["*"],
209
+ allow_headers=["*"],
210
+ )
211
+
212
+
213
+ # Middleware for metrics
214
+ @app.middleware("http")
215
+ async def metrics_middleware(request: Request, call_next):
216
+ """Track request metrics."""
217
+ if PROMETHEUS_AVAILABLE:
218
+ ACTIVE_REQUESTS.inc()
219
+
220
+ start = time.perf_counter()
221
+
222
+ try:
223
+ response = await call_next(request)
224
+ status = response.status_code
225
+ except Exception:
226
+ status = 500
227
+ raise
228
+ finally:
229
+ if PROMETHEUS_AVAILABLE:
230
+ ACTIVE_REQUESTS.dec()
231
+ elapsed = time.perf_counter() - start
232
+ REQUEST_COUNT.labels(method=request.method, endpoint=request.url.path, status=str(status)).inc()
233
+ REQUEST_LATENCY.labels(method=request.method, endpoint=request.url.path).observe(elapsed)
234
+
235
+ return response
236
+
237
+
238
+ # Authentication dependency
239
+ async def verify_api_key(x_api_key: str = Header(..., description="API key for authentication")):
240
+ """Verify API key and return client info."""
241
+ if not IMPORTS_AVAILABLE:
242
+ raise HTTPException(status_code=500, detail="Authentication module not available")
243
+
244
+ try:
245
+ authenticator = get_authenticator()
246
+ client_info = authenticator.require_auth(x_api_key)
247
+ return client_info
248
+ except AuthenticationError as e:
249
+ if PROMETHEUS_AVAILABLE:
250
+ ERROR_COUNT.labels(error_type="authentication").inc()
251
+ raise HTTPException(status_code=401, detail=e.user_message)
252
+ except RateLimitError as e:
253
+ if PROMETHEUS_AVAILABLE:
254
+ ERROR_COUNT.labels(error_type="rate_limit").inc()
255
+ raise HTTPException(
256
+ status_code=429, detail=e.user_message, headers={"Retry-After": str(e.retry_after_seconds or 60)}
257
+ )
258
+
259
+
260
+ # Exception handlers
261
+ @app.exception_handler(FrameworkError)
262
+ async def framework_error_handler(request: Request, exc: FrameworkError):
263
+ """Handle framework-specific errors."""
264
+ if PROMETHEUS_AVAILABLE:
265
+ ERROR_COUNT.labels(error_type=exc.error_code).inc()
266
+
267
+ return JSONResponse(status_code=500, content=exc.to_user_response())
268
+
269
+
270
+ @app.exception_handler(ValidationError)
271
+ async def validation_error_handler(request: Request, exc: ValidationError):
272
+ """Handle validation errors."""
273
+ if PROMETHEUS_AVAILABLE:
274
+ ERROR_COUNT.labels(error_type="validation").inc()
275
+
276
+ return JSONResponse(status_code=400, content=exc.to_user_response())
277
+
278
+
279
+ # Endpoints
280
+ @app.get("/health", response_model=HealthResponse, tags=["health"])
281
+ async def health_check():
282
+ """
283
+ Health check endpoint.
284
+
285
+ Returns basic service health status. Use this for load balancer health checks.
286
+ """
287
+ return HealthResponse(
288
+ status="healthy",
289
+ timestamp=datetime.utcnow().isoformat(),
290
+ version="1.0.0",
291
+ uptime_seconds=time.time() - start_time,
292
+ )
293
+
294
+
295
+ @app.get("/ready", response_model=ReadinessResponse, tags=["health"])
296
+ async def readiness_check():
297
+ """
298
+ Readiness check endpoint.
299
+
300
+ Verifies all dependencies are available. Use this for Kubernetes readiness probes.
301
+ """
302
+ checks = {
303
+ "imports_available": IMPORTS_AVAILABLE,
304
+ "authenticator_configured": True,
305
+ "llm_client_available": True, # Would check actual client
306
+ "prometheus_available": PROMETHEUS_AVAILABLE,
307
+ }
308
+
309
+ # Check if all critical services are available
310
+ all_ready = all(
311
+ [
312
+ checks["imports_available"],
313
+ checks["authenticator_configured"],
314
+ ]
315
+ )
316
+
317
+ if not all_ready:
318
+ raise HTTPException(status_code=503, detail="Service not ready")
319
+
320
+ return ReadinessResponse(ready=all_ready, checks=checks)
321
+
322
+
323
+ @app.get("/metrics", tags=["metrics"])
324
+ async def prometheus_metrics():
325
+ """
326
+ Prometheus metrics endpoint.
327
+
328
+ Returns metrics in Prometheus text format for scraping.
329
+ """
330
+ if not PROMETHEUS_AVAILABLE:
331
+ raise HTTPException(status_code=501, detail="Prometheus metrics not available")
332
+
333
+ return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
334
+
335
+
336
+ @app.post(
337
+ "/query",
338
+ response_model=QueryResponse,
339
+ tags=["query"],
340
+ responses={
341
+ 401: {"model": ErrorResponse, "description": "Authentication failed"},
342
+ 429: {"model": ErrorResponse, "description": "Rate limit exceeded"},
343
+ 400: {"model": ErrorResponse, "description": "Invalid input"},
344
+ 500: {"model": ErrorResponse, "description": "Internal server error"},
345
+ },
346
+ )
347
+ async def process_query(request: QueryRequest, client_info: ClientInfo = Depends(verify_api_key)):
348
+ """
349
+ Process a query using the multi-agent MCTS framework.
350
+
351
+ This endpoint:
352
+ 1. Validates the input query
353
+ 2. Optionally retrieves context via RAG
354
+ 3. Processes through HRM and TRM agents
355
+ 4. Optionally runs MCTS simulation
356
+ 5. Synthesizes a final response
357
+
358
+ **Authentication**: Requires valid API key in X-API-Key header.
359
+
360
+ **Rate Limiting**: Subject to rate limits per client.
361
+ """
362
+ start_time = time.perf_counter()
363
+
364
+ # Validate input using validation models
365
+ if IMPORTS_AVAILABLE:
366
+ try:
367
+ QueryInput(
368
+ query=request.query,
369
+ use_rag=request.use_rag,
370
+ use_mcts=request.use_mcts,
371
+ thread_id=request.thread_id,
372
+ )
373
+ except Exception as e:
374
+ if PROMETHEUS_AVAILABLE:
375
+ ERROR_COUNT.labels(error_type="validation").inc()
376
+ raise HTTPException(status_code=400, detail=f"Validation failed: {str(e)}")
377
+
378
+ # Process query (mock implementation for demo)
379
+ # In production, this would call the actual framework
380
+ await asyncio.sleep(0.1) # Simulate processing
381
+
382
+ processing_time = (time.perf_counter() - start_time) * 1000
383
+
384
+ # Mock response
385
+ return QueryResponse(
386
+ response=f"Processed query: {request.query[:100]}...",
387
+ confidence=0.85,
388
+ agents_used=["hrm", "trm"] + (["mcts"] if request.use_mcts else []),
389
+ mcts_stats=(
390
+ {
391
+ "iterations": request.mcts_iterations or 100,
392
+ "best_action": "recommended_action",
393
+ "root_visits": request.mcts_iterations or 100,
394
+ }
395
+ if request.use_mcts
396
+ else None
397
+ ),
398
+ processing_time_ms=processing_time,
399
+ metadata={
400
+ "client_id": client_info.client_id,
401
+ "thread_id": request.thread_id,
402
+ "rag_enabled": request.use_rag,
403
+ },
404
+ )
405
+
406
+
407
+ @app.get("/stats", tags=["metrics"])
408
+ async def get_stats(client_info: ClientInfo = Depends(verify_api_key)):
409
+ """
410
+ Get usage statistics for the authenticated client.
411
+
412
+ Returns request counts and rate limit information.
413
+ """
414
+ authenticator = get_authenticator()
415
+ stats = authenticator.get_client_stats(client_info.client_id)
416
+
417
+ return {
418
+ "client_id": client_info.client_id,
419
+ "roles": list(client_info.roles),
420
+ **stats,
421
+ "rate_limits": {
422
+ "per_minute": authenticator.rate_limit_config.requests_per_minute,
423
+ "per_hour": authenticator.rate_limit_config.requests_per_hour,
424
+ "per_day": authenticator.rate_limit_config.requests_per_day,
425
+ },
426
+ }
427
+
428
+
429
+ # Entry point
430
+ if __name__ == "__main__":
431
+ import uvicorn
432
+
433
+ uvicorn.run(
434
+ "src.api.rest_server:app",
435
+ host="0.0.0.0",
436
+ port=8000,
437
+ reload=False,
438
+ workers=4,
439
+ log_level="info",
440
+ access_log=True,
441
+ )
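
A request sketch for the `/query` endpoint using the demo key configured in `lifespan()`; it assumes the server is running locally on port 8000 and that `requests` is installed.

```python
# Sketch: call /query with the demo API key (replace the key before any real deployment).
import requests

resp = requests.post(
    "http://localhost:8000/query",
    headers={"X-API-Key": "demo-api-key-replace-in-production"},
    json={
        "query": "Recommend defensive positions for night attack",
        "use_mcts": True,
        "mcts_iterations": 200,
        "thread_id": "session_123",
    },
    timeout=30,
)
resp.raise_for_status()
body = resp.json()
print(body["confidence"], body["agents_used"])
```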
src/config/__init__.py ADDED
File without changes
src/config/meta_controller.yaml ADDED
@@ -0,0 +1,22 @@
1
+ meta_controller:
2
+ enabled: false # Disabled by default for backward compatibility
3
+ type: "rnn" # "rnn" or "bert"
4
+ fallback_to_rule_based: true # Fallback on errors
5
+
6
+ rnn:
7
+ hidden_dim: 64
8
+ num_layers: 1
9
+ dropout: 0.1
10
+ model_path: null # Path to trained model (null for untrained)
11
+
12
+ bert:
13
+ model_name: "prajjwal1/bert-mini"
14
+ use_lora: true
15
+ lora_r: 4
16
+ lora_alpha: 16
17
+ lora_dropout: 0.1
18
+ model_path: null # Path to trained LoRA adapter
19
+
20
+ inference:
21
+ device: null # Auto-detect if null
22
+ seed: 42
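
A minimal sketch of reading this file directly with PyYAML (assumed installed); the path is relative to the repository root.

```python
# Sketch: load the meta-controller settings and pick the configured variant.
import yaml  # PyYAML, assumed installed

with open("src/config/meta_controller.yaml") as f:
    cfg = yaml.safe_load(f)["meta_controller"]

if cfg["enabled"]:
    variant = cfg["type"]        # "rnn" or "bert"
    variant_cfg = cfg[variant]   # e.g. hidden_dim / num_layers for the RNN
    print(variant, variant_cfg)
```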
src/config/settings.py ADDED
@@ -0,0 +1,431 @@
1
+ """
2
+ Pydantic Settings v2 configuration management for LangGraph Multi-Agent MCTS.
3
+
4
+ Provides:
5
+ - Secure configuration loading from environment variables and .env files
6
+ - Type-safe settings with validation
7
+ - Secrets protection using SecretStr
8
+ - MCTS parameter bounds validation
9
+ - Support for multiple LLM providers
10
+ """
11
+
12
+ from enum import Enum
13
+
14
+ from pydantic import (
15
+ Field,
16
+ SecretStr,
17
+ field_validator,
18
+ model_validator,
19
+ )
20
+ from pydantic_settings import BaseSettings, SettingsConfigDict
21
+
22
+
23
+ class LLMProvider(str, Enum):
24
+ """Supported LLM providers."""
25
+
26
+ OPENAI = "openai"
27
+ ANTHROPIC = "anthropic"
28
+ LMSTUDIO = "lmstudio"
29
+
30
+
31
+ class LogLevel(str, Enum):
32
+ """Supported log levels."""
33
+
34
+ DEBUG = "DEBUG"
35
+ INFO = "INFO"
36
+ WARNING = "WARNING"
37
+ ERROR = "ERROR"
38
+ CRITICAL = "CRITICAL"
39
+
40
+
41
+ class MCTSImplementation(str, Enum):
42
+ """MCTS implementation variants."""
43
+
44
+ BASELINE = "baseline" # Original MCTS core
45
+ NEURAL = "neural" # Neural-guided AlphaZero-style MCTS
46
+
47
+
48
+ class Settings(BaseSettings):
49
+ """
50
+ Application settings with security-first configuration.
51
+
52
+ All sensitive values use SecretStr to prevent accidental exposure in logs.
53
+ Configuration is loaded from environment variables with .env file support.
54
+ """
55
+
56
+ model_config = SettingsConfigDict(
57
+ env_file=".env",
58
+ env_file_encoding="utf-8",
59
+ case_sensitive=True,
60
+ extra="ignore",
61
+ validate_default=True,
62
+ )
63
+
64
+ # LLM Provider Configuration
65
+ LLM_PROVIDER: LLMProvider = Field(
66
+ default=LLMProvider.OPENAI, description="LLM provider to use (openai, anthropic, lmstudio)"
67
+ )
68
+
69
+ # API Keys (Secrets)
70
+ OPENAI_API_KEY: SecretStr | None = Field(
71
+ default=None, description="OpenAI API key (required if using OpenAI provider)"
72
+ )
73
+
74
+ ANTHROPIC_API_KEY: SecretStr | None = Field(
75
+ default=None, description="Anthropic API key (required if using Anthropic provider)"
76
+ )
77
+
78
+ BRAINTRUST_API_KEY: SecretStr | None = Field(
79
+ default=None, description="Braintrust API key for experiment tracking (optional)"
80
+ )
81
+
82
+ PINECONE_API_KEY: SecretStr | None = Field(
83
+ default=None, description="Pinecone API key for vector storage (optional)"
84
+ )
85
+
86
+ PINECONE_HOST: str | None = Field(
87
+ default=None, description="Pinecone host URL (e.g., https://index.svc.environment.pinecone.io)"
88
+ )
89
+
90
+ # Local LLM Configuration
91
+ LMSTUDIO_BASE_URL: str | None = Field(
92
+ default="http://localhost:1234/v1", description="LM Studio API base URL for local inference"
93
+ )
94
+
95
+ LMSTUDIO_MODEL: str | None = Field(default=None, description="LM Studio model identifier (e.g., liquid/lfm2-1.2b)")
96
+
97
+ # MCTS Configuration with bounds validation
98
+ MCTS_ENABLED: bool = Field(default=True, description="Enable MCTS for agent decision-making")
99
+
100
+ MCTS_IMPL: MCTSImplementation = Field(
101
+ default=MCTSImplementation.BASELINE, description="MCTS implementation variant to use"
102
+ )
103
+
104
+ MCTS_ITERATIONS: int = Field(default=100, ge=1, le=10000, description="Number of MCTS iterations (1-10000)")
105
+
106
+ MCTS_C: float = Field(
107
+ default=1.414, ge=0.0, le=10.0, description="MCTS exploration weight (UCB1 constant, 0.0-10.0)"
108
+ )
109
+
110
+ # Random seed for reproducibility
111
+ SEED: int | None = Field(default=None, ge=0, description="Random seed for reproducibility (optional)")
112
+
113
+ # LangSmith Configuration for tracing and evaluation
114
+ LANGSMITH_API_KEY: SecretStr | None = Field(
115
+ default=None, description="LangSmith API key for tracing and evaluation (optional)"
116
+ )
117
+
118
+ LANGSMITH_PROJECT: str = Field(default="langgraph-mcts", description="LangSmith project name")
119
+
120
+ LANGCHAIN_TRACING_V2: bool = Field(default=False, description="Enable LangChain tracing v2")
121
+
122
+ LANGCHAIN_ENDPOINT: str = Field(default="https://api.smith.langchain.com", description="LangChain API endpoint")
123
+
124
+ # Weights & Biases Configuration for experiment tracking
125
+ WANDB_API_KEY: SecretStr | None = Field(
126
+ default=None, description="Weights & Biases API key for experiment tracking (optional)"
127
+ )
128
+
129
+ WANDB_PROJECT: str = Field(default="langgraph-mcts", description="W&B project name")
130
+
131
+ WANDB_ENTITY: str | None = Field(default=None, description="W&B entity (username or team name)")
132
+
133
+ WANDB_MODE: str = Field(default="online", description="W&B mode: online, offline, or disabled")
134
+
135
+ # Logging Configuration
136
+ LOG_LEVEL: LogLevel = Field(default=LogLevel.INFO, description="Application log level")
137
+
138
+ # OpenTelemetry Configuration
139
+ OTEL_EXPORTER_OTLP_ENDPOINT: str | None = Field(
140
+ default=None, description="OpenTelemetry OTLP exporter endpoint URL"
141
+ )
142
+
143
+ # S3 Storage Configuration
144
+ S3_BUCKET: str | None = Field(default=None, description="S3 bucket name for artifact storage")
145
+
146
+ S3_PREFIX: str = Field(default="mcts-artifacts", description="S3 key prefix for stored artifacts")
147
+
148
+ S3_REGION: str = Field(default="us-east-1", description="AWS region for S3 bucket")
149
+
150
+ # Network Configuration (security)
151
+ HTTP_TIMEOUT_SECONDS: int = Field(default=30, ge=1, le=300, description="HTTP request timeout in seconds")
152
+
153
+ HTTP_MAX_RETRIES: int = Field(default=3, ge=0, le=10, description="Maximum HTTP request retries")
154
+
155
+ # Security Settings
156
+ MAX_QUERY_LENGTH: int = Field(
157
+ default=10000, ge=1, le=100000, description="Maximum allowed query length in characters"
158
+ )
159
+
160
+ RATE_LIMIT_REQUESTS_PER_MINUTE: int = Field(
161
+ default=60, ge=1, le=1000, description="Rate limit for API requests per minute"
162
+ )
163
+
164
+ @field_validator("OPENAI_API_KEY")
165
+ @classmethod
166
+ def validate_openai_key_format(cls, v: SecretStr | None) -> SecretStr | None:
167
+ """Validate OpenAI API key format without exposing the value."""
168
+ if v is not None:
169
+ secret_value = v.get_secret_value()
170
+ # Check for obviously invalid patterns
171
+ if secret_value in ("", "your-api-key-here", "sk-xxx", "REPLACE_ME"):
172
+ raise ValueError("OpenAI API key appears to be a placeholder value")
173
+ if not secret_value.startswith("sk-"):
174
+ raise ValueError("OpenAI API key should start with 'sk-'")
175
+ if len(secret_value) < 20:
176
+ raise ValueError("OpenAI API key appears to be too short")
177
+ return v
178
+
179
+ @field_validator("ANTHROPIC_API_KEY")
180
+ @classmethod
181
+ def validate_anthropic_key_format(cls, v: SecretStr | None) -> SecretStr | None:
182
+ """Validate Anthropic API key format without exposing the value."""
183
+ if v is not None:
184
+ secret_value = v.get_secret_value()
185
+ # Check for obviously invalid patterns
186
+ if secret_value in ("", "your-api-key-here", "REPLACE_ME"):
187
+ raise ValueError("Anthropic API key appears to be a placeholder value")
188
+ if len(secret_value) < 20:
189
+ raise ValueError("Anthropic API key appears to be too short")
190
+ return v
191
+
192
+ @field_validator("BRAINTRUST_API_KEY")
193
+ @classmethod
194
+ def validate_braintrust_key_format(cls, v: SecretStr | None) -> SecretStr | None:
195
+ """Validate Braintrust API key format without exposing the value."""
196
+ if v is not None:
197
+ secret_value = v.get_secret_value()
198
+ # Check for obviously invalid patterns
199
+ if secret_value in ("", "your-api-key-here", "REPLACE_ME"):
200
+ raise ValueError("Braintrust API key appears to be a placeholder value")
201
+ if len(secret_value) < 20:
202
+ raise ValueError("Braintrust API key appears to be too short")
203
+ return v
204
+
205
+ @field_validator("PINECONE_API_KEY")
206
+ @classmethod
207
+ def validate_pinecone_key_format(cls, v: SecretStr | None) -> SecretStr | None:
208
+ """Validate Pinecone API key format without exposing the value."""
209
+ if v is not None:
210
+ secret_value = v.get_secret_value()
211
+ # Check for obviously invalid patterns
212
+ if secret_value in ("", "your-api-key-here", "REPLACE_ME"):
213
+ raise ValueError("Pinecone API key appears to be a placeholder value")
214
+ if len(secret_value) < 20:
215
+ raise ValueError("Pinecone API key appears to be too short")
216
+ return v
217
+
218
+ @field_validator("LANGSMITH_API_KEY")
219
+ @classmethod
220
+ def validate_langsmith_key_format(cls, v: SecretStr | None) -> SecretStr | None:
221
+ """Validate LangSmith API key format without exposing the value."""
222
+ if v is not None:
223
+ secret_value = v.get_secret_value()
224
+ if secret_value in ("", "your-api-key-here", "REPLACE_ME"):
225
+ raise ValueError("LangSmith API key appears to be a placeholder value")
226
+ if len(secret_value) < 20:
227
+ raise ValueError("LangSmith API key appears to be too short")
228
+ return v
229
+
230
+ @field_validator("WANDB_API_KEY")
231
+ @classmethod
232
+ def validate_wandb_key_format(cls, v: SecretStr | None) -> SecretStr | None:
233
+ """Validate Weights & Biases API key format without exposing the value."""
234
+ if v is not None:
235
+ secret_value = v.get_secret_value()
236
+ if secret_value in ("", "your-api-key-here", "REPLACE_ME"):
237
+ raise ValueError("W&B API key appears to be a placeholder value")
238
+ if len(secret_value) < 20:
239
+ raise ValueError("W&B API key appears to be too short")
240
+ return v
241
+
242
+ @field_validator("PINECONE_HOST")
243
+ @classmethod
244
+ def validate_pinecone_host(cls, v: str | None) -> str | None:
245
+ """Validate Pinecone host URL format."""
246
+ if v is not None and v != "":
247
+ if not v.startswith("https://"):
248
+ raise ValueError("Pinecone host must start with https://")
249
+ if "pinecone.io" not in v:
250
+ raise ValueError("Pinecone host should be a valid pinecone.io URL")
251
+ return v
252
+
253
+ @field_validator("LMSTUDIO_BASE_URL")
254
+ @classmethod
255
+ def validate_lmstudio_url(cls, v: str | None) -> str | None:
256
+ """Validate LM Studio base URL format."""
257
+ if v is not None:
258
+ if not v.startswith(("http://", "https://")):
259
+ raise ValueError("LM Studio base URL must start with http:// or https://")
260
+ # Warn if not localhost (potential security concern)
261
+ if not any(host in v for host in ("localhost", "127.0.0.1", "::1")):
262
+ import warnings
263
+
264
+ warnings.warn(
265
+ "LM Studio URL points to non-localhost address. Ensure this is intentional and secure.",
266
+ UserWarning,
267
+ stacklevel=2,
268
+ )
269
+ return v
270
+
271
+ @field_validator("OTEL_EXPORTER_OTLP_ENDPOINT")
272
+ @classmethod
273
+ def validate_otel_endpoint(cls, v: str | None) -> str | None:
274
+ """Validate OpenTelemetry endpoint URL."""
275
+ if v is not None and v != "" and not v.startswith(("http://", "https://", "grpc://")):
276
+ raise ValueError("OpenTelemetry endpoint must start with http://, https://, or grpc://")
277
+ return v
278
+
279
+ @field_validator("S3_BUCKET")
280
+ @classmethod
281
+ def validate_s3_bucket_name(cls, v: str | None) -> str | None:
282
+ """Validate S3 bucket name format."""
283
+ if v is not None:
284
+ # S3 bucket naming rules
285
+ if len(v) < 3 or len(v) > 63:
286
+ raise ValueError("S3 bucket name must be 3-63 characters long")
287
+ if not v.replace("-", "").replace(".", "").isalnum():
288
+ raise ValueError("S3 bucket name can only contain lowercase letters, numbers, hyphens, and periods")
289
+ if v.startswith("-") or v.endswith("-"):
290
+ raise ValueError("S3 bucket name cannot start or end with a hyphen")
291
+ return v
292
+
293
+ @model_validator(mode="after")
294
+ def validate_provider_credentials(self) -> "Settings":
295
+ """Ensure required API keys are provided for the selected provider."""
296
+ if self.LLM_PROVIDER == LLMProvider.OPENAI:
297
+ if self.OPENAI_API_KEY is None:
298
+ raise ValueError(
299
+ "OPENAI_API_KEY is required when using OpenAI provider. "
300
+ "Set the OPENAI_API_KEY environment variable."
301
+ )
302
+ elif self.LLM_PROVIDER == LLMProvider.ANTHROPIC:
303
+ if self.ANTHROPIC_API_KEY is None:
304
+ raise ValueError(
305
+ "ANTHROPIC_API_KEY is required when using Anthropic provider. "
306
+ "Set the ANTHROPIC_API_KEY environment variable."
307
+ )
308
+ elif self.LLM_PROVIDER == LLMProvider.LMSTUDIO and self.LMSTUDIO_BASE_URL is None:
309
+ raise ValueError("LMSTUDIO_BASE_URL is required when using LM Studio provider.")
310
+ return self
311
+
312
+ def get_api_key(self) -> str | None:
313
+ """
314
+ Get the API key for the current provider.
315
+
316
+ Returns the secret value - use with caution to avoid logging.
317
+ """
318
+ if self.LLM_PROVIDER == LLMProvider.OPENAI and self.OPENAI_API_KEY:
319
+ return self.OPENAI_API_KEY.get_secret_value()
320
+ elif self.LLM_PROVIDER == LLMProvider.ANTHROPIC and self.ANTHROPIC_API_KEY:
321
+ return self.ANTHROPIC_API_KEY.get_secret_value()
322
+ return None
323
+
324
+ def safe_dict(self) -> dict:
325
+ """
326
+ Return settings as dictionary with secrets masked.
327
+
328
+ Safe for logging and display purposes.
329
+ """
330
+ data = self.model_dump()
331
+ # Mask all sensitive fields
332
+ secret_fields = [
333
+ "OPENAI_API_KEY",
334
+ "ANTHROPIC_API_KEY",
335
+ "BRAINTRUST_API_KEY",
336
+ "PINECONE_API_KEY",
337
+ "LANGSMITH_API_KEY",
338
+ "WANDB_API_KEY",
339
+ ]
340
+ for field in secret_fields:
341
+ if field in data and data[field]:
342
+ data[field] = "***MASKED***"
343
+ return data
344
+
345
+ def get_braintrust_api_key(self) -> str | None:
346
+ """
347
+ Get the Braintrust API key if configured.
348
+
349
+ Returns the secret value - use with caution to avoid logging.
350
+ """
351
+ if self.BRAINTRUST_API_KEY:
352
+ return self.BRAINTRUST_API_KEY.get_secret_value()
353
+ return None
354
+
355
+ def get_pinecone_api_key(self) -> str | None:
356
+ """
357
+ Get the Pinecone API key if configured.
358
+
359
+ Returns the secret value - use with caution to avoid logging.
360
+ """
361
+ if self.PINECONE_API_KEY:
362
+ return self.PINECONE_API_KEY.get_secret_value()
363
+ return None
364
+
365
+ def get_langsmith_api_key(self) -> str | None:
366
+ """
367
+ Get the LangSmith API key if configured.
368
+
369
+ Returns the secret value - use with caution to avoid logging.
370
+ """
371
+ if self.LANGSMITH_API_KEY:
372
+ return self.LANGSMITH_API_KEY.get_secret_value()
373
+ return None
374
+
375
+ def get_wandb_api_key(self) -> str | None:
376
+ """
377
+ Get the Weights & Biases API key if configured.
378
+
379
+ Returns the secret value - use with caution to avoid logging.
380
+ """
381
+ if self.WANDB_API_KEY:
382
+ return self.WANDB_API_KEY.get_secret_value()
383
+ return None
384
+
385
+ def __repr__(self) -> str:
386
+ """Safe string representation that doesn't expose secrets."""
387
+ return f"Settings(LLM_PROVIDER={self.LLM_PROVIDER}, MCTS_ENABLED={self.MCTS_ENABLED}, MCTS_IMPL={self.MCTS_IMPL}, LOG_LEVEL={self.LOG_LEVEL})"
388
+
389
+
390
+ # Global settings instance (lazily loaded)
391
+ _settings: Settings | None = None
392
+
393
+
394
+ def get_settings() -> Settings:
395
+ """
396
+ Get the global settings instance.
397
+
398
+ Settings are loaded once and cached. To reload, call reset_settings() first.
399
+
400
+ Returns:
401
+ Settings: Application configuration instance
402
+
403
+ Raises:
404
+ ValidationError: If configuration is invalid
405
+ """
406
+ global _settings
407
+ if _settings is None:
408
+ _settings = Settings()
409
+ return _settings
410
+
411
+
412
+ def reset_settings() -> None:
413
+ """
414
+ Reset the global settings instance.
415
+
416
+ Forces settings to be reloaded from environment on next get_settings() call.
417
+ Useful for testing.
418
+ """
419
+ global _settings
420
+ _settings = None
421
+
422
+
423
+ # Type exports for external use
424
+ __all__ = [
425
+ "Settings",
426
+ "LLMProvider",
427
+ "LogLevel",
428
+ "MCTSImplementation",
429
+ "get_settings",
430
+ "reset_settings",
431
+ ]
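
A short sketch of the intended usage: load settings once, log only the masked view, and read validated MCTS parameters. It assumes the provider-appropriate API key (e.g. `OPENAI_API_KEY`) is present in the environment or `.env`, since provider validation fails otherwise.

```python
# Sketch: load settings from the environment and log them without exposing secrets.
import logging

from src.config.settings import get_settings

logging.basicConfig(level=logging.INFO)
settings = get_settings()                          # cached after the first call
logging.info("config: %s", settings.safe_dict())   # secret fields show as ***MASKED***

if settings.MCTS_ENABLED:
    iterations = settings.MCTS_ITERATIONS   # already bounds-checked (1-10000)
    exploration = settings.MCTS_C
```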
src/data/__init__.py ADDED
@@ -0,0 +1,29 @@
1
+ """
2
+ Dataset Integration Module for Multi-Agent MCTS Training.
3
+
4
+ This module provides utilities for loading, preprocessing, and managing
5
+ open-source datasets for training HRM/TRM agents and neural meta-controllers.
6
+
7
+ Supported Datasets:
8
+ - DABStep: Multi-step reasoning tasks (CC-BY-4.0)
9
+ - PRIMUS-Seed: Cybersecurity domain knowledge (ODC-BY)
10
+ - PRIMUS-Instruct: Instruction fine-tuning data (ODC-BY)
11
+ """
12
+
13
+ from .dataset_loader import DABStepLoader, DatasetLoader, PRIMUSLoader
14
+ from .preprocessing import TextPreprocessor, TokenizerWrapper
15
+ from .tactical_augmentation import TacticalAugmenter
16
+ from .train_test_split import DataSplitter, StratifiedSplitter
17
+
18
+ __all__ = [
19
+ "DatasetLoader",
20
+ "DABStepLoader",
21
+ "PRIMUSLoader",
22
+ "TextPreprocessor",
23
+ "TokenizerWrapper",
24
+ "TacticalAugmenter",
25
+ "DataSplitter",
26
+ "StratifiedSplitter",
27
+ ]
28
+
29
+ __version__ = "1.0.0"
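
A minimal sketch of loading a filtered DABStep split through the package's public API; it assumes the `datasets` library is installed and the Hugging Face dataset is reachable.

```python
# Sketch: load easy DABStep samples via the src.data package exports.
from src.data import DABStepLoader

loader = DABStepLoader(cache_dir="./.cache/mcts_datasets")  # cache path is arbitrary
samples = loader.load(split="train", difficulty="easy")
print(f"{len(samples)} samples loaded")
if samples:
    print(samples[0].domain, samples[0].difficulty)
```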
src/data/dataset_loader.py ADDED
@@ -0,0 +1,551 @@
1
+ """
2
+ Dataset Loading Module for Open-Source Training Data.
3
+
4
+ Provides unified loading interfaces for:
5
+ - DABStep: Multi-step data analysis reasoning
6
+ - PRIMUS: Cybersecurity domain knowledge
7
+ - Custom tactical datasets
8
+
9
+ License Attribution:
10
+ - DABStep: CC-BY-4.0 (Creative Commons Attribution)
11
+ - PRIMUS: ODC-BY (Open Data Commons Attribution)
12
+ """
13
+
14
+ import logging
15
+ from abc import ABC, abstractmethod
16
+ from collections.abc import Iterator
17
+ from dataclasses import dataclass, field
18
+ from pathlib import Path
19
+ from typing import Any
20
+
21
+ logger = logging.getLogger(__name__)
22
+
23
+
24
+ @dataclass
25
+ class DatasetSample:
26
+ """Standardized representation of a dataset sample."""
27
+
28
+ id: str
29
+ text: str
30
+ metadata: dict[str, Any] = field(default_factory=dict)
31
+ labels: list[str] | None = None
32
+ difficulty: str | None = None
33
+ domain: str | None = None
34
+ reasoning_steps: list[str] | None = None
35
+
36
+
37
+ @dataclass
38
+ class DatasetStatistics:
39
+ """Statistics about a loaded dataset."""
40
+
41
+ total_samples: int
42
+ domains: dict[str, int]
43
+ avg_text_length: float
44
+ difficulty_distribution: dict[str, int]
45
+ total_tokens: int = 0
46
+
47
+
48
+ class DatasetLoader(ABC):
49
+ """Abstract base class for dataset loaders."""
50
+
51
+ def __init__(self, cache_dir: str | None = None):
52
+ """
53
+ Initialize dataset loader.
54
+
55
+ Args:
56
+ cache_dir: Directory to cache downloaded datasets
57
+ """
58
+ self.cache_dir = cache_dir or str(Path.home() / ".cache" / "mcts_datasets")
59
+ self._dataset = None
60
+ self._statistics = None
61
+
62
+ @abstractmethod
63
+ def load(self, split: str = "train") -> list[DatasetSample]:
64
+ """Load dataset split."""
65
+ pass
66
+
67
+ @abstractmethod
68
+ def get_statistics(self) -> DatasetStatistics:
69
+ """Get dataset statistics."""
70
+ pass
71
+
72
+ @abstractmethod
73
+ def iterate_samples(self, batch_size: int = 32) -> Iterator[list[DatasetSample]]:
74
+ """Iterate over samples in batches."""
75
+ pass
76
+
77
+
78
+ class DABStepLoader(DatasetLoader):
79
+ """
80
+ Loader for DABStep Multi-Step Reasoning Dataset.
81
+
82
+ DABStep contains 450+ data analysis tasks requiring sequential,
+ iterative problem-solving, making it well suited for training HRM/TRM agents.
84
+
85
+ License: CC-BY-4.0 (Attribution required)
86
+ Source: huggingface.co/datasets/adyen/DABstep
87
+ """
88
+
89
+ DATASET_NAME = "adyen/DABstep"
90
+ DIFFICULTIES = ["easy", "medium", "hard"]
91
+
92
+ def __init__(self, cache_dir: str | None = None):
93
+ """Initialize DABStep loader."""
94
+ super().__init__(cache_dir)
95
+ self._loaded_samples: list[DatasetSample] = []
96
+
97
+ def load(self, split: str = "train", difficulty: str | None = None) -> list[DatasetSample]:
98
+ """
99
+ Load DABStep dataset.
100
+
101
+ Args:
102
+ split: Dataset split ('train', 'validation', 'test')
103
+ difficulty: Filter by difficulty ('easy', 'medium', 'hard')
104
+
105
+ Returns:
106
+ List of DatasetSample objects
107
+ """
108
+ try:
109
+ from datasets import load_dataset
110
+
111
+ logger.info(f"Loading DABStep dataset (split={split})")
112
+
113
+ dataset = load_dataset(
114
+ self.DATASET_NAME,
115
+ cache_dir=self.cache_dir,
116
+ )
117
+
118
+ if split not in dataset:
119
+ available_splits = list(dataset.keys())
120
+ logger.warning(f"Split '{split}' not found. Available: {available_splits}")
121
+ split = available_splits[0] if available_splits else "train"
122
+
123
+ samples = []
124
+ for idx, item in enumerate(dataset[split]):
125
+ sample = DatasetSample(
126
+ id=f"dabstep_{split}_{idx}",
127
+ text=str(item.get("question", item.get("text", ""))),
128
+ metadata={
129
+ "source": "DABStep",
130
+ "license": "CC-BY-4.0",
131
+ "split": split,
132
+ "original_data": item,
133
+ },
134
+ difficulty=item.get("difficulty", "medium"),
135
+ domain="data_analysis",
136
+ reasoning_steps=item.get("steps", []),
137
+ )
138
+
139
+ if difficulty and sample.difficulty != difficulty:
140
+ continue
141
+
142
+ samples.append(sample)
143
+
144
+ self._loaded_samples = samples
145
+ logger.info(f"Loaded {len(samples)} DABStep samples")
146
+ return samples
147
+
148
+ except ImportError:
149
+ logger.error("datasets library not installed. Run: pip install datasets")
150
+ raise
151
+ except Exception as e:
152
+ logger.error(f"Failed to load DABStep: {e}")
153
+ raise
154
+
155
+ def get_statistics(self) -> DatasetStatistics:
156
+ """Get statistics about loaded DABStep data."""
157
+ if not self._loaded_samples:
158
+ raise ValueError("No samples loaded. Call load() first.")
159
+
160
+ difficulty_dist = {}
161
+ total_length = 0
162
+
163
+ for sample in self._loaded_samples:
164
+ diff = sample.difficulty or "unknown"
165
+ difficulty_dist[diff] = difficulty_dist.get(diff, 0) + 1
166
+ total_length += len(sample.text)
167
+
168
+ return DatasetStatistics(
169
+ total_samples=len(self._loaded_samples),
170
+ domains={"data_analysis": len(self._loaded_samples)},
171
+ avg_text_length=total_length / len(self._loaded_samples),
172
+ difficulty_distribution=difficulty_dist,
173
+ )
174
+
175
+ def iterate_samples(self, batch_size: int = 32) -> Iterator[list[DatasetSample]]:
176
+ """Iterate over samples in batches."""
177
+ if not self._loaded_samples:
178
+ raise ValueError("No samples loaded. Call load() first.")
179
+
180
+ for i in range(0, len(self._loaded_samples), batch_size):
181
+ yield self._loaded_samples[i : i + batch_size]
182
+
183
+ def get_reasoning_tasks(self) -> list[DatasetSample]:
184
+ """Get only samples with explicit reasoning steps."""
185
+ return [s for s in self._loaded_samples if s.reasoning_steps]
186
+
187
+
188
+ class PRIMUSLoader(DatasetLoader):
189
+ """
190
+ Loader for PRIMUS Cybersecurity Dataset Suite.
191
+
192
+ PRIMUS contains:
193
+ - Seed: 674,848 cybersecurity documents (190M tokens)
194
+ - Instruct: 835 instruction-tuning samples
195
+ - Reasoning: Self-reflection data for reasoning
196
+
197
+ License: ODC-BY (Open Data Commons Attribution)
198
+ Source: huggingface.co/datasets/trendmicro-ailab/Primus-Seed
199
+ """
200
+
201
+ SEED_DATASET = "trendmicro-ailab/Primus-Seed"
202
+ INSTRUCT_DATASET = "trendmicro-ailab/Primus-Instruct"
203
+
204
+ DOMAINS = [
205
+ "mitre_attack",
206
+ "wikipedia",
207
+ "company_sites",
208
+ "threat_intelligence",
209
+ "vulnerability_db",
210
+ ]
211
+
212
+ def __init__(self, cache_dir: str | None = None):
213
+ """Initialize PRIMUS loader."""
214
+ super().__init__(cache_dir)
215
+ self._seed_samples: list[DatasetSample] = []
216
+ self._instruct_samples: list[DatasetSample] = []
217
+
218
+ def load(
219
+ self,
220
+ split: str = "train",
221
+ dataset_type: str = "seed",
222
+ domains: list[str] | None = None,
223
+ max_samples: int | None = None,
224
+ streaming: bool = True,
225
+ ) -> list[DatasetSample]:
226
+ """
227
+ Load PRIMUS dataset.
228
+
229
+ Args:
230
+ split: Dataset split ('train', 'validation', 'test')
231
+ dataset_type: 'seed' for knowledge base, 'instruct' for fine-tuning
232
+ domains: Filter by specific domains
233
+ max_samples: Limit number of samples (useful for large datasets)
234
+ streaming: Use streaming mode for large datasets (default True)
235
+
236
+ Returns:
237
+ List of DatasetSample objects
238
+ """
239
+ try:
240
+ from datasets import load_dataset
241
+
242
+ dataset_name = self.SEED_DATASET if dataset_type == "seed" else self.INSTRUCT_DATASET
243
+
244
+ logger.info(f"Loading PRIMUS {dataset_type} dataset")
245
+
246
+ # Use streaming for large seed dataset to avoid download issues
247
+ use_streaming = streaming and dataset_type == "seed" and max_samples is not None
248
+
249
+ if use_streaming:
250
+ logger.info(f"Using streaming mode (max_samples={max_samples})")
251
+ dataset = load_dataset(
252
+ dataset_name,
253
+ "default",
254
+ streaming=True,
255
+ cache_dir=self.cache_dir,
256
+ )
257
+ # For streaming, iterate the first available split
258
+ data_iter = iter(dataset["train"]) if "train" in dataset else iter(dataset[list(dataset.keys())[0]])
259
+ else:
260
+ dataset = load_dataset(
261
+ dataset_name,
262
+ cache_dir=self.cache_dir,
263
+ )
264
+
265
+ if split not in dataset:
266
+ available_splits = list(dataset.keys())
267
+ logger.warning(f"Split '{split}' not found. Using: {available_splits[0]}")
268
+ split = available_splits[0]
269
+
270
+ data_iter = iter(dataset[split])
271
+
272
+ samples = []
273
+ count = 0
274
+
275
+ for idx, item in enumerate(data_iter):
276
+ if max_samples and count >= max_samples:
277
+ break
278
+
279
+ domain = item.get("domain", item.get("source", "unknown"))
280
+
281
+ if domains and domain not in domains:
282
+ continue
283
+
284
+ if dataset_type == "instruct":
285
+ text = f"Instruction: {item.get('instruction', '')}\nResponse: {item.get('response', '')}"
286
+ else:
287
+ text = str(item.get("text", item.get("content", "")))
288
+
289
+ sample = DatasetSample(
290
+ id=f"primus_{dataset_type}_{split}_{idx}",
291
+ text=text,
292
+ metadata={
293
+ "source": f"PRIMUS-{dataset_type.capitalize()}",
294
+ "license": "ODC-BY",
295
+ "split": split,
296
+ "original_domain": domain,
297
+ },
298
+ domain=domain,
299
+ labels=item.get("labels", item.get("tags", [])),
300
+ )
301
+
302
+ samples.append(sample)
303
+ count += 1
304
+
305
+ if dataset_type == "seed":
306
+ self._seed_samples = samples
307
+ else:
308
+ self._instruct_samples = samples
309
+
310
+ logger.info(f"Loaded {len(samples)} PRIMUS {dataset_type} samples")
311
+ return samples
312
+
313
+ except ImportError:
314
+ logger.error("datasets library not installed. Run: pip install datasets")
315
+ raise
316
+ except Exception as e:
317
+ if "gated dataset" in str(e):
318
+ logger.error(
319
+ f"PRIMUS is a gated dataset. Please authenticate with HuggingFace:\n"
320
+ f"1. Create account at https://huggingface.co/\n"
321
+ f"2. Accept dataset terms at https://huggingface.co/datasets/{dataset_name}\n"
322
+ f"3. Create token at https://huggingface.co/settings/tokens\n"
323
+ f"4. Run: huggingface-cli login"
324
+ )
325
+ else:
326
+ logger.error(f"Failed to load PRIMUS: {e}")
327
+ raise
328
+
329
+ def load_seed(self, max_samples: int | None = None) -> list[DatasetSample]:
330
+ """Load PRIMUS-Seed knowledge base."""
331
+ return self.load(dataset_type="seed", max_samples=max_samples)
332
+
333
+ def load_instruct(self) -> list[DatasetSample]:
334
+ """Load PRIMUS-Instruct fine-tuning data."""
335
+ return self.load(dataset_type="instruct", streaming=False)
336
+
337
+ def get_statistics(self) -> DatasetStatistics:
338
+ """Get statistics about loaded PRIMUS data."""
339
+ all_samples = self._seed_samples + self._instruct_samples
340
+
341
+ if not all_samples:
342
+ raise ValueError("No samples loaded. Call load() first.")
343
+
344
+ domain_dist = {}
345
+ total_length = 0
346
+
347
+ for sample in all_samples:
348
+ domain = sample.domain or "unknown"
349
+ domain_dist[domain] = domain_dist.get(domain, 0) + 1
350
+ total_length += len(sample.text)
351
+
352
+ return DatasetStatistics(
353
+ total_samples=len(all_samples),
354
+ domains=domain_dist,
355
+ avg_text_length=total_length / len(all_samples),
356
+ difficulty_distribution={"cybersecurity": len(all_samples)},
357
+ )
358
+
359
+ def iterate_samples(self, batch_size: int = 32) -> Iterator[list[DatasetSample]]:
360
+ """Iterate over all loaded samples in batches."""
361
+ all_samples = self._seed_samples + self._instruct_samples
362
+
363
+ if not all_samples:
364
+ raise ValueError("No samples loaded. Call load() first.")
365
+
366
+ for i in range(0, len(all_samples), batch_size):
367
+ yield all_samples[i : i + batch_size]
368
+
369
+ def get_mitre_attack_samples(self) -> list[DatasetSample]:
370
+ """Get samples specifically from MITRE ATT&CK."""
371
+ return [s for s in self._seed_samples if "mitre" in (s.domain or "").lower()]
372
+
373
+ def get_threat_intelligence_samples(self) -> list[DatasetSample]:
374
+ """Get threat intelligence related samples."""
375
+ return [
376
+ s
377
+ for s in self._seed_samples
378
+ if any(kw in (s.domain or "").lower() for kw in ["threat", "cti", "intelligence"])
379
+ ]
380
+
381
+
382
+ class CombinedDatasetLoader:
383
+ """
384
+ Unified loader for combining multiple datasets.
385
+
386
+ Provides a single interface for loading and managing:
387
+ - DABStep (multi-step reasoning)
388
+ - PRIMUS (cybersecurity knowledge)
389
+ - Custom tactical datasets
390
+ """
391
+
392
+ def __init__(self, cache_dir: str | None = None):
393
+ """Initialize combined loader."""
394
+ self.cache_dir = cache_dir
395
+ self.dabstep_loader = DABStepLoader(cache_dir)
396
+ self.primus_loader = PRIMUSLoader(cache_dir)
397
+ self._all_samples: list[DatasetSample] = []
398
+
399
+ def load_all(
400
+ self,
401
+ dabstep_split: str = "train",
402
+ primus_max_samples: int | None = 10000,
403
+ include_instruct: bool = True,
404
+ ) -> list[DatasetSample]:
405
+ """
406
+ Load all datasets.
407
+
408
+ Args:
409
+ dabstep_split: Split for DABStep
410
+ primus_max_samples: Max samples from PRIMUS-Seed (None for all)
411
+ include_instruct: Whether to include PRIMUS-Instruct
412
+
413
+ Returns:
414
+ Combined list of all samples
415
+ """
416
+ logger.info("Loading combined datasets")
417
+
418
+ # Load DABStep
419
+ dabstep_samples = self.dabstep_loader.load(split=dabstep_split)
420
+ logger.info(f"DABStep: {len(dabstep_samples)} samples")
421
+
422
+ # Load PRIMUS-Seed
423
+ primus_seed = self.primus_loader.load_seed(max_samples=primus_max_samples)
424
+ logger.info(f"PRIMUS-Seed: {len(primus_seed)} samples")
425
+
426
+ # Load PRIMUS-Instruct
427
+ primus_instruct = []
428
+ if include_instruct:
429
+ primus_instruct = self.primus_loader.load_instruct()
430
+ logger.info(f"PRIMUS-Instruct: {len(primus_instruct)} samples")
431
+
432
+ self._all_samples = dabstep_samples + primus_seed + primus_instruct
433
+ logger.info(f"Total combined samples: {len(self._all_samples)}")
434
+
435
+ return self._all_samples
436
+
437
+ def get_domain_distribution(self) -> dict[str, int]:
438
+ """Get distribution of samples across domains."""
439
+ dist = {}
440
+ for sample in self._all_samples:
441
+ domain = sample.domain or "unknown"
442
+ dist[domain] = dist.get(domain, 0) + 1
443
+ return dist
444
+
445
+ def filter_by_domain(self, domain: str) -> list[DatasetSample]:
446
+ """Filter samples by domain."""
447
+ return [s for s in self._all_samples if s.domain == domain]
448
+
449
+ def get_multi_step_reasoning_samples(self) -> list[DatasetSample]:
450
+ """Get samples suitable for multi-step reasoning training."""
451
+ return [
452
+ s
453
+ for s in self._all_samples
454
+ if s.reasoning_steps or s.domain == "data_analysis" or "instruct" in s.metadata.get("source", "").lower()
455
+ ]
456
+
457
+ def export_for_training(self, output_path: str, format: str = "jsonl") -> str:
458
+ """
459
+ Export dataset for training.
460
+
461
+ Args:
462
+ output_path: Path to save exported data
463
+ format: Export format ('jsonl', 'csv', 'parquet')
464
+
465
+ Returns:
466
+ Path to exported file
467
+ """
468
+ import json
469
+
470
+ output_file = Path(output_path)
471
+ output_file.parent.mkdir(parents=True, exist_ok=True)
472
+
473
+ if format == "jsonl":
474
+ with open(output_file, "w", encoding="utf-8") as f:
475
+ for sample in self._all_samples:
476
+ record = {
477
+ "id": sample.id,
478
+ "text": sample.text,
479
+ "domain": sample.domain,
480
+ "difficulty": sample.difficulty,
481
+ "labels": sample.labels,
482
+ "metadata": sample.metadata,
483
+ }
484
+ f.write(json.dumps(record) + "\n")
485
+ else:
486
+ raise NotImplementedError(f"Format {format} not yet supported")
487
+
488
+ logger.info(f"Exported {len(self._all_samples)} samples to {output_file}")
489
+ return str(output_file)
490
+
491
+
492
+ def load_dataset(
493
+ dataset_name: str,
494
+ split: str = "train",
495
+ cache_dir: str | None = None,
496
+ **kwargs,
497
+ ) -> Any:
498
+ """
499
+ Unified interface for loading datasets from HuggingFace.
500
+
501
+ This function provides compatibility with the standard HuggingFace datasets API.
502
+ It wraps the underlying load_dataset function from the datasets library.
503
+
504
+ Args:
505
+ dataset_name: HuggingFace dataset identifier (e.g., "adyen/DABstep")
506
+ split: Dataset split to load ("train", "validation", "test")
507
+ cache_dir: Optional directory for caching downloaded datasets
508
+ **kwargs: Additional arguments passed to datasets.load_dataset
509
+
510
+ Returns:
511
+ HuggingFace Dataset object or dict of Dataset objects
512
+
513
+ Raises:
514
+ ImportError: If datasets library is not installed
515
+ Exception: If dataset loading fails
516
+
517
+ Examples:
518
+ >>> # Load DABStep dataset
519
+ >>> dataset = load_dataset("adyen/DABstep")
520
+ >>> samples = dataset["train"]
521
+
522
+ >>> # Load PRIMUS-Seed with custom cache
523
+ >>> dataset = load_dataset("trendmicro-ailab/Primus-Seed", cache_dir="/tmp/cache")
524
+
525
+ License Attribution:
526
+ - DABStep: CC-BY-4.0 (Creative Commons Attribution 4.0)
527
+ - PRIMUS: ODC-BY (Open Data Commons Attribution)
528
+ """
529
+ try:
530
+ from datasets import load_dataset as hf_load_dataset
531
+
532
+ logger.info(f"Loading dataset: {dataset_name} (split={split})")
533
+
534
+ load_kwargs = {
535
+ **kwargs,
536
+ }
537
+
538
+ if cache_dir:
539
+ load_kwargs["cache_dir"] = cache_dir
540
+
541
+ dataset = hf_load_dataset(dataset_name, **load_kwargs)
542
+
543
+ logger.info(f"Successfully loaded dataset: {dataset_name}")
544
+ return dataset
545
+
546
+ except ImportError:
547
+ logger.error("datasets library not installed. Run: pip install datasets")
548
+ raise ImportError("The datasets library is required but not installed. Install it with: pip install datasets")
549
+ except Exception as e:
550
+ logger.error(f"Failed to load dataset {dataset_name}: {e}")
551
+ raise
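A hedged sketch of how the loaders above are meant to compose (the HuggingFace dataset IDs come from the class constants; PRIMUS access additionally requires accepting the gated-dataset terms, as the error message above notes; the output path is invented):

from src.data.dataset_loader import CombinedDatasetLoader

combined = CombinedDatasetLoader(cache_dir="/tmp/mcts_datasets")
all_samples = combined.load_all(primus_max_samples=5000, include_instruct=True)
print(combined.get_domain_distribution())                   # counts per domain across all loaded samples
combined.export_for_training("exports/combined.jsonl", format="jsonl")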
src/data/preprocessing.py ADDED
@@ -0,0 +1,406 @@
1
+ """
2
+ Text Preprocessing Module for Training Data.
3
+
4
+ Provides utilities for:
5
+ - Text cleaning and normalization
6
+ - Tokenization with various backends
7
+ - Feature extraction for meta-controller training
8
+ """
9
+
10
+ import logging
11
+ import re
12
+ from dataclasses import dataclass
13
+ from typing import Any
14
+
15
+ logger = logging.getLogger(__name__)
16
+
17
+
18
+ @dataclass
19
+ class PreprocessedText:
20
+ """Preprocessed text with metadata."""
21
+
22
+ original: str
23
+ cleaned: str
24
+ tokens: list[str]
25
+ token_ids: list[int] | None = None
26
+ features: dict[str, Any] | None = None
27
+
28
+
29
+ class TextPreprocessor:
30
+ """
31
+ Text preprocessing pipeline for multi-agent training data.
32
+
33
+ Handles:
34
+ - HTML/XML tag removal
35
+ - Special character normalization
36
+ - Whitespace cleanup
37
+ - Domain-specific preprocessing (cyber, military, etc.)
38
+ """
39
+
40
+ # Patterns for cleaning
41
+ HTML_TAG_PATTERN = re.compile(r"<[^>]+>")
42
+ URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
43
+ MULTIPLE_SPACES = re.compile(r"\s+")
44
+ SPECIAL_CHARS = re.compile(r"[^\w\s\-.,!?;:()[\]{}\"'/]")
45
+
46
+ # Domain-specific patterns
47
+ IP_ADDRESS_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
48
+ CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")
49
+ MITRE_TECHNIQUE_PATTERN = re.compile(r"T\d{4}(?:\.\d{3})?")
50
+
51
+ def __init__(
52
+ self,
53
+ remove_html: bool = True,
54
+ normalize_urls: bool = True,
55
+ lowercase: bool = False,
56
+ preserve_domain_patterns: bool = True,
57
+ ):
58
+ """
59
+ Initialize preprocessor.
60
+
61
+ Args:
62
+ remove_html: Remove HTML/XML tags
63
+ normalize_urls: Replace URLs with placeholder
64
+ lowercase: Convert to lowercase
65
+ preserve_domain_patterns: Keep domain-specific patterns (IPs, CVEs, etc.)
66
+ """
67
+ self.remove_html = remove_html
68
+ self.normalize_urls = normalize_urls
69
+ self.lowercase = lowercase
70
+ self.preserve_domain_patterns = preserve_domain_patterns
71
+
72
+ def clean(self, text: str) -> str:
73
+ """
74
+ Clean and normalize text.
75
+
76
+ Args:
77
+ text: Raw input text
78
+
79
+ Returns:
80
+ Cleaned text
81
+ """
82
+ if not text:
83
+ return ""
84
+
85
+ result = text
86
+
87
+ # Remove HTML tags
88
+ if self.remove_html:
89
+ result = self.HTML_TAG_PATTERN.sub(" ", result)
90
+
91
+ # Preserve or normalize URLs
92
+ if self.normalize_urls:
93
+ if self.preserve_domain_patterns:
94
+ result = self.URL_PATTERN.sub("[URL]", result)
95
+ else:
96
+ result = self.URL_PATTERN.sub("", result)
97
+
98
+ # Normalize whitespace
99
+ result = self.MULTIPLE_SPACES.sub(" ", result)
100
+
101
+ # Lowercase if requested
102
+ if self.lowercase:
103
+ result = result.lower()
104
+
105
+ # Strip leading/trailing whitespace
106
+ result = result.strip()
107
+
108
+ return result
109
+
110
+ def extract_domain_features(self, text: str) -> dict[str, Any]:
111
+ """
112
+ Extract domain-specific features from text.
113
+
114
+ Args:
115
+ text: Input text
116
+
117
+ Returns:
118
+ Dictionary of extracted features
119
+ """
120
+ features = {
121
+ "has_ip_addresses": bool(self.IP_ADDRESS_PATTERN.search(text)),
122
+ "ip_count": len(self.IP_ADDRESS_PATTERN.findall(text)),
123
+ "has_cve": bool(self.CVE_PATTERN.search(text)),
124
+ "cve_ids": self.CVE_PATTERN.findall(text),
125
+ "has_mitre_techniques": bool(self.MITRE_TECHNIQUE_PATTERN.search(text)),
126
+ "mitre_techniques": self.MITRE_TECHNIQUE_PATTERN.findall(text),
127
+ "text_length": len(text),
128
+ "word_count": len(text.split()),
129
+ "sentence_count": len(re.findall(r"[.!?]+", text)),
130
+ }
131
+
132
+ # Detect domain indicators
133
+ domain_keywords = {
134
+ "cybersecurity": ["attack", "vulnerability", "exploit", "malware", "threat"],
135
+ "military": ["tactical", "reconnaissance", "deployment", "terrain", "objective"],
136
+ "data_analysis": ["dataset", "analysis", "correlation", "statistics", "visualization"],
137
+ }
138
+
139
+ for domain, keywords in domain_keywords.items():
140
+ features[f"is_{domain}"] = any(kw in text.lower() for kw in keywords)
141
+
142
+ return features
143
+
144
+ def preprocess(self, text: str) -> PreprocessedText:
145
+ """
146
+ Full preprocessing pipeline.
147
+
148
+ Args:
149
+ text: Raw input text
150
+
151
+ Returns:
152
+ PreprocessedText object with all preprocessing results
153
+ """
154
+ cleaned = self.clean(text)
155
+ tokens = cleaned.split() # Simple whitespace tokenization
156
+ features = self.extract_domain_features(text)
157
+
158
+ return PreprocessedText(
159
+ original=text,
160
+ cleaned=cleaned,
161
+ tokens=tokens,
162
+ features=features,
163
+ )
164
+
165
+ def batch_preprocess(self, texts: list[str]) -> list[PreprocessedText]:
166
+ """
167
+ Preprocess multiple texts.
168
+
169
+ Args:
170
+ texts: List of raw texts
171
+
172
+ Returns:
173
+ List of PreprocessedText objects
174
+ """
175
+ return [self.preprocess(text) for text in texts]
176
+
177
+
178
+ class TokenizerWrapper:
179
+ """
180
+ Wrapper for various tokenization backends.
181
+
182
+ Supports:
183
+ - Simple whitespace tokenization
184
+ - HuggingFace tokenizers
185
+ - Custom vocabularies
186
+ """
187
+
188
+ def __init__(
189
+ self,
190
+ backend: str = "simple",
191
+ model_name: str | None = None,
192
+ max_length: int = 512,
193
+ ):
194
+ """
195
+ Initialize tokenizer.
196
+
197
+ Args:
198
+ backend: Tokenizer backend ('simple', 'huggingface', 'custom')
199
+ model_name: Model name for HuggingFace tokenizer
200
+ max_length: Maximum sequence length
201
+ """
202
+ self.backend = backend
203
+ self.model_name = model_name
204
+ self.max_length = max_length
205
+ self._tokenizer = None
206
+
207
+ if backend == "huggingface" and model_name:
208
+ self._load_huggingface_tokenizer()
209
+
210
+ def _load_huggingface_tokenizer(self):
211
+ """Load HuggingFace tokenizer."""
212
+ try:
213
+ from transformers import AutoTokenizer
214
+
215
+ self._tokenizer = AutoTokenizer.from_pretrained(
216
+ self.model_name,
217
+ model_max_length=self.max_length,
218
+ )
219
+ logger.info(f"Loaded HuggingFace tokenizer: {self.model_name}")
220
+ except ImportError:
221
+ logger.error("transformers library not installed. Run: pip install transformers")
222
+ raise
223
+
224
+ def tokenize(self, text: str) -> tuple[list[str], list[int] | None]:
225
+ """
226
+ Tokenize text.
227
+
228
+ Args:
229
+ text: Input text
230
+
231
+ Returns:
232
+ Tuple of (tokens, token_ids)
233
+ """
234
+ if self.backend == "simple":
235
+ tokens = text.split()[: self.max_length]
236
+ return tokens, None
237
+
238
+ elif self.backend == "huggingface" and self._tokenizer:
239
+ encoded = self._tokenizer(
240
+ text,
241
+ truncation=True,
242
+ max_length=self.max_length,
243
+ return_tensors=None,
244
+ )
245
+ tokens = self._tokenizer.convert_ids_to_tokens(encoded["input_ids"])
246
+ token_ids = encoded["input_ids"]
247
+ return tokens, token_ids
248
+
249
+ else:
250
+ raise ValueError(f"Unsupported backend: {self.backend}")
251
+
252
+ def batch_tokenize(self, texts: list[str]) -> list[tuple[list[str], list[int] | None]]:
253
+ """
254
+ Tokenize multiple texts.
255
+
256
+ Args:
257
+ texts: List of input texts
258
+
259
+ Returns:
260
+ List of (tokens, token_ids) tuples
261
+ """
262
+ return [self.tokenize(text) for text in texts]
263
+
264
+ def encode_for_training(self, texts: list[str]) -> dict[str, Any]:
265
+ """
266
+ Encode texts for model training.
267
+
268
+ Args:
269
+ texts: List of input texts
270
+
271
+ Returns:
272
+ Dictionary with encoded data ready for training
273
+ """
274
+ if self.backend != "huggingface" or not self._tokenizer:
275
+ raise ValueError("encode_for_training requires HuggingFace backend")
276
+
277
+ encoded = self._tokenizer(
278
+ texts,
279
+ truncation=True,
280
+ padding=True,
281
+ max_length=self.max_length,
282
+ return_tensors="pt",
283
+ )
284
+
285
+ return encoded
286
+
287
+
288
+ class MetaControllerFeatureExtractor:
289
+ """
290
+ Extract features for meta-controller training.
291
+
292
+ Converts text and agent state information into numerical features
293
+ suitable for RNN/BERT routing decisions.
294
+ """
295
+
296
+ def __init__(self):
297
+ """Initialize feature extractor."""
298
+ self.preprocessor = TextPreprocessor()
299
+
300
+ def extract_query_features(self, query: str) -> dict[str, float]:
301
+ """
302
+ Extract numerical features from query text.
303
+
304
+ Args:
305
+ query: User query text
306
+
307
+ Returns:
308
+ Dictionary of numerical features
309
+ """
310
+ domain_features = self.preprocessor.extract_domain_features(query)
311
+
312
+ features = {
313
+ "query_length": domain_features["text_length"] / 10000, # Normalize
314
+ "word_count": domain_features["word_count"] / 500,
315
+ "sentence_count": domain_features["sentence_count"] / 50,
316
+ "has_technical_terms": float(
317
+ domain_features["has_ip_addresses"]
318
+ or domain_features["has_cve"]
319
+ or domain_features["has_mitre_techniques"]
320
+ ),
321
+ "is_cybersecurity": float(domain_features["is_cybersecurity"]),
322
+ "is_military": float(domain_features["is_military"]),
323
+ "is_data_analysis": float(domain_features["is_data_analysis"]),
324
+ "complexity_score": self._estimate_complexity(query),
325
+ }
326
+
327
+ return features
328
+
329
+ def _estimate_complexity(self, text: str) -> float:
330
+ """
331
+ Estimate query complexity (0-1 scale).
332
+
333
+ Args:
334
+ text: Input text
335
+
336
+ Returns:
337
+ Complexity score
338
+ """
339
+ # Simple heuristic based on length, technical terms, etc.
340
+ score = 0.0
341
+
342
+ # Length factor
343
+ word_count = len(text.split())
344
+ if word_count > 50:
345
+ score += 0.3
346
+ elif word_count > 20:
347
+ score += 0.1
348
+
349
+ # Technical term factor
350
+ technical_indicators = [
351
+ "analyze",
352
+ "compare",
353
+ "evaluate",
354
+ "synthesize",
355
+ "strategic",
356
+ "tactical",
357
+ "multi-step",
358
+ "consider",
359
+ ]
360
+ for term in technical_indicators:
361
+ if term in text.lower():
362
+ score += 0.1
363
+
364
+ # Question complexity
365
+ if "?" in text:
366
+ if any(kw in text.lower() for kw in ["why", "how", "what if"]):
367
+ score += 0.2
368
+ else:
369
+ score += 0.1
370
+
371
+ return min(score, 1.0)
372
+
373
+ def extract_agent_state_features(
374
+ self,
375
+ hrm_confidence: float = 0.0,
376
+ trm_confidence: float = 0.0,
377
+ mcts_iterations: int = 0,
378
+ consensus_score: float = 0.0,
379
+ rag_retrieved: int = 0,
380
+ ) -> list[float]:
381
+ """
382
+ Extract features from current agent state.
383
+
384
+ Args:
385
+ hrm_confidence: HRM agent confidence
386
+ trm_confidence: TRM agent confidence
387
+ mcts_iterations: MCTS iterations completed
388
+ consensus_score: Inter-agent consensus
389
+ rag_retrieved: Number of RAG documents retrieved
390
+
391
+ Returns:
392
+ List of normalized features (10-dimensional)
393
+ """
394
+ return [
395
+ hrm_confidence,
396
+ trm_confidence,
397
+ min(mcts_iterations / 1000, 1.0),
398
+ consensus_score,
399
+ min(rag_retrieved / 20, 1.0),
400
+ # Derived features
401
+ abs(hrm_confidence - trm_confidence), # Disagreement
402
+ (hrm_confidence + trm_confidence) / 2, # Average confidence
403
+ float(mcts_iterations > 0), # MCTS active
404
+ float(consensus_score > 0.7), # High consensus
405
+ float(rag_retrieved > 0), # RAG used
406
+ ]
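A short, assumed example of the preprocessing and feature-extraction pipeline defined above (the feature names and the 10-dimensional state vector follow the docstrings; the input strings are made up for illustration):

from src.data.preprocessing import TextPreprocessor, MetaControllerFeatureExtractor

pre = TextPreprocessor(remove_html=True, lowercase=False)
result = pre.preprocess("Analyze CVE-2024-1234 exploitation via phishing on 10.0.0.5")
print(result.features["has_cve"], result.features["is_cybersecurity"])

extractor = MetaControllerFeatureExtractor()
query_features = extractor.extract_query_features("How should we respond to T1566 activity?")
print(query_features["complexity_score"])
state = extractor.extract_agent_state_features(hrm_confidence=0.8, trm_confidence=0.6)
assert len(state) == 10                                     # matches the documented feature dimensionality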
src/data/tactical_augmentation.py ADDED
@@ -0,0 +1,484 @@
1
+ """
2
+ Tactical Data Augmentation Module.
3
+
4
+ Provides domain-specific data augmentation techniques for:
5
+ - Cybersecurity threat scenarios
6
+ - Military tactical situations
7
+ - Multi-step reasoning problems
8
+
9
+ These augmentations help increase training data diversity and improve
10
+ model robustness for tactical analysis tasks.
11
+ """
12
+
13
+ import logging
14
+ import random
15
+ from dataclasses import dataclass
16
+
17
+ from .dataset_loader import DatasetSample
18
+
19
+ logger = logging.getLogger(__name__)
20
+
21
+
22
+ @dataclass
23
+ class AugmentationResult:
24
+ """Result of data augmentation."""
25
+
26
+ original: DatasetSample
27
+ augmented: list[DatasetSample]
28
+ augmentation_types: list[str]
29
+
30
+
31
+ class TacticalAugmenter:
32
+ """
33
+ Domain-specific data augmentation for tactical analysis.
34
+
35
+ Augmentation techniques:
36
+ - Paraphrasing tactical scenarios
37
+ - Varying urgency levels
38
+ - Adding/removing constraints
39
+ - Scenario parameter variation
40
+ - Threat actor substitution
41
+ - Temporal shifting
42
+ """
43
+
44
+ # Tactical scenario templates
45
+ URGENCY_MODIFIERS = {
46
+ "high": ["IMMEDIATE", "CRITICAL", "URGENT", "TIME-SENSITIVE"],
47
+ "medium": ["PRIORITY", "IMPORTANT", "ATTENTION REQUIRED"],
48
+ "low": ["ROUTINE", "STANDARD", "WHEN POSSIBLE"],
49
+ }
50
+
51
+ THREAT_ACTORS = [
52
+ "APT28",
53
+ "APT29",
54
+ "Lazarus Group",
55
+ "Cozy Bear",
56
+ "Fancy Bear",
57
+ "Unknown Actor",
58
+ "Nation-State Actor",
59
+ "Criminal Organization",
60
+ ]
61
+
62
+ ATTACK_VECTORS = [
63
+ "phishing",
64
+ "spear-phishing",
65
+ "watering hole",
66
+ "supply chain compromise",
67
+ "zero-day exploit",
68
+ "credential stuffing",
69
+ "brute force",
70
+ "social engineering",
71
+ ]
72
+
73
+ MILITARY_OBJECTIVES = [
74
+ "secure perimeter",
75
+ "establish forward position",
76
+ "conduct reconnaissance",
77
+ "neutralize threat",
78
+ "protect assets",
79
+ "maintain operational security",
80
+ "coordinate with allied forces",
81
+ "execute tactical withdrawal",
82
+ ]
83
+
84
+ ENVIRONMENTAL_CONDITIONS = [
85
+ "night operations",
86
+ "adverse weather",
87
+ "limited visibility",
88
+ "urban terrain",
89
+ "mountainous region",
90
+ "coastal area",
91
+ "contested airspace",
92
+ "electronic warfare environment",
93
+ ]
94
+
95
+ def __init__(self, seed: int = 42):
96
+ """
97
+ Initialize augmenter.
98
+
99
+ Args:
100
+ seed: Random seed for reproducibility
101
+ """
102
+ self.rng = random.Random(seed)
103
+ self._augmentation_count = 0
104
+
105
+ def augment_sample(
106
+ self,
107
+ sample: DatasetSample,
108
+ num_augmentations: int = 3,
109
+ techniques: list[str] | None = None,
110
+ ) -> AugmentationResult:
111
+ """
112
+ Augment a single sample.
113
+
114
+ Args:
115
+ sample: Original dataset sample
116
+ num_augmentations: Number of augmented versions to create
117
+ techniques: Specific techniques to use (None for random selection)
118
+
119
+ Returns:
120
+ AugmentationResult with augmented samples
121
+ """
122
+ available_techniques = [
123
+ "urgency_variation",
124
+ "parameter_substitution",
125
+ "constraint_addition",
126
+ "temporal_shift",
127
+ "perspective_change",
128
+ ]
129
+
130
+ if techniques:
131
+ available_techniques = [t for t in techniques if t in available_techniques]
132
+
133
+ augmented_samples = []
134
+ used_techniques = []
135
+
136
+ for _i in range(num_augmentations):
137
+ technique = self.rng.choice(available_techniques)
138
+ used_techniques.append(technique)
139
+
140
+ augmented_text = self._apply_technique(sample.text, sample.domain, technique)
141
+
142
+ aug_sample = DatasetSample(
143
+ id=f"{sample.id}_aug_{self._augmentation_count}",
144
+ text=augmented_text,
145
+ metadata={
146
+ **sample.metadata,
147
+ "augmentation": technique,
148
+ "original_id": sample.id,
149
+ },
150
+ labels=sample.labels,
151
+ difficulty=sample.difficulty,
152
+ domain=sample.domain,
153
+ reasoning_steps=sample.reasoning_steps,
154
+ )
155
+
156
+ augmented_samples.append(aug_sample)
157
+ self._augmentation_count += 1
158
+
159
+ return AugmentationResult(
160
+ original=sample,
161
+ augmented=augmented_samples,
162
+ augmentation_types=used_techniques,
163
+ )
164
+
165
+ def _apply_technique(self, text: str, domain: str | None, technique: str) -> str:
166
+ """Apply specific augmentation technique."""
167
+ if technique == "urgency_variation":
168
+ return self._augment_urgency(text)
169
+ elif technique == "parameter_substitution":
170
+ return self._augment_parameters(text, domain)
171
+ elif technique == "constraint_addition":
172
+ return self._augment_constraints(text, domain)
173
+ elif technique == "temporal_shift":
174
+ return self._augment_temporal(text)
175
+ elif technique == "perspective_change":
176
+ return self._augment_perspective(text, domain)
177
+ else:
178
+ return text
179
+
180
+ def _augment_urgency(self, text: str) -> str:
181
+ """Vary urgency level in the text."""
182
+ urgency_level = self.rng.choice(list(self.URGENCY_MODIFIERS.keys()))
183
+ modifier = self.rng.choice(self.URGENCY_MODIFIERS[urgency_level])
184
+
185
+ # Add urgency prefix
186
+ if urgency_level == "high":
187
+ return f"[{modifier}] {text}"
188
+ elif urgency_level == "medium":
189
+ return f"{modifier}: {text}"
190
+ else:
191
+ return f"({modifier}) {text}"
192
+
193
+ def _augment_parameters(self, text: str, domain: str | None) -> str:
194
+ """Substitute domain-specific parameters."""
195
+ if domain == "cybersecurity" or "cyber" in text.lower():
196
+ # Substitute threat actors
197
+ for actor in self.THREAT_ACTORS:
198
+ if actor in text:
199
+ new_actor = self.rng.choice([a for a in self.THREAT_ACTORS if a != actor])
200
+ text = text.replace(actor, new_actor)
201
+ break
202
+
203
+ # Substitute attack vectors
204
+ for vector in self.ATTACK_VECTORS:
205
+ if vector in text.lower():
206
+ new_vector = self.rng.choice([v for v in self.ATTACK_VECTORS if v != vector])
207
+ text = text.replace(vector, new_vector)
208
+ break
209
+
210
+ elif domain == "military" or any(kw in text.lower() for kw in ["tactical", "military", "reconnaissance"]):
211
+ # Substitute objectives
212
+ for obj in self.MILITARY_OBJECTIVES:
213
+ if obj in text.lower():
214
+ new_obj = self.rng.choice([o for o in self.MILITARY_OBJECTIVES if o != obj])
215
+ text = text.replace(obj, new_obj)
216
+ break
217
+
218
+ return text
219
+
220
+ def _augment_constraints(self, text: str, domain: str | None) -> str:
221
+ """Add additional constraints to the scenario."""
222
+ constraints = []
223
+
224
+ if domain == "cybersecurity":
225
+ constraints = [
226
+ "with limited network visibility",
227
+ "under active attack",
228
+ "with compromised credentials",
229
+ "during maintenance window",
230
+ "with restricted access to logs",
231
+ ]
232
+ elif domain == "military":
233
+ constraints = [
234
+ "with limited ammunition",
235
+ "under communication blackout",
236
+ "with reduced personnel",
237
+ "in contested environment",
238
+ "with time constraint of 2 hours",
239
+ ]
240
+ else:
241
+ constraints = [
242
+ "with incomplete information",
243
+ "under time pressure",
244
+ "with resource constraints",
245
+ "considering multiple stakeholders",
246
+ "with conflicting objectives",
247
+ ]
248
+
249
+ if constraints:
250
+ constraint = self.rng.choice(constraints)
251
+ return f"{text} [{constraint}]"
252
+
253
+ return text
254
+
255
+ def _augment_temporal(self, text: str) -> str:
256
+ """Shift temporal context."""
257
+ temporal_contexts = [
258
+ "In the past 24 hours, ",
259
+ "Over the next week, ",
260
+ "Immediately, ",
261
+ "During the upcoming operation, ",
262
+ "Following initial assessment, ",
263
+ ]
264
+
265
+ context = self.rng.choice(temporal_contexts)
266
+ return f"{context}{text.lower()}" if text else text
267
+
268
+ def _augment_perspective(self, text: str, domain: str | None) -> str:
269
+ """Change analytical perspective."""
270
+ perspectives = {
271
+ "cybersecurity": [
272
+ "From a threat hunter's perspective: ",
273
+ "Considering the attacker's viewpoint: ",
274
+ "For incident response purposes: ",
275
+ "From a risk management standpoint: ",
276
+ ],
277
+ "military": [
278
+ "From the commander's perspective: ",
279
+ "Considering enemy capabilities: ",
280
+ "For tactical planning purposes: ",
281
+ "From a logistics standpoint: ",
282
+ ],
283
+ "default": [
284
+ "From an analytical perspective: ",
285
+ "Considering all factors: ",
286
+ "For decision-making purposes: ",
287
+ "From a strategic viewpoint: ",
288
+ ],
289
+ }
290
+
291
+ domain_perspectives = perspectives.get(domain or "default", perspectives["default"])
292
+ perspective = self.rng.choice(domain_perspectives)
293
+
294
+ return f"{perspective}{text}"
295
+
296
+ def augment_batch(
297
+ self,
298
+ samples: list[DatasetSample],
299
+ augmentations_per_sample: int = 2,
300
+ ) -> list[DatasetSample]:
301
+ """
302
+ Augment a batch of samples.
303
+
304
+ Args:
305
+ samples: List of original samples
306
+ augmentations_per_sample: Number of augmentations per sample
307
+
308
+ Returns:
309
+ List of all samples (original + augmented)
310
+ """
311
+ all_samples = list(samples) # Keep originals
312
+
313
+ for sample in samples:
314
+ result = self.augment_sample(sample, num_augmentations=augmentations_per_sample)
315
+ all_samples.extend(result.augmented)
316
+
317
+ logger.info(
318
+ f"Augmented {len(samples)} samples to {len(all_samples)} (+{len(all_samples) - len(samples)} augmented)"
319
+ )
320
+
321
+ return all_samples
322
+
323
+ def create_tactical_scenarios(self, base_samples: list[DatasetSample]) -> list[DatasetSample]:
324
+ """
325
+ Create tactical scenario variations from base samples.
326
+
327
+ Combines multiple augmentation techniques to create
328
+ diverse tactical scenarios for training.
329
+
330
+ Args:
331
+ base_samples: Base dataset samples
332
+
333
+ Returns:
334
+ Extended list with tactical scenario variations
335
+ """
336
+ scenarios = list(base_samples)
337
+
338
+ for sample in base_samples:
339
+ # Create high-stakes variant
340
+ high_stakes = self._augment_urgency(sample.text)
341
+ high_stakes = self._augment_constraints(high_stakes, sample.domain)
342
+ scenarios.append(
343
+ DatasetSample(
344
+ id=f"{sample.id}_highstakes_{self._augmentation_count}",
345
+ text=high_stakes,
346
+ metadata={
347
+ **sample.metadata,
348
+ "scenario_type": "high_stakes",
349
+ "original_id": sample.id,
350
+ },
351
+ labels=sample.labels,
352
+ difficulty="hard", # High stakes scenarios are harder
353
+ domain=sample.domain,
354
+ reasoning_steps=sample.reasoning_steps,
355
+ )
356
+ )
357
+ self._augmentation_count += 1
358
+
359
+ # Create multi-perspective variant
360
+ if self.rng.random() > 0.5:
361
+ multi_perspective = self._augment_perspective(sample.text, sample.domain)
362
+ scenarios.append(
363
+ DatasetSample(
364
+ id=f"{sample.id}_multiperspective_{self._augmentation_count}",
365
+ text=multi_perspective,
366
+ metadata={
367
+ **sample.metadata,
368
+ "scenario_type": "multi_perspective",
369
+ "original_id": sample.id,
370
+ },
371
+ labels=sample.labels,
372
+ difficulty=sample.difficulty,
373
+ domain=sample.domain,
374
+ reasoning_steps=sample.reasoning_steps,
375
+ )
376
+ )
377
+ self._augmentation_count += 1
378
+
379
+ logger.info(f"Created {len(scenarios) - len(base_samples)} tactical scenarios")
380
+ return scenarios
381
+
382
+
383
+ class CyberSecurityAugmenter(TacticalAugmenter):
384
+ """
385
+ Specialized augmenter for cybersecurity scenarios.
386
+
387
+ Focuses on:
388
+ - MITRE ATT&CK technique variations
389
+ - Threat intelligence context
390
+ - Incident response scenarios
391
+ """
392
+
393
+ MITRE_TACTICS = [
394
+ "Initial Access",
395
+ "Execution",
396
+ "Persistence",
397
+ "Privilege Escalation",
398
+ "Defense Evasion",
399
+ "Credential Access",
400
+ "Discovery",
401
+ "Lateral Movement",
402
+ "Collection",
403
+ "Exfiltration",
404
+ "Impact",
405
+ ]
406
+
407
+ SEVERITY_LEVELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
408
+
409
+ def augment_with_mitre_context(self, sample: DatasetSample) -> DatasetSample:
410
+ """
411
+ Add MITRE ATT&CK context to sample.
412
+
413
+ Args:
414
+ sample: Original sample
415
+
416
+ Returns:
417
+ Augmented sample with MITRE context
418
+ """
419
+ tactic = self.rng.choice(self.MITRE_TACTICS)
420
+ severity = self.rng.choice(self.SEVERITY_LEVELS)
421
+
422
+ augmented_text = f"[MITRE ATT&CK: {tactic}] [Severity: {severity}] {sample.text}"
423
+
424
+ return DatasetSample(
425
+ id=f"{sample.id}_mitre_{self._augmentation_count}",
426
+ text=augmented_text,
427
+ metadata={
428
+ **sample.metadata,
429
+ "mitre_tactic": tactic,
430
+ "severity": severity,
431
+ },
432
+ labels=sample.labels,
433
+ difficulty=sample.difficulty,
434
+ domain="cybersecurity",
435
+ reasoning_steps=sample.reasoning_steps,
436
+ )
437
+
438
+
439
+ class MilitaryTacticalAugmenter(TacticalAugmenter):
440
+ """
441
+ Specialized augmenter for military tactical scenarios.
442
+
443
+ Focuses on:
444
+ - Environmental condition variations
445
+ - Force composition changes
446
+ - Mission objective variations
447
+ """
448
+
449
+ FORCE_COMPOSITIONS = [
450
+ "infantry platoon",
451
+ "mechanized company",
452
+ "special operations team",
453
+ "combined arms battalion",
454
+ "air assault element",
455
+ ]
456
+
457
+ def augment_with_force_composition(self, sample: DatasetSample) -> DatasetSample:
458
+ """
459
+ Add force composition context to sample.
460
+
461
+ Args:
462
+ sample: Original sample
463
+
464
+ Returns:
465
+ Augmented sample with force composition
466
+ """
467
+ force = self.rng.choice(self.FORCE_COMPOSITIONS)
468
+ condition = self.rng.choice(self.ENVIRONMENTAL_CONDITIONS)
469
+
470
+ augmented_text = f"[Force: {force}] [Conditions: {condition}] {sample.text}"
471
+
472
+ return DatasetSample(
473
+ id=f"{sample.id}_tactical_{self._augmentation_count}",
474
+ text=augmented_text,
475
+ metadata={
476
+ **sample.metadata,
477
+ "force_composition": force,
478
+ "environmental_conditions": condition,
479
+ },
480
+ labels=sample.labels,
481
+ difficulty=sample.difficulty,
482
+ domain="military",
483
+ reasoning_steps=sample.reasoning_steps,
484
+ )
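An illustrative sketch of the augmenters above (seeded for reproducibility; the exact augmented text depends on the random draws, and the sample text is invented):

from src.data.dataset_loader import DatasetSample
from src.data.tactical_augmentation import TacticalAugmenter, CyberSecurityAugmenter

sample = DatasetSample(id="demo_1", text="Assess a phishing campaign against the finance team", domain="cybersecurity")
augmenter = TacticalAugmenter(seed=42)
result = augmenter.augment_sample(sample, num_augmentations=3)
print(result.augmentation_types)                            # e.g. ['urgency_variation', 'temporal_shift', ...]

cyber = CyberSecurityAugmenter(seed=7)
mitre_variant = cyber.augment_with_mitre_context(sample)    # prefixes a MITRE tactic and severity tag
print(mitre_variant.text[:60])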
src/data/train_test_split.py ADDED
@@ -0,0 +1,505 @@
1
+ """
2
+ Data Splitting Module for Training Pipeline.
3
+
4
+ Provides utilities for:
5
+ - Train/validation/test splitting
6
+ - Stratified sampling by domain or difficulty
7
+ - Cross-validation fold creation
8
+ - Reproducible splits with seeding
9
+ """
10
+
11
+ import logging
12
+ from collections import defaultdict
13
+ from dataclasses import dataclass
14
+ from typing import Any
15
+
16
+ from .dataset_loader import DatasetSample
17
+
18
+ logger = logging.getLogger(__name__)
19
+
20
+
21
+ @dataclass
22
+ class DataSplit:
23
+ """Result of dataset splitting."""
24
+
25
+ train: list[DatasetSample]
26
+ validation: list[DatasetSample]
27
+ test: list[DatasetSample]
28
+ split_info: dict[str, Any]
29
+
30
+
31
+ @dataclass
32
+ class CrossValidationFold:
33
+ """Single fold for cross-validation."""
34
+
35
+ fold_id: int
36
+ train: list[DatasetSample]
37
+ validation: list[DatasetSample]
38
+
39
+
40
+ class DataSplitter:
41
+ """
42
+ Basic dataset splitter with random sampling.
43
+
44
+ Provides reproducible train/validation/test splits
45
+ with configurable ratios.
46
+ """
47
+
48
+ def __init__(self, seed: int = 42):
49
+ """
50
+ Initialize splitter.
51
+
52
+ Args:
53
+ seed: Random seed for reproducibility
54
+ """
55
+ self.seed = seed
56
+ import random
57
+
58
+ self.rng = random.Random(seed)
59
+
60
+ def split(
61
+ self,
62
+ samples: list[DatasetSample],
63
+ train_ratio: float = 0.7,
64
+ val_ratio: float = 0.15,
65
+ test_ratio: float = 0.15,
66
+ shuffle: bool = True,
67
+ ) -> DataSplit:
68
+ """
69
+ Split dataset into train/validation/test sets.
70
+
71
+ Args:
72
+ samples: List of all samples
73
+ train_ratio: Proportion for training (default 0.7)
74
+ val_ratio: Proportion for validation (default 0.15)
75
+ test_ratio: Proportion for testing (default 0.15)
76
+ shuffle: Whether to shuffle before splitting
77
+
78
+ Returns:
79
+ DataSplit with train, validation, and test sets
80
+ """
81
+ if abs(train_ratio + val_ratio + test_ratio - 1.0) > 0.001:
82
+ raise ValueError("Ratios must sum to 1.0")
83
+
84
+ if not samples:
85
+ raise ValueError("Cannot split empty sample list")
86
+
87
+ # Copy and optionally shuffle
88
+ all_samples = list(samples)
89
+ if shuffle:
90
+ self.rng.shuffle(all_samples)
91
+
92
+ n = len(all_samples)
93
+ train_end = int(n * train_ratio)
94
+ val_end = train_end + int(n * val_ratio)
95
+
96
+ train_samples = all_samples[:train_end]
97
+ val_samples = all_samples[train_end:val_end]
98
+ test_samples = all_samples[val_end:]
99
+
100
+ split_info = {
101
+ "total_samples": n,
102
+ "train_samples": len(train_samples),
103
+ "val_samples": len(val_samples),
104
+ "test_samples": len(test_samples),
105
+ "train_ratio": len(train_samples) / n,
106
+ "val_ratio": len(val_samples) / n,
107
+ "test_ratio": len(test_samples) / n,
108
+ "seed": self.seed,
109
+ "shuffled": shuffle,
110
+ }
111
+
112
+ logger.info(f"Split {n} samples: train={len(train_samples)}, val={len(val_samples)}, test={len(test_samples)}")
113
+
114
+ return DataSplit(
115
+ train=train_samples,
116
+ validation=val_samples,
117
+ test=test_samples,
118
+ split_info=split_info,
119
+ )
120
+
121
+ def create_k_folds(
122
+ self,
123
+ samples: list[DatasetSample],
124
+ k: int = 5,
125
+ shuffle: bool = True,
126
+ ) -> list[CrossValidationFold]:
127
+ """
128
+ Create k-fold cross-validation splits.
129
+
130
+ Args:
131
+ samples: List of all samples
132
+ k: Number of folds
133
+ shuffle: Whether to shuffle before splitting
134
+
135
+ Returns:
136
+ List of CrossValidationFold objects
137
+ """
138
+ if k < 2:
139
+ raise ValueError("k must be at least 2")
140
+
141
+ if len(samples) < k:
142
+ raise ValueError(f"Need at least {k} samples for {k}-fold CV")
143
+
144
+ # Copy and optionally shuffle
145
+ all_samples = list(samples)
146
+ if shuffle:
147
+ self.rng.shuffle(all_samples)
148
+
149
+ # Calculate fold sizes
150
+ fold_size = len(all_samples) // k
151
+ folds = []
152
+
153
+ for fold_id in range(k):
154
+ # Validation is the current fold
155
+ val_start = fold_id * fold_size
156
+ val_end = len(all_samples) if fold_id == k - 1 else val_start + fold_size # noqa: SIM108
157
+
158
+ val_samples = all_samples[val_start:val_end]
159
+ train_samples = all_samples[:val_start] + all_samples[val_end:]
160
+
161
+ folds.append(
162
+ CrossValidationFold(
163
+ fold_id=fold_id,
164
+ train=train_samples,
165
+ validation=val_samples,
166
+ )
167
+ )
168
+
169
+ logger.info(f"Created {k}-fold cross-validation splits")
170
+ return folds
171
+
172
+
173
+ class StratifiedSplitter(DataSplitter):
174
+ """
175
+ Stratified dataset splitter.
176
+
177
+ Ensures proportional representation of categories
178
+ (domain, difficulty, etc.) across splits.
179
+ """
180
+
181
+ def __init__(self, seed: int = 42, stratify_by: str = "domain"):
182
+ """
183
+ Initialize stratified splitter.
184
+
185
+ Args:
186
+ seed: Random seed for reproducibility
187
+ stratify_by: Attribute to stratify on ('domain', 'difficulty', 'labels')
188
+ """
189
+ super().__init__(seed)
190
+ self.stratify_by = stratify_by
191
+
192
+ def split(
193
+ self,
194
+ samples: list[DatasetSample],
195
+ train_ratio: float = 0.7,
196
+ val_ratio: float = 0.15,
197
+ test_ratio: float = 0.15,
198
+ shuffle: bool = True,
199
+ ) -> DataSplit:
200
+ """
201
+ Stratified split maintaining category proportions.
202
+
203
+ Args:
204
+ samples: List of all samples
205
+ train_ratio: Proportion for training
206
+ val_ratio: Proportion for validation
207
+ test_ratio: Proportion for testing
208
+ shuffle: Whether to shuffle before splitting
209
+
210
+ Returns:
211
+ DataSplit with stratified train, validation, and test sets
212
+ """
213
+ if abs(train_ratio + val_ratio + test_ratio - 1.0) > 0.001:
214
+ raise ValueError("Ratios must sum to 1.0")
215
+
216
+ if not samples:
217
+ raise ValueError("Cannot split empty sample list")
218
+
219
+ # Group samples by stratification key
220
+ groups = defaultdict(list)
221
+ for sample in samples:
222
+ key = self._get_stratify_key(sample)
223
+ groups[key].append(sample)
224
+
225
+ # Split each group proportionally
226
+ train_samples = []
227
+ val_samples = []
228
+ test_samples = []
229
+
230
+ for _key, group_samples in groups.items():
231
+ if shuffle:
232
+ self.rng.shuffle(group_samples)
233
+
234
+ n = len(group_samples)
235
+ train_end = int(n * train_ratio)
236
+ val_end = train_end + int(n * val_ratio)
237
+
238
+ train_samples.extend(group_samples[:train_end])
239
+ val_samples.extend(group_samples[train_end:val_end])
240
+ test_samples.extend(group_samples[val_end:])
241
+
242
+ # Final shuffle of combined sets
243
+ if shuffle:
244
+ self.rng.shuffle(train_samples)
245
+ self.rng.shuffle(val_samples)
246
+ self.rng.shuffle(test_samples)
247
+
248
+ # Verify stratification
249
+ stratify_info = self._verify_stratification(train_samples, val_samples, test_samples)
250
+
251
+ split_info = {
252
+ "total_samples": len(samples),
253
+ "train_samples": len(train_samples),
254
+ "val_samples": len(val_samples),
255
+ "test_samples": len(test_samples),
256
+ "train_ratio": len(train_samples) / len(samples),
257
+ "val_ratio": len(val_samples) / len(samples),
258
+ "test_ratio": len(test_samples) / len(samples),
259
+ "stratify_by": self.stratify_by,
260
+ "stratification_info": stratify_info,
261
+ "seed": self.seed,
262
+ "shuffled": shuffle,
263
+ }
264
+
265
+ logger.info(
266
+ f"Stratified split ({self.stratify_by}): "
267
+ f"train={len(train_samples)}, val={len(val_samples)}, "
268
+ f"test={len(test_samples)}"
269
+ )
270
+
271
+ return DataSplit(
272
+ train=train_samples,
273
+ validation=val_samples,
274
+ test=test_samples,
275
+ split_info=split_info,
276
+ )
277
+
278
+ def _get_stratify_key(self, sample: DatasetSample) -> str:
279
+ """Get stratification key for a sample."""
280
+ if self.stratify_by == "domain":
281
+ return sample.domain or "unknown"
282
+ elif self.stratify_by == "difficulty":
283
+ return sample.difficulty or "unknown"
284
+ elif self.stratify_by == "labels":
285
+ return ",".join(sorted(sample.labels)) if sample.labels else "unknown"
286
+ else:
287
+ return str(getattr(sample, self.stratify_by, "unknown"))
288
+
289
+ def _verify_stratification(
290
+ self,
291
+ train: list[DatasetSample],
292
+ val: list[DatasetSample],
293
+ test: list[DatasetSample],
294
+ ) -> dict[str, dict[str, float]]:
295
+ """
296
+ Verify that stratification was successful.
297
+
298
+ Returns dictionary showing distribution of stratification key
299
+ across train/val/test splits.
300
+ """
301
+
302
+ def get_distribution(samples: list[DatasetSample]) -> dict[str, float]:
303
+ if not samples:
304
+ return {}
305
+ counts = defaultdict(int)
306
+ for sample in samples:
307
+ key = self._get_stratify_key(sample)
308
+ counts[key] += 1
309
+ total = len(samples)
310
+ return {k: v / total for k, v in counts.items()}
311
+
312
+ return {
313
+ "train": get_distribution(train),
314
+ "validation": get_distribution(val),
315
+ "test": get_distribution(test),
316
+ }
317
+
318
+ def create_stratified_k_folds(
319
+ self,
320
+ samples: list[DatasetSample],
321
+ k: int = 5,
322
+ shuffle: bool = True,
323
+ ) -> list[CrossValidationFold]:
324
+ """
325
+ Create stratified k-fold cross-validation splits.
326
+
327
+ Args:
328
+ samples: List of all samples
329
+ k: Number of folds
330
+ shuffle: Whether to shuffle before splitting
331
+
332
+ Returns:
333
+ List of CrossValidationFold objects with stratification
334
+ """
335
+ if k < 2:
336
+ raise ValueError("k must be at least 2")
337
+
338
+ # Group samples by stratification key
339
+ groups = defaultdict(list)
340
+ for sample in samples:
341
+ key = self._get_stratify_key(sample)
342
+ groups[key].append(sample)
343
+
344
+ # Initialize folds
345
+ folds_data = [{"train": [], "val": []} for _ in range(k)]
346
+
347
+ # Distribute each group across folds
348
+ for _key, group_samples in groups.items():
349
+ if shuffle:
350
+ self.rng.shuffle(group_samples)
351
+
352
+ # Assign samples to folds
353
+ fold_size = len(group_samples) // k
354
+ for fold_id in range(k):
355
+ val_start = fold_id * fold_size
356
+ val_end = len(group_samples) if fold_id == k - 1 else val_start + fold_size
357
+
358
+ for i, sample in enumerate(group_samples):
359
+ if val_start <= i < val_end:
360
+ folds_data[fold_id]["val"].append(sample)
361
+ else:
362
+ folds_data[fold_id]["train"].append(sample)
363
+
364
+ # Create fold objects
365
+ folds = [
366
+ CrossValidationFold(
367
+ fold_id=i,
368
+ train=data["train"],
369
+ validation=data["val"],
370
+ )
371
+ for i, data in enumerate(folds_data)
372
+ ]
373
+
374
+ logger.info(f"Created stratified {k}-fold cross-validation splits")
375
+ return folds
376
+
377
+
378
+ class BalancedSampler:
+     """
+     Balanced sampling for imbalanced datasets.
+
+     Provides utilities for:
+     - Oversampling minority classes
+     - Undersampling majority classes
+     - SMOTE-like synthetic sampling (for numerical features)
+     """
+
+     def __init__(self, seed: int = 42):
+         """Initialize balanced sampler."""
+         self.seed = seed
+         import random
+
+         self.rng = random.Random(seed)
+
+     def oversample_minority(
+         self,
+         samples: list[DatasetSample],
+         target_key: str = "domain",
+         target_ratio: float = 1.0,
+     ) -> list[DatasetSample]:
+         """
+         Oversample minority classes to balance dataset.
+
+         Args:
+             samples: Original samples
+             target_key: Attribute to balance on
+             target_ratio: Target ratio relative to majority (1.0 = equal)
+
+         Returns:
+             Balanced sample list (originals + oversampled)
+         """
+         # Group by target key
+         groups = defaultdict(list)
+         for sample in samples:
+             key = getattr(sample, target_key, "unknown") or "unknown"
+             groups[key].append(sample)
+
+         # Find majority class size
+         max_count = max(len(g) for g in groups.values())
+         target_count = int(max_count * target_ratio)
+
+         # Oversample minority classes
+         balanced = []
+         for _key, group in groups.items():
+             balanced.extend(group)
+
+             # Oversample if needed
+             if len(group) < target_count:
+                 num_to_add = target_count - len(group)
+                 for _ in range(num_to_add):
+                     # Randomly duplicate from group
+                     original = self.rng.choice(group)
+                     duplicate = DatasetSample(
+                         id=f"{original.id}_oversample_{self.rng.randint(0, 999999)}",
+                         text=original.text,
+                         metadata={**original.metadata, "oversampled": True},
+                         labels=original.labels,
+                         difficulty=original.difficulty,
+                         domain=original.domain,
+                         reasoning_steps=original.reasoning_steps,
+                     )
+                     balanced.append(duplicate)
+
+         logger.info(f"Oversampled from {len(samples)} to {len(balanced)} samples")
+         return balanced
+
+     def undersample_majority(
+         self,
+         samples: list[DatasetSample],
+         target_key: str = "domain",
+         target_ratio: float = 1.0,
+     ) -> list[DatasetSample]:
+         """
+         Undersample majority classes to balance dataset.
+
+         Args:
+             samples: Original samples
+             target_key: Attribute to balance on
+             target_ratio: Target ratio relative to minority (1.0 = equal)
+
+         Returns:
+             Balanced sample list (subset of originals)
+         """
+         # Group by target key
+         groups = defaultdict(list)
+         for sample in samples:
+             key = getattr(sample, target_key, "unknown") or "unknown"
+             groups[key].append(sample)
+
+         # Find minority class size
+         min_count = min(len(g) for g in groups.values())
+         target_count = int(min_count * target_ratio)
+
+         # Undersample majority classes
+         balanced = []
+         for _key, group in groups.items():
+             if len(group) > target_count:
+                 # Randomly select target_count samples
+                 balanced.extend(self.rng.sample(group, target_count))
+             else:
+                 balanced.extend(group)
+
+         logger.info(f"Undersampled from {len(samples)} to {len(balanced)} samples")
+         return balanced
+
+     def get_class_distribution(
+         self,
+         samples: list[DatasetSample],
+         target_key: str = "domain",
+     ) -> dict[str, int]:
+         """
+         Get distribution of classes.
+
+         Args:
+             samples: Sample list
+             target_key: Attribute to analyze
+
+         Returns:
+             Dictionary of class counts
+         """
+         distribution = defaultdict(int)
+         for sample in samples:
+             key = getattr(sample, target_key, "unknown") or "unknown"
+             distribution[key] += 1
+         return dict(distribution)
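A short, hedged sketch of the balancing workflow defined above (not part of the diff). The class and method names are taken from the file; `samples` is assumed to be the same `list[DatasetSample]` as in the previous sketch, and the printed counts are illustrative only.

# Hedged usage sketch for BalancedSampler -- signatures come from the diff above.
sampler = BalancedSampler(seed=42)

print(sampler.get_class_distribution(samples, target_key="domain"))
# e.g. {"chess": 60, "planning": 30}  (illustrative)

upsampled = sampler.oversample_minority(samples, target_key="domain", target_ratio=1.0)
downsampled = sampler.undersample_majority(samples, target_key="domain", target_ratio=1.0)

print(sampler.get_class_distribution(upsampled, target_key="domain"))    # minority raised toward majority count
print(sampler.get_class_distribution(downsampled, target_key="domain"))  # majority trimmed toward minority count

Note that oversampled duplicates receive a fresh id and `metadata["oversampled"] = True`, so downstream deduplication or leakage checks can filter them out if needed.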
src/framework/__init__.py ADDED
@@ -0,0 +1 @@
+ # Framework module
src/framework/agents/__init__.py ADDED
@@ -0,0 +1,22 @@
+ # Agents module for async agent implementations
+ from .base import (
+     AgentContext,
+     AgentResult,
+     AsyncAgentBase,
+     CompositeAgent,
+     MetricsCollector,
+     NoOpMetricsCollector,
+     ParallelAgent,
+     SequentialAgent,
+ )
+
+ __all__ = [
+     "AsyncAgentBase",
+     "AgentContext",
+     "AgentResult",
+     "MetricsCollector",
+     "NoOpMetricsCollector",
+     "CompositeAgent",
+     "ParallelAgent",
+     "SequentialAgent",
+ ]
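A minimal import sketch for the new package (not part of the diff). The names come from `__all__` above; the `AsyncAgentBase` interface itself lives in src/framework/agents/base.py and is not shown here, so no subclass is attempted.

# Hedged import sketch: exercises only what __all__ exports.
from src.framework.agents import (
    AgentContext,
    AgentResult,
    AsyncAgentBase,
    CompositeAgent,
    ParallelAgent,
    SequentialAgent,
)

print([cls.__name__ for cls in (AsyncAgentBase, CompositeAgent, ParallelAgent, SequentialAgent)])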