Robust and Calibrated Detection of Authentic Multimedia Content Paper • 2512.15182 • Published 16 days ago • 15
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 134
Attention Is All You Need for KV Cache in Diffusion LLMs Paper • 2510.14973 • Published Oct 16, 2025 • 40
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees Paper • 2506.14606 • Published Jun 17, 2025 • 11
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published Jun 5, 2025 • 24
Qwen2.5-Coder Collection • Code-specific model series based on Qwen2.5 • 40 items • Updated 1 day ago • 350
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark Paper • 2503.20786 • Published Mar 26, 2025 • 2
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark Paper • 2505.16968 • Published May 22, 2025 • 40
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30, 2025 • 80
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem Paper • 2505.21887 • Published May 28, 2025 • 14
CASS Collection • Large-scale dataset and model suite for cross-architecture GPU code transpilation between CUDA and HIP at both source and assembly levels • 2 items • Updated May 15, 2025 • 5
SALT: Singular Value Adaptation with Low-Rank Transformation Paper • 2503.16055 • Published Mar 20, 2025 • 8
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding Paper • 2502.14949 • Published Feb 20, 2025 • 9
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13, 2025 • 148