yenson-lau's Collections
Papers
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models • Paper • 2506.06395 • Published • 133
Paper • 2506.10910 • Published • 66
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs • Paper • 2506.07240 • Published • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation • Paper • 2506.09991 • Published • 55
Reinforcement Pre-Training • Paper • 2506.08007 • Published • 263
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity • Paper • 2506.06941 • Published • 15
s3: You Don't Need That Much Data to Train a Search Agent via RL • Paper • 2505.14146 • Published • 19
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents • Paper • 2506.11763 • Published • 73
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs • Paper • 2506.14245 • Published • 45
Reasoning with Exploration: An Entropy Perspective • Paper • 2506.14758 • Published • 30
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning • Paper • 2506.24119 • Published • 50
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning • Paper • 2507.00432 • Published • 79
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models • Paper • 2507.14241 • Published • 17
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization • Paper • 2507.15061 • Published • 60
Paper • 2505.09388 • Published • 321
Replacing thinking with tool usage enables reasoning in small language models • Paper • 2507.05065 • Published • 15
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities • Paper • 2507.13158 • Published • 23
Deep Researcher with Test-Time Diffusion • Paper • 2507.16075 • Published • 67
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models • Paper • 2508.10751 • Published • 28
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers • Paper • 2508.14704 • Published • 43
Deep Think with Confidence • Paper • 2508.15260 • Published • 90
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs • Paper • 2508.16153 • Published • 160
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications • Paper • 2508.16279 • Published • 53
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles • Paper • 2508.16072 • Published • 4
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges • Paper • 2508.18076 • Published • 6
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation • Paper • 2410.20774 • Published
Provable Benefits of In-Tool Learning for Large Language Models • Paper • 2508.20755 • Published • 11