Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies • Paper 2512.19673 • Published Dec 2025 • 60 upvotes
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models • Paper 2512.07783 • Published Dec 2025 • 36 upvotes
ASPO: Asymmetric Importance Sampling Policy Optimization • Paper 2510.06062 • Published Oct 7, 2025 • 13 upvotes
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models • Paper 2509.26628 • Published Sep 30, 2025 • 16 upvotes
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards • Paper 2509.24981 • Published Sep 29, 2025 • 29 upvotes
A Survey of Reinforcement Learning for Large Reasoning Models • Paper 2509.08827 • Published Sep 10, 2025 • 190 upvotes
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning • Paper 2508.08221 • Published Aug 11, 2025 • 50 upvotes
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR • Paper 2507.15778 • Published Jul 21, 2025 • 20 upvotes
Scaling Image and Video Generation via Test-Time Evolutionary Search • Paper 2505.17618 • Published May 23, 2025 • 41 upvotes
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning • Paper 2504.00891 • Published Apr 1, 2025 • 14 upvotes
GenPRM • Collection • A collection of GenPRM. Project page: https://ryanliu112.github.io/GenPRM • 6 items • Updated Apr 6, 2025 • 5 upvotes
CodeI/O • Collection • Collection for CodeI/O @ https://codei-o.github.io/ • 16 items • Updated May 6, 2025 • 7 upvotes
VersaPRM • Collection • Collection of VersaPRMs using various training configurations • 8 items • Updated Feb 8, 2025 • 1 upvote
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • Paper 2502.06703 • Published Feb 10, 2025 • 153 upvotes