On scalable oversight with weak LLMs judging strong LLMs • Paper • 2407.04622 • Published Jul 5, 2024
Evaluating Frontier Models for Dangerous Capabilities • Paper • 2403.13793 • Published Mar 20, 2024
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning • Paper • 2310.12921 • Published Oct 19, 2023