SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
Abstract
SWE-Lego achieves state-of-the-art performance in software engineering task resolution through a lightweight supervised fine-tuning approach combined with a curated dataset and refined training procedures.
We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-ofthe-art performance in software engineering (SWE) issue resolving. In contrast to prevalent methods that rely on complex training paradigms (e.g., mid-training, SFT, reinforcement learning, and their combinations), we explore how to push the limits of a lightweight SFT-only approach for SWE tasks. SWE-Lego comprises three core building blocks, with key findings summarized as follows: 1) the SWE-Lego dataset, a collection of 32k highquality task instances and 18k validated trajectories, combining real and synthetic data to complement each other in both quality and quantity; 2) a refined SFT procedure with error masking and a difficulty-based curriculum, which demonstrably improves action quality and overall performance. Empirical results show that with these two building bricks alone,the SFT can push SWE-Lego models to state-of-the-art performance among open-source models of comparable size on SWE-bench Verified: SWE-Lego-Qwen3-8B reaches 42.2%, and SWE-Lego-Qwen3-32B attains 52.6%. 3) We further evaluate and improve test-time scaling (TTS) built upon the SFT foundation. Based on a well-trained verifier, SWE-Lego models can be significantly boosted--for example, 42.2% to 49.6% and 52.6% to 58.8% under TTS@16 for the 8B and 32B models, respectively.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Training Versatile Coding Agents in Synthetic Environments (2025)
- SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios (2025)
- SWE-RM: Execution-free Feedback For Software Engineering Agents (2025)
- Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling (2025)
- One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents (2025)
- Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models (2025)
- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 2
Datasets citing this paper 2
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper

