Open Source Repository

🔗 GitHub: https://github.com/RLsys-Foundation/TritonForge

Contributors

Jin Pan, Xiang Long, Chengxing Xie, Kexun Zhang, Haoran Wang, Junrong Lin, Yuzhen Zhou, Jiajun Li, Yang Wang, Xiaodong Yu, Gowtham Ramesh, Yusheng Su, Zicheng Liu, Emad Barsoum

1. TL;DR

TritonForge is a server-based, closed-loop RL training and evaluation system for multi-turn agent tasks, built on slime (SGLang-native) + Megatron. It focuses on Triton kernel generation, with stable and scalable practices across both the NVIDIA and AMD ecosystems. The design goal is to turn the instability of multi-turn RL in real-world environments into implementable, scalable, and maintainable system capabilities.
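
To make concrete what the loop generates and scores: the agent's output is a Triton kernel. Below is the canonical Triton vector-add (essentially Triton's tutorial example, not code from the TritonForge repo), shown only to illustrate the kind of artifact under optimization.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final, partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```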

For methodology and task design, we draw inspiration from Kevin (multi-turn RL for generating CUDA kernels) and KernelBench (a benchmark for kernel correctness and performance), which represent the multi-turn RL training paradigm and the engineering evaluation standard, respectively.
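
KernelBench-style evaluation gates performance on correctness: a candidate kernel is first checked against a reference implementation, and only then timed for speedup. The sketch below illustrates that shape; `evaluate_candidate`, `make_inputs`, and the returned fields are hypothetical names for illustration, not KernelBench's actual API.

```python
import torch

def evaluate_candidate(candidate_fn, reference_fn, make_inputs,
                       atol=1e-4, rtol=1e-4, n_trials=100):
    """Correctness-gated speedup check (illustrative, not KernelBench's API)."""
    inputs = make_inputs()  # fresh CUDA tensors shared by both functions
    out_ref = reference_fn(*inputs)
    out_cand = candidate_fn(*inputs)
    # Hard gate: a kernel that is fast but wrong scores zero.
    if not torch.allclose(out_cand, out_ref, atol=atol, rtol=rtol):
        return {"correct": False, "speedup": 0.0}

    def bench_ms(fn):
        for _ in range(10):  # warm-up to exclude compilation/caching costs
            fn(*inputs)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(n_trials):
            fn(*inputs)
        end.record()
        torch.cuda.synchronize()  # CUDA events avoid host-side timing noise
        return start.elapsed_time(end) / n_trials  # average ms per call

    return {"correct": True,
            "speedup": bench_ms(reference_fn) / bench_ms(candidate_fn)}
```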


2. Technical Choices

2.1 Why slime? (From veRL → slime)

Where we started

We initially planned to build the full multi-turn RL pipeline on veRL: