Current Setup
Model Configuration
- Base Model: KernelLLM (Llama-3.1-8B-Instruct)
- Fine-tuning: SFT on GPU MODE data
- Task: Converting PyTorch code to Triton code
- Training Method: RL post-training
- Hardware: Single H100 node
Experiment Type
- Mode: Single-turn code conversion
- Evaluation Metrics:
  - Compilation success
  - Correctness (output matches the PyTorch reference)
  - Performance (speedup over the PyTorch baseline)
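The three metrics above can be collapsed into a single scalar for RL. The sketch below is illustrative only: the weights, the speedup cap, and the function name `kernel_reward` are assumptions, not the actual training configuration.

```python
def kernel_reward(compiled: bool, correct: bool, speedup: float) -> float:
    """Hypothetical scalar reward combining the three evaluation metrics.

    Weights (0.2 / 0.6 / 0.2) and the 2x speedup cap are illustrative
    assumptions, not the configuration actually used in training.
    """
    reward = 0.0
    if compiled:
        reward += 0.2  # compilation success
    if correct:
        reward += 0.6  # output matches the PyTorch reference
        # Performance bonus only counts for correct kernels,
        # scaled linearly up to a capped 2x speedup.
        reward += 0.2 * min(speedup, 2.0) / 2.0
    return reward
```

Gating the performance term on correctness avoids rewarding fast-but-wrong kernels, a common reward-hacking failure mode in code-generation RL.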
Prompt Data
- Mixed set drawn from KernelBench Levels 1-4: 270 cases
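Building the mixed prompt set can be sketched as below. The per-level sizes (100/100/50/20) are the published KernelBench level sizes and match the 270-case total above, but they and the helper name `build_prompt_set` are stated here as assumptions rather than the exact sampling code used.

```python
import random

# Assumed per-level problem counts for KernelBench Levels 1-4;
# these sum to the 270 cases mentioned above.
LEVEL_SIZES = {1: 100, 2: 100, 3: 50, 4: 20}

def build_prompt_set(seed: int = 0) -> list[tuple[int, int]]:
    """Return shuffled (level, problem_id) pairs covering all 270 cases."""
    cases = [(lvl, pid) for lvl, n in LEVEL_SIZES.items() for pid in range(n)]
    random.Random(seed).shuffle(cases)  # fixed seed for reproducible ordering
    return cases
```

Shuffling across levels keeps each RL batch a mix of easy (Level 1) and hard (Level 4) conversion tasks instead of training on one difficulty at a time.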
RL Framework Integration
- RL Framework: Slime
- Evaluation Sandbox: KernelBench
- Integration Status: ✅ Successfully integrated and working
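The integration above amounts to a single-turn rollout loop: the policy emits one Triton candidate per PyTorch prompt, and the sandbox returns a verdict used for the reward. The sketch below uses stub classes in place of the real Slime policy interface and KernelBench harness, whose actual APIs are not shown in this document; all class and method names here are hypothetical stand-ins.

```python
class StubModel:
    """Stand-in for the policy; a real run queries the Slime-managed model."""
    def generate(self, prompt: str) -> str:
        return "# triton kernel for: " + prompt

class StubSandbox:
    """Stand-in for the KernelBench evaluation sandbox."""
    def evaluate(self, code: str) -> dict:
        # A real sandbox compiles the kernel and checks outputs and
        # runtime against the PyTorch reference; values here are fixed.
        return {"compiled": True, "correct": True, "speedup": 1.3}

def single_turn_rollout(model, sandbox, prompts):
    """Generate one Triton candidate per prompt and collect sandbox verdicts."""
    trajectories = []
    for prompt in prompts:
        code = model.generate(prompt)
        verdict = sandbox.evaluate(code)  # single turn: no repair/retry loop
        trajectories.append((prompt, code, verdict))
    return trajectories
```

In single-turn mode each prompt gets exactly one attempt with no feedback round-trip, so the sandbox verdict feeds directly into the reward rather than into a follow-up generation.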
Experiment Results
Key Findings
- Reward Improvement: ~5% increase over the course of training
- Reward Behavior: fluctuates between steps, but the overall trend is upward