Current Setup
Model Configuration
- Base Model: KernelLLM (Llama-3.1-8B-Instruct)
- Fine-tuning: SFT on GPU MODE data
- Task: Converting PyTorch code to Triton code
- Training Method: RL post-training
- Hardware: Single H100 node
Experiment Type
- Mode: Single-turn code conversion
- Evaluation Metrics:
  - Compilation success
  - Correctness (output matches the PyTorch reference)
  - Performance (speedup over the PyTorch baseline)
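The three metrics above can be collapsed into a single scalar for RL. The sketch below is illustrative only: the weights, the speedup cap, and the function name `kernel_reward` are assumptions, not the actual training configuration.

```python
def kernel_reward(compiled: bool, correct: bool, speedup: float) -> float:
    """Hypothetical scalar reward combining the three evaluation metrics.

    Weights (0.2 / 0.6 / 0.2) and the 2x speedup cap are illustrative
    assumptions, not the configuration actually used in training.
    """
    reward = 0.0
    if compiled:
        reward += 0.2  # compilation success
    if correct:
        reward += 0.6  # output matches the PyTorch reference
        # Performance bonus only counts for correct kernels,
        # scaled linearly up to a capped 2x speedup.
        reward += 0.2 * min(speedup, 2.0) / 2.0
    return reward
```

Gating the performance term on correctness avoids rewarding fast-but-wrong kernels, a common reward-hacking failure mode in code-generation RL.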
Prompt Data
- Mixed set drawn from KernelBench Levels 1-4: 270 cases
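Building the mixed prompt set can be sketched as below. The per-level sizes (100/100/50/20) are the published KernelBench level sizes and match the 270-case total above, but they and the helper name `build_prompt_set` are stated here as assumptions rather than the exact sampling code used.

```python
import random

# Assumed per-level problem counts for KernelBench Levels 1-4;
# these sum to the 270 cases mentioned above.
LEVEL_SIZES = {1: 100, 2: 100, 3: 50, 4: 20}

def build_prompt_set(seed: int = 0) -> list[tuple[int, int]]:
    """Return shuffled (level, problem_id) pairs covering all 270 cases."""
    cases = [(lvl, pid) for lvl, n in LEVEL_SIZES.items() for pid in range(n)]
    random.Random(seed).shuffle(cases)  # fixed seed for reproducible ordering
    return cases
```

Shuffling across levels keeps each RL batch a mix of easy (Level 1) and hard (Level 4) conversion tasks instead of training on one difficulty at a time.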
RL Framework Integration
- RL Framework: Slime
- Evaluation Sandbox: KernelBench
- Integration Status: ✅ Successfully integrated and working
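The integration above amounts to a single-turn rollout loop: the policy emits one Triton candidate per PyTorch prompt, and the sandbox returns a verdict used for the reward. The sketch below uses stub classes in place of the real Slime policy interface and KernelBench harness, whose actual APIs are not shown in this document; all class and method names here are hypothetical stand-ins.

```python
class StubModel:
    """Stand-in for the policy; a real run queries the Slime-managed model."""
    def generate(self, prompt: str) -> str:
        return "# triton kernel for: " + prompt

class StubSandbox:
    """Stand-in for the KernelBench evaluation sandbox."""
    def evaluate(self, code: str) -> dict:
        # A real sandbox compiles the kernel and checks outputs and
        # runtime against the PyTorch reference; values here are fixed.
        return {"compiled": True, "correct": True, "speedup": 1.3}

def single_turn_rollout(model, sandbox, prompts):
    """Generate one Triton candidate per prompt and collect sandbox verdicts."""
    trajectories = []
    for prompt in prompts:
        code = model.generate(prompt)
        verdict = sandbox.evaluate(code)  # single turn: no repair/retry loop
        trajectories.append((prompt, code, verdict))
    return trajectories
```

In single-turn mode each prompt gets exactly one attempt with no feedback round-trip, so the sandbox verdict feeds directly into the reward rather than into a follow-up generation.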
Experiment Results
Key Findings
- Reward Improvement: ~5% increase over the course of training
- Reward Behavior: fluctuates between steps, but the overall trend is upward