Stephen Xie
stephenx [at] berkeley [dot] edu
I am a researcher at UC Berkeley in the BAIR Natural Language Processing (NLP) Group, where I'm fortunate to be advised by Jiayi Pan and Alane Suhr. I'm interested in understanding and maximizing the capabilities of language models. My research primarily focuses on efficiently training robust reasoning models through RL. Previously, I worked on mechanistic interpretability and LLM steering at CTGT (YC F24). I am double majoring in Electrical Engineering and Computer Sciences (EECS) and Business Administration under the M.E.T. program.
Research
- OpenHands Runtime Adaptation and Systems Optimization for Konwinski Prize
  Stephen Xie, Aryan Bansal, Xingyao Wang, Jiayi Pan
  Adapted the OpenHands software engineering agent, using a state-of-the-art 32B-parameter model trained on the SWE-Gym dataset, for submission to the Konwinski Prize competition. Optimized the LocalRuntime environment to operate under the constrained conditions required for remote competition submission, including limited bandwidth, restricted dependencies, and strict time limits.
  Konwinski Prize Competition @ Kaggle 2025
- LoRA Support for veRL for Memory-Efficient RL Training
  Simon Huang, Stephen Xie, Jiayi Pan, Tony Lian, Chi Zhang
  Open-source Contribution 2025
- Steering Vector Plugin for vLLM to Align Models at Inference Time
  Stephen Xie, CTGT Team
  Developed a vLLM plugin that accepts custom steering vectors (extracted via difference in means) and adds scaled versions of them to the model's internal activations, steering the model toward desired behavior without additional prompting or retraining.
  Work done at CTGT, Inc.
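A minimal sketch of the mechanism behind this kind of plugin (difference-in-means extraction plus scaled activation addition), written against a Hugging Face-style PyTorch model rather than vLLM's internals; the layer index, example prompts, and scale factor are illustrative assumptions, not the plugin's actual interface:

```python
import torch

@torch.no_grad()
def extract_steering_vector(model, tokenizer, pos_prompts, neg_prompts, layer_idx):
    """Difference-in-means steering vector: mean last-token hidden state of
    'positive' prompts minus that of 'negative' prompts at one layer."""
    def mean_hidden(prompts):
        states = []
        for p in prompts:
            ids = tokenizer(p, return_tensors="pt").to(model.device)
            out = model(**ids, output_hidden_states=True)
            states.append(out.hidden_states[layer_idx][0, -1])
        return torch.stack(states).mean(dim=0)
    return mean_hidden(pos_prompts) - mean_hidden(neg_prompts)

def add_steering_hook(layer, vector, scale=4.0):
    """Register a forward hook that adds the scaled steering vector to the
    layer's output activations on every forward pass."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * vector.to(hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)
```

With a hook like this registered on one decoder layer, ordinary generation calls are steered without any prompt changes or retraining; removing the hook restores the base model.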
Projects
- Double: Never ghost anyone again @ Neo Hackathon 2025
  Built a data pipeline that extracts conversation history from iMessage, Instagram, and Discord, then fine-tuned models with LoRA SFT and GEPA prompt optimization. Used GraphRAG on Neo4j for memory retrieval and auto-responded to messages via the Beeper API.
- Ghostwriter, 2025
  Optimized system prompts to bypass AI detectors such as Ghostbuster (Verma et al., 2023) using GEPA, finding that the algorithm explicitly learns to avoid telltale AI patterns like em dashes and "X isn't just about Y, it's about Z." However, prompts can end up optimizing for gibberish or elementary-school-level writing; a more sophisticated reward function is needed to preserve writing quality.
- Sidequest @ Pear x Anthropic Hackathon 2025
  Built an interface where the AI assigns you real-world tasks and monitors your progress through a live video feed. Streamed video to a vision-language model that watches what you're doing and gives step-by-step instructions and evaluations, reversing the usual setup so the AI prompts you instead of you prompting the AI.
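A minimal sketch of the loop this describes, assuming an OpenCV webcam capture and the Anthropic Messages API as the vision-language model; the task string, model name, and polling interval are illustrative assumptions, not the hackathon code:

```python
import base64, time
import cv2         # assumed: OpenCV for webcam capture
import anthropic   # assumed: Anthropic Python SDK as the VLM backend

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
TASK = "Fill a glass of water and bring it back to your desk."  # example task

def frame_as_base64_jpeg(cap):
    """Grab one webcam frame and encode it as a base64 JPEG for the API."""
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError("could not read from webcam")
    ok, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf.tobytes()).decode()

cap = cv2.VideoCapture(0)
while True:
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",   # illustrative model choice
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": frame_as_base64_jpeg(cap)}},
                {"type": "text",
                 "text": f"The user's task is: {TASK}. Based on this frame, "
                         "give the next instruction or evaluate their progress."},
            ],
        }],
    )
    print(reply.content[0].text)  # the AI prompts the user, not the other way around
    time.sleep(5)                 # poll a fresh frame every few seconds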