XIANZHEN LUO
EDUCATION
Harbin Institute of Technology
Harbin, China
Ph.D. Student in Code LLMs
Sep 2022 – Present
Supervisors: Prof. Wanxiang Che and Assoc. Prof. Qingfu Zhu
- Honors: National Scholarship, ACL 2025 Outstanding Paper
Harbin Engineering University
Harbin, China
Bachelor in Computer Science
Sep 2018 – Jun 2022
- Honors: National Scholarship, 2× ICPC Regional Silver Medals, 2× ICPC EC-Final Bronze Medals
EXPERIENCE
Kuaishou Technology
Beijing, China
Kstar Research Intern
Aug 2025 – Mar 2026
- Developed a multi-agent framework (CVE-Factory) from scratch to scale terminal environments for code security, fully automating collection, construction, testing, and evaluation without human intervention.
- Cross-validated against expert manual reproduction with over 95% consistency. Supports asynchronous parallelism, generating 215 tasks in under 5 hours with 20 parallel workers on a single machine (vs. 10 hours per task for experts).
- Scaled to 4k+ high-quality vulnerability-repair tasks. Fine-tuned Qwen3-32B improved 5×, reaching performance comparable to Minimax-M2.7 and Claude Sonnet 4, and generalized strongly to TerminalBench and cross-language settings.
StepFun AI
Beijing, China
Pretrain Research Intern
Dec 2024 – Jul 2025
- Built and cleaned full-scale GitHub file and issue data from scratch; refined filtering rules to remove garbled text, significantly reducing loss spikes.
- Conducted experiments on code data with FIM, Meta Info, MTP, and Focal Loss strategies.
- Explored and validated scaling laws for code; under certain constraints, prediction error was below 0.1%.
- Discovered a logarithmic relationship between code compression ratio and downstream code task performance.
Du Xiaoman (Beijing) Science Technology Co., Ltd.
Beijing, China
University-Industry Collaboration Researcher
Nov 2023 – Sep 2024
- Synthesized code SFT data from the perspectives of correctness and detail sensitivity, achieving SOTA on small LLMs.
- Leveraged multilingual programming languages to assist reasoning, yielding consistent improvements across benchmarks and LLMs.
- Achieved 2x lossless acceleration with only 2MB extra storage by exploiting vocabulary distributions from LLM decoding, outperforming prior train-free speculative decoding methods by over 30%.
FIRST-AUTHOR PUBLICATIONS
arXiv
CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability
Xianzhen Luo*, Jingyuan Zhang*, Shiqi Zhou*, Rain Huang*, Chuan Xiao, Qingfu Zhu, Zhiyuan Ma, Xing Yue, Yang Yue, Wencong Zeng, Wanxiang Che
ACL 2026
Scaling Laws for Code: A More Data-Hungry Regime
Xianzhen Luo*, Wenzhen Zheng*, Qingfu Zhu, Rongyi Zhang, Houyi Li, Siming Huang, Yuantao Fan, Wanxiang Che
ICLR 2026
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Xianzhen Luo*, Jinyang Huang*, Wenzhen Zheng, Qingfu Zhu, Mingzheng Xu, Yiheng Xu, Yuantao Fan, Libo Qin, Wanxiang Che
ACL 2025 Outstanding Paper
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu
ACL 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Xuanle Zhao*, Xianzhen Luo*, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun
EMNLP 2024
Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts
Xianzhen Luo, Qingfu Zhu, Zhiming Zhang, Libo Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che
EMNLP 2024
Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training
Yixuan Wang*, Xianzhen Luo*, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che
arXiv
Is Compression Really Linear with Code Intelligence?
Shijie Xuyang*, Xianzhen Luo*, Zheng Chu, Houyi Li, Siming Huang, Qiufeng Wang, Wanxiang Che, Qingfu Zhu, Shuigeng Zhou
arXiv
Success is in the Details: Evaluate and Enhance Details Sensitivity of Code LLMs through Counterfactuals
Xianzhen Luo, Qingfu Zhu, Zhiming Zhang, Mingzheng Xu, Tianhao Cheng, Yixuan Wang, Zheng Chu, Shijie Xuyang, Zhiyuan Ma, YuanTao Fan, Wanxiang Che
arXiv
Semi-Instruct: Bridging Natural-Instruct and Self-Instruct for Code Large Language Models
Xianzhen Luo, Qingfu Zhu, Zhiming Zhang, Xu Wang, Qing Yang, Dongliang Xu, Wanxiang Che
OTHER PUBLICATIONS
ACL 2026 (Findings)
Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format
Dingzirui Wang, Xuanliang Zhang, Rongyu Cao, Longxu Dou, Xianzhen Luo, Yingwei Ma, Qingfu Zhu, Wanxiang Che, Binhua Li, Fei Huang, Yongbin Li
Survey
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Core Contributor
Tech Report
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
Core Contributor
ACL 2025 Oral
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang, Tianhao Cheng, ..., Xianzhen Luo, ..., Zili Wang
KDD 2025
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning
Zhiyuan Ma, Jiayu Liu, Xianzhen Luo, Zhenya Huang, Qingfu Zhu, Wanxiang Che
ACL 2025 (Findings)
ChartEdit: How Far Are MLLMs From Automating Chart Analysis?
Xuanle Zhao*, Xuexin Liu*, Yang Haoyue*, Xianzhen Luo, Fanhu Zeng, Jianling Li, Qi Shi, Chi Chen
COLING 2024
A Survey on Natural Language Processing for Programming
Qingfu Zhu, Xianzhen Luo, Fang Liu, Cuiyun Gao, Wanxiang Che
ACL 2022 (Findings)
Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging
Yutai Hou, Cheng Chen, Xianzhen Luo, Bohan Li, Wanxiang Che
AI Open 2022
Augmented and Challenging Datasets with Multi-step Reasoning and Multi-span Questions for Chinese Judicial Reading Comprehension
Qingye Meng, Ziyue Wang, Hang Chen, Xianzhen Luo, Baoxin Wang, Zhipeng Chen, Yiming Cui, Dayong Wu, Zhigang Chen, Shijin Wang
arXiv
Automated Snippet-Alignment Data Augmentation for Code Translation
Zhiming Zhang, Qingfu Zhu, Xianzhen Luo, Yixuan Wang, Bohan Li, Wanxiang Che
PROJECTS
Huozi: An Open-Source Universal LLM | 12 200
Mar 2023 – May 2023
- Served as a core developer, responsible for the collection and curation of data for code pretraining and post-training.
- Led the collection and organization of Chinese pretraining datasets.
Abacus: A Lightweight Code LLM | 2 47
Oct 2023 – Sep 2024
- Abacus, with 2.7B parameters, outperforms other code LLMs of ≤3B parameters, such as Stable Code-3B and Granite-3B-Code, on both coding and general language tasks.
- Led post-training data construction.