
Cong Xie
I am a Research Scientist at ByteDance, focusing on optimizing systems for large-scale machine learning. My work centers on accelerating the training of foundational large language models through innovative system-algorithm co-design. I strive to deliver solutions that not only enhance practical performance but also uphold strong theoretical guarantees.
I obtained my Ph.D. from the University of Illinois at Urbana-Champaign, where I was co-advised by Prof. Indranil Gupta and Prof. Oluwasanmi Koyejo.
CV (updated in Nov 2024)
Google Scholar
Contact Info | Email: xcgoner1108 AT gmail.com
Research Interests
- Large-scale and Distributed Machine Learning
- System-algorithm Co-design for Machine Learning
- Efficient Machine Learning
- Non-convex Optimization
Publications
-
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.
Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, Chengming Zhang, Baixi Sun, Haibin Lin, Zhi Zhang, Xin Liu, Dingwen Tao.
Advances in Neural Information Processing Systems (NeurIPS) 2024.
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu.
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI) 2024.
-
SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training.
Yangrui Chen, Cong Xie, Meng Ma, Juncheng Gu, Yanghua Peng, Haibin Lin, Chuan Wu, and Yibo Zhu.
Advances in Neural Information Processing Systems (NeurIPS) 2022.
-
ZenoPS: A Distributed Learning System Integrating Communication Efficiency and Security.
Cong Xie, Sanmi Koyejo, Indranil Gupta.
MDPI Algorithms 15.7 (2022)
-
CSER: Communication-efficient SGD with Error Reset.
Cong Xie, Shuai Zheng, Sanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin.
Advances in Neural Information Processing Systems (NeurIPS) 2020.
-
Zeno++: Robust Fully Asynchronous SGD.
Cong Xie, Sanmi Koyejo, Indranil Gupta.
International Conference on Machine Learning (ICML) 2020.
-
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates.
Cong Xie, Sanmi Koyejo, Indranil Gupta, Haibin Lin.
NeurIPS workshop on Optimization for Machine Learning (OPT) 2020.
https://arxiv.org/abs/1911.09030
-
Asynchronous Federated Optimization.
Cong Xie, Sanmi Koyejo, Indranil Gupta.
NeurIPS workshop on Optimization for Machine Learning (OPT) 2020.
https://arxiv.org/abs/1903.03934
-
Baechi: Fast Device Placement of Machine Learning Graphs.
Beomyeol Jeon, Linda Cai, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta.
Proc. ACM Symposium on Cloud Computing (ACM SoCC), 2020.
-
SLSGD: Secure and Efficient Distributed On-device Machine Learning.
Cong Xie, Sanmi Koyejo, Indranil Gupta.
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2019.
-
Fall of Empires: Breaking Byzantine-tolerant SGD by Inner Product Manipulation.
Cong Xie, Sanmi Koyejo, Indranil Gupta.
Uncertainty in Artificial Intelligence (UAI) 2019.
-
Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance.
Cong Xie, Sanmi Koyejo, Indranil Gupta.
International Conference on Machine Learning (ICML) 2019.
-
Distributed Power-Law Graph Computing: Theoretical and Empirical Analysis.
Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang.
Advances in Neural Information Processing Systems (NeurIPS) 2014.
PDF, Code
-
A Scalable and Extensible Framework for Superposition-Structured Models.
Shenjian Zhao, Cong Xie, and Zhihua Zhang.
The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) 2016.
-
Wishart Mechanism for Differentially Private Principal Components Analysis.
Wuxuan Jiang, Cong Xie, and Zhihua Zhang.
The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) 2016.
http://arxiv.org/abs/1511.05680
-
Feature Extraction and Ensemble Decision Tree Classifier in Plant Failure Detection.
Cong Xie, Donglin Yang, Yixiang Huang, and Donglai Sun.
Annual Conference of the Prognostics and Health Management Society (IEEE PHM2015 Data Challenge Winner Paper) 2015.
PDF, Code
Preprints
-
Compressed Communication for Distributed Training: Adaptive Methods and System.
Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin.
https://arxiv.org/abs/2105.07829
-
Phocas: dimensional Byzantine-resilient stochastic gradient descent.
Cong Xie, Sanmi Koyejo, and Indranil Gupta.
https://arxiv.org/abs/1805.09682
-
Distributed Power-Law Graph Computing: Theoretical and Empirical Analysis.
Cong Xie, Ling Yan, Xiao-Fan Niu, Wuxuan Jiang, Wu-Jun Li, and Zhihua Zhang.
Long Version, Code
-
S-PowerGraph: Streaming Graph Partitioning for Natural Graphs by Vertex-Cut.
Cong Xie, Wu-Jun Li, and Zhihua Zhang.
http://arxiv.org/abs/1511.02586
-
A New Relaxation Approach to Normalized Hypergraph Cut.
Cong Xie, Wu-Jun Li, and Zhihua Zhang.
http://arxiv.org/abs/1511.02595
Academic Service
Journal Reviewer
- Journal of Machine Learning Research (JMLR)
- ACM Transactions on Autonomous and Adaptive Systems (TAAS)
- IEEE Transactions on Signal Processing
- IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
- Journal of Computer Science and Technology (JCST)
- Transactions on Signal and Information Processing over Networks (SIPN)
- Transactions on Network Science and Engineering (TNSE)
- Frontiers in Artificial Intelligence
- Journal of Systems Architecture
- Transactions on Knowledge Discovery from Data (TKDD)
- Distributed Computing
- IEEE Transactions on Computers (TC)
- IEEE Access
- Transactions on Machine Learning Research (TMLR)
Conference Reviewer
- MLSys, ICML, NeurIPS, AISTATS, ICLR, UAI, DISC, AAAI, IJCAI
Honors & Awards
- J.P. Morgan 2020 AI Research PhD Fellowship Awards
- National Scholarship (Top 2%)
- 3rd place in IEEE PHM 2015 Data Challenge (Leaderboard)
- SJTU Academic Excellence Scholarship Class-B (Top 10%)
- SJTU Academic Excellence Scholarship Class-C (2 Times, Top 20%)
- 2nd prize (provincial level) in the China Undergraduate Mathematical Contest in Modeling