About me
I am a joint PhD student at Microsoft Research Asia and Xi'an Jiaotong University, supervised by Dongmei Zhang, Shi Han, Yanlin Wang, and Prof. Hongbin Sun.
My research focuses on code intelligence, the intersection of Software Engineering and Artificial Intelligence, which leverages AI approaches to analyze and model source code and its related artifacts. Specifically, it uses machine learning models to mine knowledge from the large-scale open source code ("Big Code") available on GitHub and similar platforms, learns better code representations (based on code tokens, ASTs, PDGs, IRs, etc.), and applies these representations to downstream tasks such as code summarization, code search, clone detection, code completion, and program repair.
My current research areas include: (1) Code Representation Learning; (2) Code Summarization; (3) Code Search; (4) Commit Message Generation.
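As a minimal illustration of the AST-based code representations mentioned above, the sketch below uses Python's standard `ast` module to parse a small function and flatten its syntax tree into a sequence of node types. This is only a toy example of the raw structural signal such models consume; real systems use far richer encodings (paths, subtrees, graphs).

```python
import ast

# A small code snippet to analyze.
source = """
def add(a, b):
    return a + b
"""

# Parse the source into an abstract syntax tree (AST).
tree = ast.parse(source)

# Flatten the tree into a sequence of node-type names, a crude
# structural representation of the code's syntax.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

Structural elements such as `FunctionDef` and `BinOp` appear in the sequence regardless of identifier names, which is one reason AST-based representations generalize better than plain token sequences.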
Publications
2023
CoCoAST: Representing Source Code via Hierarchical Splitting and Reconstruction of Abstract Syntax Trees
EMSE (CCF-B) [pdf] [code]
Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
TL;DR: We propose CoCoAST, a novel model that hierarchically splits and reconstructs ASTs to comprehensively capture the syntactic and semantic information of code without losing AST structural information. We apply our source code representation to two common program comprehension tasks: code summarization and code search.
You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search
ICSME2023 (CCF-B) [pdf] [code]
Yanlin Wang, Lianghong Guo, Ensheng Shi*, Wenqing Chen, Jiachi Chen, Wanjun Zhong, Menghan Wang, Hui Li, Ziyu Lyu, Hongyu Zhang, Zibin Zheng
TL;DR: We propose ChatDANCE, a novel approach that utilizes high-quality and diverse augmented data generated by a large language model and leverages a filtering mechanism to eliminate low-quality augmentations.
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
ISSTA2023 (CCF-A) [pdf] [code]
Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun
TL;DR: We conduct an extensive experimental study of what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning, and then propose efficient alternatives for fine-tuning large pre-trained code models based on these findings.
CoCoSoDa: Effective Contrastive Learning for Code Search
ICSE2023 (CCF-A) [pdf] [code]
Ensheng Shi, Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
TL;DR: We propose CoCoSoDa, which effectively utilizes contrastive learning for code search via two key factors in contrastive learning: data augmentation and negative samples.
2022
RACE: Retrieval-augmented Commit Message Generation
EMNLP2022 (CCF-B) [pdf] [code]
Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
TL;DR: We propose a new retrieval-augmented neural commit message generation method that treats a retrieved similar commit as an exemplar and leverages it to generate an accurate commit message.
A Large-Scale Empirical Study of Commit Message Generation: Models, Datasets and Evaluation
EMSE2022 (CCF-B) [pdf] [code]
Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang
TL;DR: To better understand how existing approaches perform on this problem, this paper conducts a systematic and in-depth analysis of state-of-the-art models and datasets.
On the Evaluation of Neural Code Summarization
ICSE2022 (CCF-A) [pdf] [code] [blog]
Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun
TL;DR: We report interesting and surprising findings on evaluation metrics, code pre-processing, and evaluation datasets, build a shared code summarization toolbox, and give actionable suggestions on the evaluation of neural code summarization.
2021
CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees
EMNLP2021 (CCF-B) [pdf] [code]
Ensheng Shi, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
TL;DR: Our model hierarchically splits and reconstructs ASTs to obtain better code representations for code summarization.
Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning Approach for Semantic Code Search
CIKM 2021 (CCF-B) [pdf] [code]
Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang
TL;DR: We ensemble three models that focus on code structure, local variables, and API invocation information, respectively, for semantic code search.
On the Evaluation of Commit Message Generation Models: An Experimental Study
ICSME 2021 (CCF-B) [pdf] [code]
Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Dongmei Zhang, Wenqiang Zhang
TL;DR: We conduct an empirical study on evaluation metrics and existing datasets. We also collect MCMD, a large-scale, information-rich, and multi-language commit message dataset, and evaluate existing models on it.
2020
CoCoGUM: Contextual Code Summarization with Multi-Relational GNN on UMLs
MSR-TR 2020 [pdf]
Yanlin Wang, Lun Du, Ensheng Shi, Yuxuan Hu, Shi Han, Dongmei Zhang
TL;DR: We explore modeling two global contexts for code summarization: intra-class context and inter-class context.
Education
2019.8 ~ Present: Xi'an Jiaotong University
MSRA-XJTU Joint PhD Program
College of Artificial Intelligence
2015.8 ~ 2019.7: Xi'an Jiaotong University
Outstanding Graduate
Automation Science and Technology
Experience
- Research Intern, Microsoft Research Asia
  Advised by Dongmei Zhang, Shi Han, and Yanlin Wang in the Data, Knowledge, and Intelligence group, Jun 2020 to present.
- Research Intern, Microsoft Research Asia
  Advised by Dongmei Zhang, Shi Han, Zhouyu Fu, and Mengyu Zhou in the Software Analytics group, Nov 2018 to Aug 2019.
Awards
- 2023 Outstanding Doctoral Graduate Student (The Highest Honor Awarded to Doctoral Students by Xi'an Jiaotong University)
- 2023, 2022 & 2021 Excellent Graduate Student
- 2023 & 2018 National Scholarship
- 2019 The Third Prize of Asia and Pacific Mathematical Contest
- 2019 Outstanding Graduate Award
- 2019 The "Star of Tomorrow" Honor by MSRA
- 2018 Elite Class Scholarship of the Institute of Automation, Chinese Academy of Sciences
- 2018 Grateful Scientist Bursary
- 2017 & 2016 National Encouragement Scholarship
- 2016 First Prize of Mathematical Modeling Contest at Provincial Level
- 2017 & 2016 Award for Perseverance and Diligence
- 2015 The Third Prize of Mathematical Modeling Contest at University Level