DeepSim: Deep Learning Code Functional Similarity - Texas A&M University (TAMU) Scholar

abstract

© 2018 Association for Computing Machinery. Measuring code similarity is fundamental to many software engineering tasks, e.g., code search, refactoring and reuse. However, most existing techniques focus only on syntactic code similarity, while measuring code functional similarity remains a challenging problem. In this paper, we propose a novel approach that encodes code control flow and data flow into a semantic matrix in which each element is a high-dimensional sparse binary feature vector, and we design a new deep learning model that measures code functional similarity based on this representation. By concatenating the hidden representations learned from a code pair, the model transforms the problem of detecting functionally similar code into binary classification, which lets it effectively learn patterns shared by functionally similar code with very different syntax. We have implemented our approach, DeepSim, for Java programs and evaluated its recall, precision and time performance on two large datasets of functionally similar code. The experimental results show that DeepSim significantly outperforms existing state-of-the-art techniques, such as DECKARD, RtvNN and CDLH, as well as two baseline deep neural network models.
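The pair-classification framing described in the abstract can be illustrated with a toy sketch. It is not DeepSim's model: the one-layer ReLU encoder, the logistic classifier, the random untrained weights, and the names `encode` and `similarity_probability` are all hypothetical. Each code fragment is represented here only as the set of indices that are 1 in a single sparse binary feature vector, a simplified stand-in for (not a reproduction of) the paper's semantic-matrix encoding of control and data flow.

```python
import math
import random


def encode(active_features, weights, bias):
    """Hypothetical encoder: map a sparse binary feature vector, given as the
    set of indices whose value is 1, to a dense hidden vector (one ReLU layer)."""
    hidden = []
    for j in range(len(bias)):
        # Each hidden unit sums weights only over the active (value-1) inputs,
        # which is why sparse binary inputs are cheap to process.
        s = bias[j] + sum(weights[i][j] for i in active_features)
        hidden.append(max(0.0, s))
    return hidden


def similarity_probability(code_a, code_b, weights, bias, clf_w, clf_b):
    """Concatenate the hidden representations of a code pair and score the
    pair with a logistic (binary) classifier: returns P(functionally similar)."""
    # In this sketch both members of the pair share the same encoder weights.
    pair = encode(code_a, weights, bias) + encode(code_b, weights, bias)
    z = clf_b + sum(w * x for w, x in zip(clf_w, pair))
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage with random, untrained weights: the score is meaningless until
# the weights are learned from labeled similar/dissimilar pairs.
INPUT_DIM, HIDDEN_DIM = 16, 4
rng = random.Random(0)
W = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)] for _ in range(INPUT_DIM)]
b = [0.0] * HIDDEN_DIM
clf_w = [rng.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN_DIM)]
p = similarity_probability({0, 3, 7}, {0, 3, 9}, W, b, clf_w, 0.0)
print(f"P(similar) = {p:.3f}")
```

In a real setting the encoder and classifier weights would typically be fitted jointly on labeled code pairs with a binary cross-entropy loss; the sketch omits training and only shows how concatenation turns a pair comparison into a single binary prediction.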

name of conference

Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

authors

Huang, Jeff

published proceedings

ESEC/FSE'18: PROCEEDINGS OF THE 2018 26TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING

author list (cited authors)

Zhao, G., & Huang, J.

citation count

116

complete list of authors

Zhao, Gang||Huang, Jeff

publication date

October 2018

publisher

Association for Computing Machinery (ACM)

keywords

Classification
Code Functional Similarity
Control/data Flow
Deep Learning

Digital Object Identifier (DOI)

10.1145/3236024.3236068

International Standard Book Number (ISBN) 13

9781450355735

start page

141

end page

151

URL

http://dx.doi.org/10.1145/3236024.3236068

