This document discusses unsupervised machine learning techniques for detecting clones in source code. It begins by defining the different types of code clones and describing state-of-the-art clone detection tools. It then argues that machine learning approaches, such as kernel methods that compare abstract syntax trees (ASTs), can detect clones more efficiently and more accurately than traditional text-, token-, and syntax-based techniques. The document gives examples of kernel functions that compute the similarity between structural representations of code, such as ASTs, so that clones can be detected without labeled training data.
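As a rough illustration of the kernel idea summarized above, the following minimal sketch compares two Python snippets by counting the depth-one AST fragments they share. It is not the document's own method: the choice of Python's standard `ast` module, the fragment definition (a node type together with the types of its direct children), and the function names `production_counts`, `ast_kernel`, and `normalized_ast_kernel` are all assumptions made for this example.

```python
import ast
import math
from collections import Counter


def production_counts(source: str) -> Counter:
    """Count depth-one AST fragments: (node type, types of its direct children).

    Hypothetical feature map chosen for illustration; identifier names and
    literal values are ignored because only node types are recorded.
    """
    tree = ast.parse(source)
    counts = Counter()
    for node in ast.walk(tree):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        counts[(type(node).__name__, children)] += 1
    return counts


def ast_kernel(a: str, b: str) -> int:
    """Kernel value: dot product of the two fragment-count vectors."""
    ca, cb = production_counts(a), production_counts(b)
    return sum(n * cb[fragment] for fragment, n in ca.items())


def normalized_ast_kernel(a: str, b: str) -> float:
    """Scale the kernel into [0, 1] so snippets of different sizes compare fairly."""
    return ast_kernel(a, b) / math.sqrt(ast_kernel(a, a) * ast_kernel(b, b))


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
different = "def greet(name):\n    print('hello', name)\n"

print(normalized_ast_kernel(original, renamed))
print(normalized_ast_kernel(original, different))
```

Because the sketch records only node types, a copy whose identifiers have been renamed should score essentially 1.0 against the original, while structurally unrelated code scores lower. A matrix of such pairwise scores could then be passed to a clustering algorithm, which is one way the approach can remain unsupervised.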