Over the last few decades, software projects have grown larger and more useful, and have increasingly penetrated many aspects of human life. During software development and maintenance, developers spend a significant portion of their time locating and correcting code errors. Research indicates that approximately 50% of developer time goes to identifying and resolving defects. This underscores the need for automated methods of defect detection and correction, which could improve both the efficiency of development and the quality of the resulting software.
In this context, modern defect detection tools have been developed and are widely used by programmers. However, most of these systems do not support automatic defect correction, so programmers must still spend time fixing the reported defects themselves. Furthermore, existing tools are useful for detecting syntactic and structural issues, but are often unable to identify logical errors or problems that occur only during program execution.
This thesis aims to create a system for detecting and correcting code errors. To achieve this, an extensive dataset of code commits is used, containing files before and after defect corrections. Both versions of each file are collected through the GitHub API, and the independent code segments (functions) that contain the errors are extracted from them together with their corresponding corrections. Finally, this dataset is used to train Large Language Models for the detection and correction of code errors, with the goal of enhancing the software development process. The experimental results show that our system can be useful for defect classification and correction.
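
The function extraction step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the collected files are Python source, uses the standard-library `ast` module, and the names `extract_functions` and `changed_function_pairs` are hypothetical. Given the text of a file before and after a fix, it returns the functions whose source changed, paired with their corrected versions.

```python
import ast


def extract_functions(source: str) -> dict[str, str]:
    """Map each function name in `source` to its source text.

    Simplifying assumption: names are unique within the file. Functions
    or methods that share a name would overwrite each other here.
    """
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions[node.name] = ast.get_source_segment(source, node)
    return functions


def changed_function_pairs(before: str, after: str) -> list[tuple[str, str, str]]:
    """Return (name, buggy_source, fixed_source) for functions that differ."""
    old = extract_functions(before)
    new = extract_functions(after)
    return [
        (name, old[name], new[name])
        for name in sorted(old)
        if name in new and old[name] != new[name]
    ]


# Hypothetical file contents before and after a defect-fixing commit.
BEFORE = '''def mean(values):
    return sum(values) / (len(values) - 1)

def total(values):
    return sum(values)
'''

AFTER = '''def mean(values):
    return sum(values) / len(values)

def total(values):
    return sum(values)
'''

pairs = changed_function_pairs(BEFORE, AFTER)
for name, buggy, fixed in pairs:
    print(f"changed function: {name}")
```

In this sketch only `mean` is reported, since `total` is identical in both versions. Pairs of this kind (a function before the fix and the same function after it) are the sort of training examples the paragraph above describes.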