This paper presents a technique for automatically extracting code segments from research articles and associating them with descriptions in the surrounding natural language text. The technique uses heuristics to identify "seed" sentences that likely contain or reference code, and then examines neighboring sentences to find descriptions. An evaluation on 100 code segments drawn from research papers yielded a precision of 68% and a recall of 21% for identifying code descriptions. The authors aim to improve accuracy and scale the approach so that it can support tasks such as code recommendation and documentation generation.
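The seed-and-neighbor idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue patterns, the one-sentence window, the naive sentence splitter, and all function names (`find_seeds`, `candidate_descriptions`, `precision_recall`) are assumptions chosen for illustration, since the actual heuristics are not specified here.

```python
import re

# Hypothetical cue patterns (assumption): words that often signal a code
# reference, plus identifiers written with call parentheses such as parse().
SEED_PATTERNS = [
    re.compile(r"\b(listing|algorithm|snippet|code|function|method)\b", re.IGNORECASE),
    re.compile(r"\b\w+\(\)"),
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_seeds(sentences):
    """Return indices of sentences that match at least one cue pattern."""
    return [i for i, s in enumerate(sentences)
            if any(p.search(s) for p in SEED_PATTERNS)]


def candidate_descriptions(sentences, seeds, window=1):
    """Map each seed index to its neighbors within `window` sentences."""
    result = {}
    for i in seeds:
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        result[i] = [sentences[j] for j in range(lo, hi) if j != i]
    return result


def precision_recall(predicted, relevant):
    """Precision and recall of predicted items against a labeled set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    text = ("We study parsing. Listing 1 shows the tokenizer. "
            "It splits input on whitespace. Results follow.")
    sentences = split_sentences(text)
    seeds = find_seeds(sentences)
    print(candidate_descriptions(sentences, seeds))
```

A narrow window tends to keep the retrieved neighbors relevant while missing descriptions that sit further from the seed, which is the same kind of trade-off as the high-precision, low-recall result reported in the evaluation.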