This study analyzed 680,209 builds from 1,276 open-source projects on Travis CI to evaluate noise and heterogeneity in historical build data. On the noise side, 12% of passing builds contain failures that were ignored, breaks can persist for more than 400 days, and 67% of breaks are stale. In addition, 9-14% of builds are incorrectly labeled because of noise. Build outcomes are also heterogeneous: 41% of breaks occur outside of the build tool (Maven), and environment-specific breaks are common. The implications are that researchers should filter noise out of build data before analyzing it and account for heterogeneity in build outcomes, while tool builders should look beyond the build tool itself when supporting recovery from breaks.
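As a rough illustration of what filtering one kind of noise might look like, the sketch below separates passing builds that hide an ignored failure from the rest. It is not the study's method: the `Build` and `Job` records, the `allow_failure` flag, and the state strings are hypothetical stand-ins for whatever fields a build dataset actually provides, and "ignored failure" is assumed here to mean a failed job that was allowed to fail inside a build labeled as passed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    state: str           # assumed values: "passed" or "failed"
    allow_failure: bool  # hypothetical flag: a failure of this job does not break the build


@dataclass
class Build:
    build_id: int
    state: str           # assumed values: "passed" or "failed"
    jobs: List[Job]


def has_ignored_failure(build: Build) -> bool:
    """True when a build is labeled passed although an allowed-to-fail job failed."""
    return build.state == "passed" and any(
        job.state == "failed" and job.allow_failure for job in build.jobs
    )


def split_noisy(builds: List[Build]) -> Tuple[List[Build], List[Build]]:
    """Partition builds into (clean, noisy) under the assumption stated above."""
    clean = [b for b in builds if not has_ignored_failure(b)]
    noisy = [b for b in builds if has_ignored_failure(b)]
    return clean, noisy


# Small made-up sample, only to show how the filter would be called.
sample = [
    Build(1, "passed", [Job("passed", False)]),
    Build(2, "passed", [Job("failed", True), Job("passed", False)]),
    Build(3, "failed", [Job("failed", False)]),
]
clean_builds, noisy_builds = split_noisy(sample)
```

An analysis could then be run on `clean_builds` alone, or on both partitions, to check how sensitive its conclusions are to this one source of noise.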