Lundi, le Quatorze Juillet: Devs using AI code more slowly; a framework for inference using unstructured data; some cool French startups
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity ... it's not good
• Counterintuitive Core Finding: In a randomized controlled trial with 16 experienced open-source developers, AI tools made developers 19% slower rather than faster, directly contradicting both developer expectations (24% speedup predicted) and post-hoc perceptions (20% speedup believed after experiencing slowdown).
• Robust Experimental Design: The study recruited developers from large, high-quality repositories (averaging 22k+ stars, 1M+ lines of code) working on real issues they identified as valuable. Developers were randomly assigned to use or avoid AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) while completing typical 2-hour coding tasks, with screen recording and self-reported implementation times (a toy sketch of the headline time comparison appears after this list).
• Perceptual Disconnect: Perhaps most striking is the persistent gap between reality and perception. Developers maintained their belief that AI helped them even after experiencing measurable slowdown. This suggests that subjective assessments of AI productivity may be systematically biased, calling into question anecdotal reports of AI effectiveness.
• Factor Analysis: The researchers investigated 20 potential explanations for the slowdown, identifying 5 likely contributors while ruling out experimental artifacts. Developers used frontier models correctly, didn't differentially abandon difficult tasks, and produced similar-quality code regardless of AI usage.
• Reconciling Contradictory Evidence: The paper thoughtfully addresses the tension between its findings and both impressive AI benchmark scores and widespread anecdotal reports of AI helpfulness. The authors propose three hypotheses: the RCT underestimates real-world capabilities, benchmarks and anecdotes overestimate them, or the different methodologies capture different aspects of AI performance.
• Benchmark vs. Reality Gap: The study highlights crucial differences between controlled benchmarks (algorithmic scoring, well-scoped tasks) and real-world development (human satisfaction, implicit requirements for documentation/testing, quality standards). This suggests benchmarks may overestimate practical AI capabilities.
• Task Complexity Matters: Real development work involves numerous implicit requirements such as code style, documentation, testing coverage and integration considerations that may not be captured in simplified benchmark tasks but significantly impact actual productivity.
• Methodological Innovation: The research demonstrates a promising approach for measuring AI's real-world impact that may be harder to game than traditional benchmarks, providing complementary evidence for understanding AI capabilities and their implications for AI R&D acceleration.
• Future Implications: The methodology offers a framework for tracking AI progress over time in realistic deployment scenarios, which is particularly important for understanding AI's potential to accelerate its own development: a key consideration for AI safety and governance.
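Since the headline figure is just a relative completion-time comparison, here is a toy sketch, with simulated numbers rather than the study's data, of how such a slowdown estimate can be read off an AI-allowed vs. AI-disallowed split. The study's actual analysis is more careful than this geometric-mean ratio; treat it purely as an illustration.

```python
import numpy as np

def relative_time_ratio(times_ai, times_no_ai):
    """Ratio of geometric mean completion times: values above 1 mean AI-allowed tasks took longer."""
    return np.exp(np.log(times_ai).mean() - np.log(times_no_ai).mean())

# Simulated task completion times in minutes, roughly mirroring the reported ~19% slowdown.
rng = np.random.default_rng(14)
no_ai   = rng.lognormal(mean=np.log(120), sigma=0.6, size=500)          # AI disallowed
with_ai = rng.lognormal(mean=np.log(120 * 1.19), sigma=0.6, size=500)   # AI allowed

ratio = relative_time_ratio(with_ai, no_ai)
print(f"AI-allowed tasks took roughly {100 * (ratio - 1):.0f}% longer")
```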
It's not too long, and it's worth reading
Some empirical grounding for debates about AI productivity impacts! Rigorous real-world evaluation yields surprising results that challenge both technical benchmarks and human intuition. Who'd have thought it?
Made me think
If developers misperceive AI's impact on their own productivity, what about the reliability of self-reported assessments in other domains where AI is being deployed?
The study suggests AI may perform worse in high-quality development environments with extensive implicit requirements. How should this influence decisions about where and how to deploy AI tools across different software development contexts?
A Unifying Framework for Robust and Efficient Inference with Unstructured Data
Economists in particular increasingly rely on unstructured data (text from news articles, government transcripts, and social media; satellite imagery; audio recordings) to measure concepts like policy uncertainty, institutional quality, economic activity, and social phenomena. This shift has been accelerated by deep learning's ability to extract meaningful patterns from high-dimensional data at unprecedented scale, opening new research frontiers and enabling measurement of previously unquantifiable concepts.
The approach is relevant well beyond economics, to any field where inferences must be drawn from unstructured data.
• Core Innovation: The paper introduces MAR-S (Missing At Random Structured Data), a framework that reframes the use of unstructured data as a missing data problem. Rather than using neural network predictions directly, researchers treat the low-dimensional features they want (e.g., sentiment, topics) as missing structured data that can be imputed from high-dimensional unstructured inputs.
• The Bias Problem: Neural networks and other machine learning models don't generically produce unbiased predictions, even with large datasets. This measurement error propagates to downstream econometric analyses, affecting both point estimates and uncertainty quantification. The availability of different off-the-shelf models with different biases also raises concerns about selective reporting.
• Technical Solution: MAR-S leverages classic results from semiparametric inference by requiring a validation sample with ground-truth labels. Under Rubin's "missing at random" assumption, the framework constructs doubly robust estimators that tolerate imperfect imputation functions while maintaining valid statistical inference (a minimal numerical sketch of this idea appears after this list).
• Efficiency Insights: The paper shows that optimal imputation functions should depend not only on unstructured data but also on context-specific variables relevant to the target parameter. This connects to familiar econometric concepts like regression adjustment and reveals when researchers should invest in improving their machine learning models.
• Practical Extensions: The framework addresses common empirical scenarios overlooked in existing literature, particularly when validation data exists at granular levels (individual texts/images) but parameters of interest involve aggregated data. It also handles nonlinear transformations of aggregated structured data.
• Methodological Applications: The authors develop efficient estimators for descriptive moments, linear regression, instrumental variables, difference-in-differences, and regression discontinuity designs, showing how MAR-S unifies recent work on inference with black-box AI models and connects to established econometric methods.
• Empirical Validation: Three applications demonstrate the framework's utility: re-analyzing the Economic Policy Uncertainty Index (Baker et al. 2016) and Geopolitical Risk Index (Caldara and Iacoviello 2022), plus a new analysis of political content in historical newspapers. Results show that ignoring measurement error leads to overly precise confidence intervals and potentially biased estimates.
• Assumption Requirements: The framework requires precise, implementable definitions of what is being extracted, validation data that satisfies the "missing at random" assumption, and known annotation probabilities. While these requirements limit applicability, they enable principled uncertainty quantification.
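To make the doubly robust idea concrete, here is a minimal sketch (my own illustration, not the paper's code) of the simplest case: estimating the mean of a structured feature, say a sentiment score, from ML imputations on every document plus ground-truth labels on a random validation subsample drawn with a known annotation probability. The inverse-probability-weighted residual term removes the imputation bias, so an imperfect imputation model costs precision rather than validity.

```python
import numpy as np

def bias_corrected_mean(y_hat, y_true, labeled, pi):
    """
    Mean of a structured feature, combining ML imputations for all documents
    with ground-truth labels on a random validation subsample annotated with
    known probability `pi`. Returns the point estimate and its standard error.
    """
    y_true = np.where(labeled, y_true, 0.0)               # labels outside the validation sample are unused
    scores = y_hat + (labeled / pi) * (y_true - y_hat)    # imputation plus inverse-probability-weighted correction
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))

# Toy usage: 10,000 documents, roughly 5% hand-labeled at random.
rng = np.random.default_rng(0)
n, pi = 10_000, 0.05
truth   = rng.normal(0.2, 1.0, n)                         # unobserved "true" sentiment per document
y_hat   = truth + 0.15 + rng.normal(0.0, 0.5, n)          # imputation with systematic bias and noise
labeled = rng.random(n) < pi

est, se = bias_corrected_mean(y_hat, truth, labeled, pi)
print(f"bias-corrected mean: {est:.3f} (95% CI half-width {1.96 * se:.3f})")
print(f"naive plug-in mean : {y_hat.mean():.3f}")         # inherits the 0.15 imputation bias
```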
It's not too long, and it's worth reading
This paper provides essential methodology for the growing use of unstructured data in economics, offering both theoretical rigor and practical tools. It addresses a critical gap between the promise of machine learning and the requirements of valid statistical inference, making AI-generated measures suitable for serious empirical work.
Made me think
How would researchers balance the cost of collecting validation data against the precision gains from more sophisticated neural networks, for example when dealing with rare events or highly imbalanced datasets?
Also, the framework requires "precise and implementable definitions" of the structured data being extracted. But how does this requirement work with the inherently subjective nature of many social science concepts? When might the demand for precision conflict with theoretical richness?
Still, super interesting work.
For Bastille Day, a few interesting French startups which I follow ...
Syroco is a climate technology company that provides AI-driven voyage optimization solutions for the maritime industry, helping ship operators and charterers reduce fuel consumption, operational costs, and carbon emissions. Their platform, Syroco Live, uses advanced digital twin technology and real-time data (including weather and ocean conditions) to compute the most efficient routes and speed profiles for vessels, delivering actionable recommendations directly to crews through a user-friendly interface. This approach enables energy savings of 10% or more per voyage and has already helped ships avoid over 230,000 tonnes of CO₂ emissions since January 2024, supporting the industry’s transition toward more sustainable and competitive operations.
allmates.ai is a company that provides organizations with AI-powered coworkers (specialized digital assistants called "Mates") to enhance teamwork, productivity, and innovation. Their platform offers over 350 customizable AI agents that can automate routine tasks, integrate seamlessly with popular workplace tools, and support a wide range of business functions such as marketing, data analysis, and customer relationship management. With a focus on security, easy integration, and real-time collaboration, allmates.ai helps businesses streamline operations and empower employees to focus on more strategic and creative work.
Pigment is an AI-powered, integrated business planning platform designed to unify data, people, and processes across organizations in real time. It enables teams from finance, sales, HR, supply chain, and other departments to collaboratively build dynamic, adaptable business plans, run forecasts and scenarios, and make informed decisions with a single source of truth. Pigment’s platform consolidates data from multiple sources via over 30 native connectors and APIs, ensuring all users work with accurate, up-to-date information. Its intuitive interface supports collaboration with features like comments and visualizations tailored to different teams, while its AI capabilities accelerate data interaction and model building. Founded in 2019 and headquartered in Paris, Pigment serves global enterprise clients such as Unilever and Coca-Cola, and is recognized as a visionary in financial planning software. The company emphasizes agility, scalability, and enabling businesses to adapt quickly to change and uncertainty.
JustFind provides a solution for keeping CRM (Customer Relationship Management) contact data up to date by automatically tracking career changes of contacts via LinkedIn. The platform identifies when former clients or key contacts change jobs and updates this information in the CRM, allowing businesses to reconnect with these individuals at the right moment and turn them into high-potential prospects. JustFind offers real-time synchronization between LinkedIn and CRM systems, automates the detection of obsolete contacts, and adds missing contacts, thereby reducing administrative workload and boosting sales and marketing effectiveness. The service is particularly aimed at SMEs and startups, helping them maximize their commercial pipeline by ensuring their CRM always reflects the latest professional movements of their network.
Toucan (also known as Toucan Toco) is a no-code embedded analytics and data storytelling platform that enables companies to integrate interactive, customizable dashboards and reports directly into their products or internal tools with minimal engineering effort. It connects to a wide range of data sources and warehouses, allowing users to prepare data without coding, build visualizations with a drag-and-drop interface, and create compelling data stories enriched with contextual information. Toucan’s platform emphasizes ease of use, fast deployment (often under two weeks), and security with granular access controls, making data insights accessible and actionable for non-technical audiences. It serves a diverse client base including large enterprises and fast-growing SaaS companies, helping them improve data-driven decision-making and user engagement by embedding real-time analytics seamlessly into their workflows and applications.