Microsoft Research Lab – Asia

TimeCraft: A universal framework for time-series generation

Published August 4, 2025


Time-series data—measurements collected at regular intervals, like stock prices or traffic flows—has become a key driver of intelligent decision-making systems across industries. From medical monitoring to financial risk control, identifying patterns in this data is essential to many important operations.

At the same time, the creation of time-series data, or data synthesis, is gaining momentum as organizations grapple with the scarcity of real-world data, privacy protection requirements, and the need to test a variety of scenarios without exposing themselves to risk. AI-generated synthetic data simulates realistic patterns in a risk-free environment. It enables researchers to explore hypothetical scenarios and train models to make decisions in high-stakes contexts.

Yet many existing generative models fall short of what’s needed. To be truly practical, a generator of time-series data must adapt across different industries and data patterns, offer precise control over trends and volatility, and produce data that is realistic and reliable enough to support accurate modeling and analysis.

Microsoft Research Asia developed TimeCraft to address this need. This open-source framework creates synthetic time-series data that can be used across different industries and scaled up for commercial applications. Users control data generation through simple written commands, and the system can adapt to different business needs, whether companies want to analyze existing patterns or create data for specific goals.

Three ways to guide generation

TimeCraft’s user interface is built for flexibility. Users can guide data generation through three distinct methods:

Few-shot adaptation: Users can upload a small set of unlabeled samples from the target domain. TimeCraft learns structural features from these samples and generates high-quality data, with no retraining or labels required.
Natural language control: Users can describe their desired time series in plain language, such as “stable early on, followed by sharp fluctuations.” TimeCraft interprets the prompt and produces data accordingly.
Task model feedback: Users can integrate their models—like a disease predictor or market trend detector—into the data creation process. TimeCraft dynamically adjusts the output based on model feedback, optimizing the data for performance.

These methods can be used independently or together, allowing users to generate data that aligns with specific goals, scenarios, or operational needs.

Figure 1. Overview of the TimeCraft architecture

One model, many industries

TimeCraft works across multiple industries, each with time-series data that follows distinct patterns, using a unified approach built around semantic prototypes. These shared representations of time-series structures serve as a universal vocabulary.

When users provide a few example time-series sequences from their specific industry, the Prototype Assignment Module (PAM) maps them to the prototype space, calculating optimal combinations. This industry-specific input guides the model to generate structurally aligned data, no labels or retraining needed.

The result is a system that can rapidly adapt to new scenarios in fields such as energy, healthcare, finance, and transportation, demonstrating strong structural transfer and generalization.
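The prototype assignment idea described above can be illustrated with a small sketch. This is not TimeCraft's implementation: the function names, the use of cosine similarity, the softmax temperature, and the toy prototypes are all assumptions chosen for illustration. The sketch maps a few unlabeled samples to a weight vector over a fixed set of prototypes, which is the kind of signal that could then condition a generator.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_prototypes(samples, prototypes, temperature=0.1):
    """Return one weight per prototype, averaged over all few-shot samples.

    Hypothetical stand-in for a prototype assignment step: each sample gets a
    softmax over its cosine similarities to the prototypes, and the per-sample
    weights are averaged into a single domain-level weight vector.
    """
    totals = [0.0] * len(prototypes)
    for sample in samples:
        sims = [cosine(sample, p) / temperature for p in prototypes]
        peak = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in sims]
        denom = sum(exps)
        for i, e in enumerate(exps):
            totals[i] += e / denom
    return [t / len(samples) for t in totals]

# Toy prototypes of length 4 (assumed): upward trend, downward trend, oscillation.
prototypes = [
    [-1.5, -0.5, 0.5, 1.5],
    [1.5, 0.5, -0.5, -1.5],
    [1.0, -1.0, 1.0, -1.0],
]
# Two unlabeled few-shot samples from a made-up target domain; both rise.
samples = [
    [-1.0, -0.4, 0.3, 1.1],
    [-2.0, -1.0, 1.0, 2.0],
]
weights = assign_prototypes(samples, prototypes)
print(weights)
```

In this toy setup, nearly all of the weight lands on the upward-trend prototype, because both samples rise. No labels are involved at any point, which matches the few-shot setting described in the article.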

Text-controlled generation: One sentence guides the model

In many real-world scenarios, users know what kind of data they need but don’t have access to enough relevant examples. A typical request might be: “I want a time series that slowly rises for a few days, drops around day 10, and then fluctuates.” These types of needs often arise in fields like healthcare and finance, where designing and testing systems with realistic data is essential but data access is limited.

TimeCraft makes it possible to generate this kind of tailored data using plain language. Instead of relying on specialized tools or existing datasets, users can simply describe the pattern they’re looking for, and the system creates data that fits.

It does this using a collaborative training process involving multiple AI agents. The agents collect phrasing from real-world industry reports, fill in details using actual data statistics, and refine the wording until the descriptions match the data both clearly and accurately.
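One step in that process, filling in a description with statistics taken from the data, can be sketched in a few lines. The template wording, the choice of statistics, and the function name below are assumptions made for illustration. The sketch does not model the agents or the iterative refinement of the wording.

```python
import statistics

def describe_series(values, template):
    """Fill a wording template with statistics computed from the series."""
    half = len(values) // 2
    start, end = values[0], values[-1]
    if end > start:
        direction = "rises"
    elif end < start:
        direction = "falls"
    else:
        direction = "stays level"
    return template.format(
        n=len(values),
        direction=direction,
        start=start,
        end=end,
        early_std=statistics.pstdev(values[:half]),  # volatility, first half
        late_std=statistics.pstdev(values[half:]),   # volatility, second half
    )

# Hypothetical template, standing in for phrasing collected from reports.
template = (
    "Over {n} steps the series {direction} from {start:.1f} to {end:.1f}; "
    "the standard deviation is {early_std:.2f} in the first half "
    "and {late_std:.2f} in the second half."
)

# Made-up series: stable early on, followed by sharp fluctuations.
values = [10.0, 10.1, 9.9, 10.0, 12.0, 8.5, 13.0, 9.0]
description = describe_series(values, template)
print(description)
```

Pairs of this kind, a series plus a description grounded in its own statistics, are the sort of training data that a text-conditioned generator needs.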

When a user submits a description, TimeCraft translates it into guidance for its generative model, enabling direct input, even from users without technical expertise. This makes the tool especially useful in situations where data is scarce or constantly changing. By bridging the user’s intent with the model’s capabilities, TimeCraft makes custom data generation as simple as writing a sentence. This process is illustrated in Figure 2.

Figure 2. The TimeCraft text-to-time-series module consists of (top) a multi-agent system that creates pairs of plain-language descriptions and matching time-series data, and (bottom) a hybrid mechanism that turns user-written descriptions into synthetic time-series data.

Task-aware generation: Optimized for real-world impact

Most generation models focus on producing realistic data. TimeCraft goes a step further, generating data that improves performance of downstream applications—whether it’s detecting disease trends or modeling market behavior.

This is possible thanks to TimeCraft’s task-aware generation framework. Users can integrate their existing models directly into the data-creation process. The system then uses feedback from these models to guide the direction of data generation in real time, so the output isn’t just realistic, it’s useful.

At the core of this method is a technique called influence scoring, which estimates how each piece of generated data affects a model’s performance. TimeCraft uses these scores to guide the generation process, helping the system produce data with the greatest potential to improve results. This process is shown in Figure 3.

Figure 3. Influence scoring process within the TimeCraft framework

This approach is especially helpful in cases where certain patterns are rare or critically important. For instance, in medical diagnosis, TimeCraft can focus on generating a small set of patterns that meaningfully improve prediction accuracy.
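The influence-scoring idea can be made concrete with a simplified sketch. It uses a first-order gradient-alignment approximation on a linear regression model, which is a swapped-in technique chosen for brevity and not necessarily the estimator TimeCraft uses. The model, data, and function names are all hypothetical. A candidate sample scores high when a gradient step on it would also reduce the loss on held-out validation data.

```python
def predict(weights, features):
    """Linear model prediction."""
    return sum(w * x for w, x in zip(weights, features))

def loss_gradient(weights, features, target):
    """Gradient of 0.5 * (prediction - target)**2 with respect to the weights."""
    error = predict(weights, features) - target
    return [error * x for x in features]

def influence_score(weights, candidate, validation_set):
    """First-order estimate of how much one gradient step on `candidate`
    would reduce the mean validation loss (larger means more helpful)."""
    cand_grad = loss_gradient(weights, *candidate)
    n = len(validation_set)
    val_grad = [0.0] * len(weights)
    for features, target in validation_set:
        grad = loss_gradient(weights, features, target)
        val_grad = [v + g / n for v, g in zip(val_grad, grad)]
    # Dot product: aligned gradients mean the candidate helps on validation.
    return sum(c * v for c, v in zip(cand_grad, val_grad))

# Toy task model (assumed): untrained weights; the true relation is y = x0 + 2*x1.
weights = [0.0, 0.0]
validation_set = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]

# Hypothetical generated candidates, each a (features, target) pair.
candidates = {
    "consistent": ([1.0, 1.0], 3.0),     # agrees with the true relation
    "uninformative": ([1.0, 0.0], 0.0),  # zero error, so zero gradient
    "misleading": ([1.0, 1.0], -3.0),    # contradicts the true relation
}
scores = {
    name: influence_score(weights, sample, validation_set)
    for name, sample in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Here the scores are only used to rank candidates after the fact. In the framework described in the article, the scores instead steer the generation process itself.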

By shifting the goal from simulating data to generating data that actively improves outcomes, TimeCraft turns synthetic data into a strategic tool.

Built for real-world use, now open source

TimeCraft was built for real-world applications. It accepts different types of input, adapts to complex use cases, and improves over time using feedback from the tasks it supports. Researchers at Microsoft Research Asia envision it as a comprehensive solution for industries where data is limited, expensive to collect, or sensitive to share—making data generation more targeted, useful, and scalable.

Now open source, TimeCraft is available for developers, researchers, and business partners around the world to explore, test, and build on.

Related research:

Cross-domain generalization

Controllability