Three Ways to Tame Your Data: Typing, Dataclasses, and Pydantic

Python's dynamic typing offers tremendous flexibility and enables rapid prototyping, but this freedom comes at a cost: unexpected runtime failures that can be catastrophic in production environments. For developers transitioning from statically typed languages like C and C++, this shift can feel unsettling. Fortunately, Python's ecosystem provides several tools that bridge this gap, offering different approaches to add structure and safety to your code without sacrificing Python's inherent flexibility. 

This need for structure becomes even more critical in today's AI-driven development landscape. When building agentic frameworks, we constantly handle unpredictable external API responses, diverse user inputs, complex configuration data, and structured inter-agent communication. In frameworks like LangGraph, where state schemas and data flow are crucial, Python's dynamic typing can lead to silent failures that surface only at runtime—often at the worst possible moment. This is precisely why modern AI frameworks have gravitated toward tools like Pydantic, which adds robust runtime validation that Python's dynamic typing doesn't provide by default. 

In this article, we'll explore three complementary approaches to taming Python's dynamic nature:

  • Type Hints - Static analysis capabilities (via tools like mypy) without altering runtime behaviour

  • Dataclasses - Structured data containers that preserve Python's dynamic flexibility

  • Pydantic - Runtime validation and serialisation that catches errors before they become problems

Type Annotations Using the ‘typing’ Module

Type annotations enable static analysis tools like mypy to catch potential bugs before runtime, while enhancing IDE support with better autocomplete, error highlighting, and refactoring assistance. They serve as executable documentation that stays in sync with your code. When paired with tools like mypy, they work like a traffic light system—highlighting potential issues and giving you warnings, but Python will still run the code regardless. Use this approach when you want to catch bugs early in development without runtime validation overhead.

Mypy requires Python 3.9 or later to run. Install it with pip (python3 -m pip install mypy) and run it with the mypy command (mypy typing_eg.py). The typing module offers extensive functionality, but for this refresher, I'll focus on a few commonly used constructs.

  • Simple example of function annotation
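A minimal sketch (the function and values here are illustrative): the parameters and return value are annotated, and mypy checks every call site against them while plain Python ignores them at runtime.

```python
def greet(name: str, times: int = 1) -> str:
    """Return a greeting repeated `times` times."""
    return ("Hello, " + name + "! ") * times

print(greet("Ada", 2))   # Hello, Ada! Hello, Ada!
# greet(42) still runs under plain Python, but mypy flags the
# argument as an incompatible type before you ever execute it.
```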

Any type

Any accepts any type of value and disables type checking for that specific element—it's equivalent to having no type hint at all. While this provides maximum flexibility, it defeats the purpose of type hints by removing all type safety.

Example -
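An illustrative sketch: with Any, mypy accepts every call below without complaint, which is exactly why it removes all type safety.

```python
from typing import Any

def describe(value: Any) -> str:
    # `Any` opts this parameter out of type checking entirely --
    # mypy accepts any argument, and any operation performed on it.
    return f"{type(value).__name__}: {value!r}"

print(describe(42))        # int: 42
print(describe("hello"))   # str: 'hello'
print(describe([1, 2]))    # list: [1, 2]
```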

Avoid Any when possible; use more specific types or Union instead. Reserve it as a temporary measure while migrating legacy code or when dealing with truly unknown data structures.

Union type

The Union type allows a variable or parameter to accept multiple specific types, giving you controlled flexibility while maintaining type safety.

Example -
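A small sketch (the parse_id function is illustrative): the parameter accepts exactly two types, and anything else is a mypy error.

```python
from typing import Union

def parse_id(raw: Union[int, str]) -> int:
    """Accept an id as an int or a numeric string; other types fail mypy."""
    if isinstance(raw, str):
        return int(raw)
    return raw

print(parse_id("42"))   # 42
print(parse_id(7))      # 7
```

On Python 3.10 and later, the same annotation can be written more concisely as int | str.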

Optional type

The Optional type indicates that a value can be either a specific type or None. It makes explicit that None is an acceptable value. It's a shorthand for Union[T, None].

Note that Optional is different from default values. Optional is about TYPE, and defaults are about VALUES. Optional tells the type checker what types are acceptable, while default values determine whether an argument must be provided when calling the function. Combining both is a best practice as in the below example.
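A minimal sketch (names are illustrative) combining the Optional type with a None default value:

```python
from typing import Optional

def display_name(user_id: int, nickname: Optional[str] = None) -> str:
    # Optional[str] is the TYPE (str or None); "= None" is the default VALUE,
    # which lets callers omit the argument entirely.
    return nickname if nickname is not None else f"user-{user_id}"

print(display_name(1))          # user-1
print(display_name(1, "ada"))   # ada
```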

Literal type

Literal can be used to indicate to type checkers that the annotated object has a value equivalent to one of the provided literals. It is great for configuration options, status values, or enums.

Example -
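A short illustrative sketch using a log-level configuration value:

```python
from typing import Literal

LogLevel = Literal["debug", "info", "warning", "error"]

def set_log_level(level: LogLevel) -> str:
    # mypy rejects set_log_level("verbose"): not one of the allowed literals.
    return f"log level set to {level}"

print(set_log_level("info"))   # log level set to info
```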

Annotated type

Annotated allows you to attach metadata to types while keeping the core type information intact. The syntax is Annotated[T, *metadata], where T is your actual type and everything after it is metadata that travels alongside the type.

The key insight is that Annotated creates a division of labor: static type checkers like mypy only care about the first argument (the actual type), while runtime tools can access the metadata to implement additional behavior. Python itself ignores this metadata unless you explicitly write code to inspect it.

This design enables powerful integrations with libraries. For example, Pydantic reads Field constraints from the metadata to enforce validation rules, while FastAPI uses the same metadata to generate API documentation.

Examples -

Annotated[int, Field(ge=0, le=100)] for age validation

Annotated[str, Field(min_length=3)] for username requirements.

Common use cases include adding documentation that stays synchronized with your types (Annotated[float, "Temperature in Celsius"]), specifying units (Annotated[float, "meters"]), defining constraints (Annotated[int, range(1, 100)]), and creating domain-specific validation rules. Use Annotated when basic type hints aren't expressive enough for your needs—particularly when building APIs, data validation layers, or domain-specific type systems where you need to communicate more than just the type structure.

Real-world example: LangGraph's state reducers demonstrate Annotated's power perfectly. Using Annotated[list[str], add], the add function from the operator module serves as metadata that tells LangGraph how to merge state updates—by concatenating lists rather than overwriting them.

Example -
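The sketch below mimics the idea without depending on LangGraph itself: the reducer function lives in the annotation's metadata, and a runtime tool can fish it out with get_type_hints and get_args.

```python
from operator import add
from typing import Annotated, get_args, get_type_hints

class State:
    # list[str] is what type checkers see; `add` is metadata
    # that a framework like LangGraph can read at runtime.
    messages: Annotated[list[str], add]

hints = get_type_hints(State, include_extras=True)
list_type, reducer = get_args(hints["messages"])

old, new = ["hi"], ["there"]
print(reducer(old, new))   # ['hi', 'there'] -- merged by concatenation
```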

TypedDict type

TypedDict is a special construct from Python's typing module that lets you define the structure of dictionaries with specific key-value type annotations: you specify exactly which keys a dictionary should have and what types their values should be. It was introduced in Python 3.8 and is also available via typing_extensions for earlier versions. Dict from typing is different: it is a generic type describing dictionaries where all keys share one type and all values share another. Here are the main differences between Dict and TypedDict from the typing module.

  1. Structure: Dict[K, V] is homogeneous (all keys same type, all values same type), while TypedDict is heterogeneous (specific keys with specific types)

  2. Key types: Dict can have keys of any hashable type, TypedDict always has string keys

  3. Flexibility: Dict can have any number of keys, TypedDict has a fixed structure

  4. Use cases: Use Dict[K, V] for mappings where you don't know the specific keys ahead of time. Use TypedDict when you know exactly what keys should exist (like API responses, configuration objects, or structured data)

Also, explore the use of State in LangGraph, which provides an excellent example of how TypedDict can be utilized.

Example -
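An illustrative sketch contrasting the two: MovieInfo fixes the keys and their individual value types, while the plain Dict is homogeneous.

```python
from typing import Dict, TypedDict

class MovieInfo(TypedDict):
    title: str
    year: int
    rating: float

movie: MovieInfo = {"title": "Arrival", "year": 2016, "rating": 7.9}

# A plain Dict is homogeneous: every key a str, every value an int.
scores: Dict[str, int] = {"alice": 10, "bob": 8}

# mypy flags a missing key, an unknown key, or a wrong value type
# in `movie`; for `scores` it only checks the key/value types.
print(movie["title"], scores["alice"])   # Arrival 10
```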

Here's a quick summary of what we have covered so far with type annotations from the 'typing' module:

  • Any – disables type checking for a value; maximum flexibility, zero safety

  • Union[X, Y] – accepts any of several specific types

  • Optional[T] – shorthand for Union[T, None]

  • Literal[...] – restricts a value to a fixed set of literal values

  • Annotated[T, metadata] – attaches metadata that runtime tools can read

  • TypedDict – dictionaries with a fixed set of keys, each with its own value type

The typing module offers many additional features; refer to the official documentation for a comprehensive list.

While type hints bring structure to Python and empower static analysis, they don’t enforce behavior at runtime. For scenarios where you need structured data containers with built-in support for defaults, immutability, or nested types, Python’s dataclasses module comes in handy. Let’s explore that next.

Data Classes

Python’s @dataclass decorator offers a concise and readable way to define structured data classes with minimal boilerplate. It automatically generates common methods such as __init__, __repr__, and __eq__ based on the fields you declare. 

Dataclasses require type hints, which help with readability and enable static analysis tools like mypy. However, Python does not enforce types at runtime. For example, you could accidentally pass a string instead of an integer, and it would still run unless explicitly checked.

Dataclasses also support: 

  • Default values for fields

  • Factory functions using field(default_factory=...) for dynamic defaults like lists

  • Immutability using frozen=True, making objects effectively read-only

Think of dataclasses as a lightweight alternative to full-fledged classes, ideal for clean and declarative data modeling. However, unlike libraries like pydantic, they don’t perform runtime validation out-of-the-box.

Example -
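A compact illustrative sketch showing the generated methods, a default value, a factory default, and immutability via frozen=True:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int = 8080                                  # default value
    tags: list[str] = field(default_factory=list)    # safe mutable default

cfg = ServerConfig("localhost")
print(cfg)                                # auto-generated __repr__
print(cfg == ServerConfig("localhost"))   # True -- auto-generated __eq__
# cfg.port = 9090 would raise FrozenInstanceError because frozen=True,
# but cfg.port = "oops" on a non-frozen dataclass would NOT be caught:
# dataclasses never validate types at runtime.
```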

Pydantic for runtime validation

While type hints and dataclasses improve structure and readability, they fall short when it comes to runtime validation and parsing input data — especially from untrusted or external sources like APIs, forms, or databases.

This is where Pydantic shines. It combines the clarity of type annotations with powerful runtime type enforcement, automatic data coercion, and detailed error reporting, sparing you from writing repetitive if not isinstance(...) or try/except blocks and helping catch bad data early in the development cycle.

With Pydantic, your type hints actually do something at runtime — not just during static type checks with tools like mypy.

Simple example of runtime validation -
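A minimal sketch, assuming Pydantic v2: the same bad input a dataclass would silently accept is rejected at construction time.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

print(User(name="Ada", age=36))            # name='Ada' age=36

try:
    # Fails at runtime with a detailed error -- a dataclass
    # or a bare type hint would have let this through.
    User(name="Ada", age="not a number")
except ValidationError as e:
    print(e.error_count(), "validation error")
```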

Automatic Type Coercion

By default, Pydantic attempts to coerce input values into the specified types whenever possible. If you want to prevent this behavior, Pydantic offers a 'strict mode' that can be enabled at the model level, on individual fields, or even during a specific validation call.

Example -
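An illustrative sketch, assuming Pydantic v2, showing lax coercion, model-level strict mode via ConfigDict, and per-call strict validation (field-level strictness works similarly with Field(strict=True)):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LaxItem(BaseModel):
    price: float

print(LaxItem(price="19.99").price)        # 19.99 -- string coerced to float

class StrictItem(BaseModel):
    model_config = ConfigDict(strict=True)  # strict for the whole model
    price: float

try:
    StrictItem(price="19.99")
except ValidationError:
    print("coercion disabled: string rejected")

# Strict mode can also be requested for a single validation call:
try:
    LaxItem.model_validate({"price": "19.99"}, strict=True)
except ValidationError:
    print("per-call strict validation also rejects it")
```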

Field Customization and Constraints

Pydantic lets you fine-tune field behavior using two key tools: 

  1. The Field(...) function – to add metadata like title, description, example, etc.

  2. Constrained types – to enforce minimum/maximum lengths, values, regex patterns, and more.

This allows you to build models that don’t just describe what kind of data is expected — but also what shape, range, or pattern that data must conform to.

Field customization improves documentation, strengthens data contracts, and enhances developer experience — especially when used in API frameworks like FastAPI, which automatically uses this metadata in OpenAPI (Swagger) documentation.

Example -
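A sketch assuming Pydantic v2 (the Product model and its constraints are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=2, max_length=50, description="Display name")
    price: float = Field(gt=0, description="Price in USD, must be positive")
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")

print(Product(name="Widget", price=9.99, sku="ABC-1234"))

try:
    Product(name="W", price=-1, sku="oops")   # violates all three constraints
except ValidationError as e:
    print(e.error_count(), "validation errors")
```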

Check out the Pydantic documentation for a wide range of constraints — including numeric, string, regex, decimal, and dataclass-specific constraints, and more.

Default Values with default_factory

Sometimes, you need to set a default value for a field that must be generated dynamically at runtime — like a new list, dictionary, UUID, or timestamp. In Python, using mutable types like [] or {} directly as default values is dangerous because the same object is shared across all instances. That’s where default_factory comes in.

The Problem with Mutable Defaults

Let’s look at how direct use of mutable types causes trouble: a mutable default is created once, when the class or function is defined, and is then shared by every instance.
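A classic demonstration of the shared-default bug, using a plain class (dataclasses actually refuse a bare mutable default with a ValueError at class-definition time, which is precisely why default_factory exists):

```python
class Basket:
    def __init__(self, items=[]):   # BUG: one list shared by every call
        self.items = items

a = Basket()
b = Basket()
a.items.append("apple")
print(b.items)             # ['apple'] -- b was polluted by a
print(a.items is b.items)  # True -- it is literally the same object
```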

Correcting it with default_factory in dataclasses -
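A minimal sketch: the factory is called once per instance, so every Basket gets its own fresh list.

```python
from dataclasses import dataclass, field

@dataclass
class Basket:
    items: list[str] = field(default_factory=list)  # fresh list per instance

a = Basket()
b = Basket()
a.items.append("apple")
print(a.items, b.items)   # ['apple'] [] -- no sharing
```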

Pydantic's way:
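The equivalent sketch assuming Pydantic v2, where Field(default_factory=...) plays the same role:

```python
from pydantic import BaseModel, Field

class Basket(BaseModel):
    items: list[str] = Field(default_factory=list)

a = Basket()
b = Basket()
a.items.append("apple")
print(a.items, b.items)   # ['apple'] [] -- each instance gets its own list
```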

More Advanced: Custom Factory Functions -

You can also pass function references to default_factory — just remember to pass the function itself, not the result of calling it.
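An illustrative sketch (the new_id helper is hypothetical): note that new_id is passed without parentheses, so it runs once per instance rather than once at class definition.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

def new_id() -> str:
    """Hypothetical helper generating a fresh hex id."""
    return uuid4().hex

@dataclass
class Order:
    # Pass the function itself (new_id), not its result (new_id()).
    order_id: str = field(default_factory=new_id)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

a = Order()
b = Order()
print(a.order_id != b.order_id)   # True -- each instance gets its own id
```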

Model Composition and Nesting

Pydantic allows you to represent structured, hierarchical data cleanly by defining one BaseModel as a field inside another. This enables reusability of common components across different models and provides automatic, recursive validation and parsing at all levels.

Example -
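A sketch assuming Pydantic v2 (the Address/Company models are illustrative): nested dictionaries are parsed and validated recursively into model instances.

```python
from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str
    city: str

class Company(BaseModel):
    name: str
    headquarters: Address                                   # nested model
    offices: list[Address] = Field(default_factory=list)   # list of nested models

data = {
    "name": "Acme",
    "headquarters": {"street": "1 Main St", "city": "Springfield"},
    "offices": [{"street": "2 Side St", "city": "Shelbyville"}],
}
company = Company.model_validate(data)   # validates every level recursively
print(company.headquarters.city)         # Springfield
print(company.offices[0].street)         # 2 Side St
```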

Custom Validators in Pydantic

In addition to its built-in validation capabilities, Pydantic supports custom validators at both the field and model levels. These allow you to enforce more complex rules, such as verifying specific email domains, enforcing password complexity, validating ID formats or data ranges, and even performing cross-field consistency checks.

Examples -
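A sketch assuming Pydantic v2 (the Signup model and the @example.com rule are illustrative): a field_validator enforces an email-domain rule, and a model_validator performs a cross-field consistency check.

```python
from pydantic import BaseModel, ValidationError, field_validator, model_validator

class Signup(BaseModel):
    email: str
    password: str
    password_confirm: str

    @field_validator("email")
    @classmethod
    def email_must_be_company(cls, v: str) -> str:
        # Field-level rule: runs on the single value.
        if not v.endswith("@example.com"):
            raise ValueError("must be an @example.com address")
        return v

    @model_validator(mode="after")
    def passwords_match(self) -> "Signup":
        # Model-level rule: runs after fields validate, sees all of them.
        if self.password != self.password_confirm:
            raise ValueError("passwords do not match")
        return self

print(Signup(email="ada@example.com", password="s3cret", password_confirm="s3cret"))

try:
    Signup(email="ada@example.com", password="s3cret", password_confirm="oops")
except ValidationError:
    print("cross-field check failed")
```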

Pydantic offers additional options like mode='before', custom root validators, and validators that access the validation context. For advanced use cases, refer to the official Pydantic validator documentation.

Pydantic provides rich JSON capabilities — from parsing and serialization to automatic schema generation. This article just scratches the surface. Explore the full range of features in the Pydantic documentation.

Putting It All Together

Choose your tool based on your scenario -

Use typing when:

  • You only need type hints for static analysis

  • Working with existing codebases

  • Want minimal overhead

  • Don't need runtime validation

Use dataclasses when:

  • Want to reduce boilerplate code

  • Need basic data containers

  • Want built-in Python solution

  • Don't need runtime validation

  • Performance is critical

Use pydantic when:

  • Need runtime data validation

  • Working with APIs (JSON serialization/deserialization)

  • Want detailed error messages

  • Building web applications

  • Data comes from external sources

  • Need schema generation

With these foundations in place, you're well-equipped to dive deeper into the official documentation and discover the advanced features that will elevate your Python development.

Tamaghna Basu
