Declarative Data Modeling in Python
by Joshua Forman
Introducing valid_model 
It includes: 
- base class - Object 
- basic descriptors - Integer, Float, DateTime, String, ... 
- nesting descriptors - Dict, List, Set, EmbeddedObject
Most similar libraries are tightly integrated with a persistence layer:
SQLAlchemy, Django ORM, mongokit, etc.
Or they are targeted at web forms:
Formencode, colander, deform
So the goal was to build a highly flexible, unopinionated data modeling library.
Some Use Cases 
● Database data model 
● Form validation 
● Test fixtures 
● API request/response objects 
● Scrubbing and normalizing data 
● Data migration
    car = {
        'make': None,
        'model': None,
        'doors': None,
        'horsepower': None,
    }

    class Car(object):
        def __init__(self, make=None, model=None, doors=None,
                     horsepower=None):
            self.make = make
            self.model = model
            self.doors = doors
            self.horsepower = horsepower

It is valid Python to add new instance attributes arbitrarily in other methods, which can lead to
headaches (and pylint complaints).
At least I know the fields ahead of time, but what datatypes are these attributes?

    def horse_check(value):
        if value == 1:
            raise ValidationError('Is this powered by an actual horse?')
        elif value <= 0:
            raise ValidationError('Phantom horses?')
        return True

    class Car(Object):
        make = String(nullable=False)
        model = String()
        doors = Integer(validator=lambda x: x <= 5)
        horsepower = Integer(validator=horse_check)
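
To make the idea of a typed, validated field concrete, here is a minimal sketch of how such a descriptor could be built. It is written for Python 3 and is not valid_model's implementation: the `Integer` and `ValidationError` classes below are hypothetical stand-ins defined only for this illustration, and the error messages simply imitate the ones shown in the slides.

```python
class ValidationError(Exception):
    """Raised when a value fails a null, type, or validator check."""


class Integer(object):
    """Hypothetical stand-in for a typed field (not valid_model's code)."""

    def __init__(self, nullable=True, validator=None):
        self.nullable = nullable
        self.validator = validator
        self.name = None

    def __set_name__(self, owner, name):
        # Python 3.6+ passes in the attribute name the field was bound to.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is None:
            if not self.nullable:
                raise ValidationError('{} is not nullable'.format(self.name))
        else:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValidationError('{!r} is not an int'.format(value))
            if self.validator is not None and not self.validator(value):
                raise ValidationError(self.name)
        instance.__dict__[self.name] = value


class Car(object):
    doors = Integer(validator=lambda x: x <= 5)


car = Car()
car.doors = 4          # accepted
try:
    car.doors = 10     # rejected by the validator
except ValidationError as exc:
    print('rejected:', exc)
```

Because the check lives in `__set__`, it runs on every assignment, not only at construction time.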
Nested Schemas Are Easy

    class Person(Object):
        name = String(nullable=False)
        homepage = String()

    class BlogPost(Object):
        title = String(nullable=False, mutator=lambda x: x.title())
        updated = DateTime(nullable=False, default=datetime.utcnow)
        published = DateTime()
        author = EmbeddedObject(Person)
        contributors = List(value=EmbeddedObject(Person), nullable=False)
        tags = List(value=String(nullable=False), nullable=False)

        def validate(self):
            super(BlogPost, self).validate()
            if self.published is not None and self.published > self.updated:
                raise ValidationError(
                    'a post cannot be published at a later date than it was updated')

    >>> post = BlogPost(title='example post', author={'name': 'Josh'}, tags=['tag1', 'tag2'])
    >>> print post
    {'updated': datetime.datetime(2014, 10, 7, 13, 43, 1, 960174),
    'author': {'homepage': None, 'name': u'Josh'},
    'contributors': [], 'title': u'Example Post', 'tags': [u'tag1', u'tag2'], 'published': None}
valid_model also provides something closer to strict typing

    class Car(Object):
        make = String(nullable=False)
        model = String()
        doors = Integer(validator=lambda x: x <= 5)
        horsepower = Integer(validator=horse_check)

    >>> Car(doors='five')
    valid_model.exc.ValidationError: 'five' is not an int
    >>> Car(doors=10)
    valid_model.exc.ValidationError: doors
    >>> Car(horsepower=1)
    valid_model.exc.ValidationError: Is this powered by an actual horse?
    >>> Car(make=None)
    valid_model.exc.ValidationError: make is not nullable
Normalize your data when it gets set

    class HTTPAccessLog(Object):
        code = Integer(nullable=False)
        status = String(nullable=False, mutator=lambda x: x.upper())
        timestamp = DateTime(default=datetime.utcnow)

        def validate(self):
            super(HTTPAccessLog, self).validate()
            if not self.status.startswith(unicode(self.code)):
                raise ValidationError('code and status do not match')

    >>> ping = HTTPAccessLog()
    >>> ping.code = 404
    >>> ping.status = '404 not found'
    >>> print ping
    {'status': u'404 NOT FOUND', 'timestamp': datetime.datetime(2014, 10, 7, 13, 36, 15, 217678),
    'code': 404}
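
The normalize-on-set behavior can be sketched with a small descriptor that applies a mutator before storing the value. This is an illustrative Python 3 sketch, not valid_model's code: the `String` class here is a hypothetical stand-in, and only the `mutator` argument name is taken from the slides.

```python
class String(object):
    """Hypothetical field that runs an optional mutator on assignment."""

    def __init__(self, mutator=None):
        self.mutator = mutator
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Normalize once, at assignment time, so readers always see clean data.
        if value is not None and self.mutator is not None:
            value = self.mutator(value)
        instance.__dict__[self.name] = value


class HTTPAccessLog(object):
    status = String(mutator=lambda x: x.upper())


ping = HTTPAccessLog()
ping.status = '404 not found'
print(ping.status)  # 404 NOT FOUND
```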
Descriptors Tangent
Python descriptors are fancy attributes: objects that customize what happens when an attribute is read, assigned, or deleted.

    class SomeDescriptor(object):
        def __get__(self, instance, klass=None):
            ...
        def __set__(self, instance, value):
            ...
        def __delete__(self, instance):  # called on "del foo.b"
            ...

    class Foo(object):
        b = SomeDescriptor()
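
As a runnable illustration of the protocol, the sketch below records which hook Python calls for each kind of attribute access. The `Tracked` class and its `events` list are hypothetical names made up for this example; note that the deletion hook in the descriptor protocol is `__delete__`.

```python
class Tracked(object):
    """Descriptor that logs every get, set, and delete it handles."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def __get__(self, instance, klass=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        self.events.append('get')
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        self.events.append('set')
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        self.events.append('delete')
        del instance.__dict__[self.name]


class Foo(object):
    b = Tracked('b')


foo = Foo()
foo.b = 1       # calls Tracked.__set__
value = foo.b   # calls Tracked.__get__
del foo.b       # calls Tracked.__delete__
print(Foo.b.events)  # ['set', 'get', 'delete']
```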
@property Descriptors
@property is the most common descriptor.

    class Foo(object):
        @property
        def a(self):
            return self._a

        @a.setter
        def a(self, value):
            self._a = value

        # Make an attribute read-only by not defining the setter.
        @property
        def readonly(self):
            return self._private_var

        # Lazily initialize or cache expensive calculations.
        @property
        def expensive_func(self):
            if self._result is None:
                self._result = expensive_func()
            return self._result
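
The read-only and lazy-caching patterns can be tried out with a small self-contained class. The `Report` class and its attribute names are hypothetical, chosen only for this Python 3 sketch.

```python
class Report(object):
    def __init__(self, rows):
        self._rows = rows
        self._total = None       # cache slot, filled on first access
        self.computations = 0    # counts how often the sum is really computed

    @property
    def rows(self):
        # No setter is defined, so "report.rows = ..." raises AttributeError.
        return self._rows

    @property
    def total(self):
        if self._total is None:
            self.computations += 1
            self._total = sum(self._rows)
        return self._total


report = Report([1, 2, 3])
print(report.total, report.total, report.computations)  # 6 6 1
```

One caveat of using `None` as the "not yet computed" marker: a calculation whose legitimate result is `None` would be recomputed on every access.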
Customizing Descriptors is Easy
Extending existing descriptors works like subclassing anything else in Python.

    class SuperDateTime(DateTime):
        def __set__(self, instance, value):
            # Accept date strings and Unix timestamps as well as datetime objects.
            if isinstance(value, basestring):
                value = dateutil.parser.parse(value)
            elif isinstance(value, (int, float)):
                value = datetime.utcfromtimestamp(value)
            super(SuperDateTime, self).__set__(instance, value)

    class Decimal(Generic):
        def __set__(self, instance, value):
            if not isinstance(value, decimal.Decimal):
                raise ValidationError('{} is not a decimal'.format(self.name))
            super(Decimal, self).__set__(instance, value)
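
The same coerce-then-delegate pattern can be sketched with the standard library alone. In this Python 3 sketch the base `DateTime` class is a hypothetical stand-in rather than valid_model's class, and `datetime.fromisoformat` (Python 3.7+) is swapped in for a general-purpose date parser, so only ISO 8601 strings are accepted.

```python
from datetime import datetime, timezone


class ValidationError(Exception):
    pass


class DateTime(object):
    """Hypothetical base field that only accepts datetime objects."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, datetime):
            raise ValidationError('{} is not a datetime'.format(self.name))
        instance.__dict__[self.name] = value


class FlexibleDateTime(DateTime):
    """Coerces ISO strings and Unix timestamps, then defers to the base check."""

    def __set__(self, instance, value):
        if isinstance(value, str):
            value = datetime.fromisoformat(value)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Build a naive UTC datetime from the timestamp.
            value = datetime.fromtimestamp(value, tz=timezone.utc).replace(tzinfo=None)
        super().__set__(instance, value)


class Event(object):
    when = FlexibleDateTime()


event = Event()
event.when = '2014-10-07T13:43:01'
print(event.when)  # 2014-10-07 13:43:01
event.when = 0
print(event.when)  # 1970-01-01 00:00:00
```

The subclass only converts the input; type enforcement stays in the base class, so anything that cannot be coerced still fails the original check.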
Simple wrappers for persistence
An example of using MongoDB with Redis as a cache

    class PersistBlogPost(object):
        def __init__(self, mongo_collection, redis_conn):
            ...

        def insert(self, post):
            self.mongo_collection.insert(post.__json__())

        def find(self, title):
            post = self.redis_conn.get(title)
            if post:
                return pickle.loads(post)
            else:
                post = self.mongo_collection.find_one({'title': title})
                if post:
                    post = BlogPost(**post)
                    self.redis_conn.set(title, pickle.dumps(post))
                return post
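
The cache-aside flow above can be exercised without running MongoDB or Redis. In this Python 3 sketch, two plain dicts stand in for the collection and the cache, `BlogPost` is a hypothetical plain class rather than the valid_model one, and the cached copy is serialized with `json` instead of `pickle` so the example does not depend on where the class is defined.

```python
import json


class BlogPost(object):
    """Hypothetical plain stand-in for the BlogPost model."""

    def __init__(self, title, body=None):
        self.title = title
        self.body = body

    def __json__(self):
        return {'title': self.title, 'body': self.body}


class PersistBlogPost(object):
    def __init__(self, documents, cache):
        self.documents = documents   # dict standing in for a MongoDB collection
        self.cache = cache           # dict standing in for a Redis connection

    def insert(self, post):
        self.documents[post.title] = post.__json__()

    def find(self, title):
        cached = self.cache.get(title)
        if cached is not None:
            # Cache hit: rebuild the model from the serialized copy.
            return BlogPost(**json.loads(cached))
        doc = self.documents.get(title)
        if doc is None:
            return None
        # Cache miss: load from the primary store, then populate the cache.
        post = BlogPost(**doc)
        self.cache[title] = json.dumps(post.__json__())
        return post


store = PersistBlogPost(documents={}, cache={})
store.insert(BlogPost(title='Example Post'))
first = store.find('Example Post')    # read from documents, then cached
second = store.find('Example Post')   # served from the cache
print(first.title, second.title, 'Example Post' in store.cache)
```

The wrapper never invalidates the cache, so a real version would also need to evict or refresh an entry when a post is updated.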
Thank You 
http://github.com/outbrain/valid_model
Joshua Forman 
jforman@outbrain.com
