Architecture of PBS.org
DCPython - June 7, 2011
PBS is…
• PBS is a national federation of independently owned and
operated public television stations and producers
– Each with their own management and development resources
• 1500+ highly trafficked websites:
– http://www.pbs.org/
– http://www.pbs.org/nova/
– http://pbskids.org/
– http://pbskids.org/sesame/
– http://video.pbs.org/
• Enterprise services/APIs
PBS is not!
• We do television dammit!
• Or any of the other ~200 local stations.
What we do
• Technology leadership within public
broadcasting community
• Distribution of national programming content
• Services to local stations
• Core application development. Yeah!!!
A few of our sites
History of PBS.org
Early 1990s: Hand-rolled static HTML
Late 1990s: Hand-crafted static HTML + CGI!
Most of the 2000s: Zope/Plone CMS-generated static HTML
2008-10: Django-generated static HTML
Launched Oct 2010: Django all the way
COVE API
• Contains the metadata for all PBS videos online
including pointers to streaming video
• Needed to be:
– Secure
– Fast
– Scalable
COVE API – Technology Stack
• Amazon Elastic Compute Cloud (EC2)
• Amazon Relational Database Service (RDS)
• Linux
• Python
• Django
• Piston for REST API
COVE API - Architecture
(Diagram components: Internet; Elastic Load Balancer; Auto Scale Array of App Server 1 … App Server N; HA Proxy; RDS Master and RDS Slaves; App Sync Server; S3 Backups)
COVE API – Management Tools
• Amazon Web Service Console
• RightScale
• Splunk
COVE API – Interesting Stuff
• Easy to load test
– Duplicate environment for several days
• Easy to scale
– Autoscale array grows automatically
• Easy to upgrade
– Each server built from vanilla base
COVE API – Lessons learned
• Use normalized data for administration and denormalized data for the API
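As a rough illustration of this lesson, the sketch below keeps normalized records for administration and flattens them into ready-to-serve API documents. The record shapes, field names, and the `denormalize` helper are all hypothetical and are not taken from the COVE code.

```python
# Hypothetical normalized records, as an admin database might store them.
programs = {1: {"title": "NOVA"}}
videos = [
    {"id": 10, "program_id": 1, "title": "Episode A",
     "stream_url": "http://example.org/a"},
    {"id": 11, "program_id": 1, "title": "Episode B",
     "stream_url": "http://example.org/b"},
]


def denormalize(video, programs):
    """Flatten one video and its parent program into a single API document."""
    program = programs[video["program_id"]]
    return {
        "id": video["id"],
        "title": video["title"],
        "program_title": program["title"],  # copied in, so reads need no join
        "stream_url": video["stream_url"],
    }


# Built at write time (or by a background job), then served as-is.
api_documents = {v["id"]: denormalize(v, programs) for v in videos}
```

The trade-off is that every change to a program title means rebuilding the documents that copy it, in exchange for cheaper reads.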
COVE API – Lessons learned
• Piston is fine, but lacks flexibility without
significant customization
– TastyPie?
• JSON is probably good enough
• Don’t get fancy with your endpoints
• Stick to REST principles
• Don’t get fancy with your authentication
– Use OAuth2 or simple token
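A minimal sketch of the "simple token" option, using only the Python standard library. The header format, the `API_TOKENS` store, and the `is_authorized` helper are assumptions made for illustration; the slides do not describe how COVE checks tokens.

```python
import hmac

# Hypothetical token store; a real service would keep these in a database.
API_TOKENS = {"station-kxyz": "s3cr3t-token-value"}


def is_authorized(headers):
    """Check an 'Authorization: Token <client>:<token>' header."""
    value = headers.get("Authorization", "")
    scheme, _, credentials = value.partition(" ")
    if scheme != "Token":
        return False
    client_id, _, token = credentials.partition(":")
    expected = API_TOKENS.get(client_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking token contents via timing.
    return hmac.compare_digest(token.encode(), expected.encode())
```

A request handler would call `is_authorized(request_headers)` before doing any work and return HTTP 401 when it fails.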
PBS.org and Merlin API
• PBS.org
– Slim, fast layer
– Pulls data from Merlin API
– Uses memcache extensively
– Currently Django, but could be anything (Flask?)
• Merlin API
– Aggregate content from distributed CMSes
– Expose via standardized API
– Power PBS.org and more
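The "slim layer over an API, cached heavily" idea can be sketched as a cache-aside lookup. This example uses a small in-process dictionary as a stand-in for memcached so that it runs on its own; `TTLCache`, `cached_fetch`, and `fetch_program` are hypothetical names, not part of PBS.org.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


def cached_fetch(cache, key, fetch, ttl=60):
    """Return the cached value for key, calling fetch() only on a miss."""
    value = cache.get(key)
    if value is None:
        value = fetch()
        cache.set(key, value, ttl)
    return value


calls = []


def fetch_program():
    # Hypothetical stand-in for an HTTP request to the backing API.
    calls.append(1)
    return {"title": "NOVA"}


cache = TTLCache()
first = cached_fetch(cache, "program:nova", fetch_program)
second = cached_fetch(cache, "program:nova", fetch_program)  # served from cache
```

Only the first call reaches `fetch_program`; the second is answered from the cache until the entry expires.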
Merlin API – Technology stack
• Python
• Django
• MySQL
• Piston
• Solr
• Celery
• RabbitMQ
• Amazon Web Services (“cloud”)
– EC2
– RDS - Relational Database Service
– ELB - Elastic Load Balancing
– CloudFront CDN
– S3 Storage
Data flow
(Diagram: RSS Feed → Ingestor → Standardized API)
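To make the flow named on this slide (RSS feed, ingestor, standardized API) concrete, here is a small sketch of an ingestor step that turns RSS 2.0 items into uniform records. The sample feed, the record fields, and the `ingest` function are assumptions for illustration; the actual Merlin ingestor is not shown in the slides.

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example station feed</title>
  <item><title>First story</title><link>http://example.org/1</link></item>
  <item><title>Second story</title><link>http://example.org/2</link></item>
</channel></rss>"""


def ingest(feed_xml, source):
    """Parse RSS 2.0 items into uniform records for a downstream API."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "source": source,
            # findtext returns None for a missing element, hence the fallback.
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
        })
    return records


records = ingest(SAMPLE_FEED, source="example-station")
```

Every feed, whatever CMS produced it, ends up in the same record shape, which is what lets one API serve content from many sources.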
Merlin API architecture
• API endpoint – Django Piston
• Search service – Django-haystack
• Indexing service – Solr
• Data layer – MySQL (RDS)
• Administration – Django admin
• Feed ingestion – Celery
Merlin API server topology
(Diagram components: Internet; Elastic Load Balancer; autoscaling array of app servers (App #N); Celery; Solr index; master DB (RDS); S3 backups)
Merlin API – Management Tools
• Amazon Web Service Console
• RightScale
• Splunk
API - Piston/Haystack/Solr
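from haystack.query import SearchQuerySet
from piston.handler import BaseHandler

class WebObjectIndexHandler(BaseHandler):
    ...
    def get_queryset(self):
        ...
        # Query the Haystack search index, restricted to the given models.
        return PistonSearchQuerySet().models(*models)

class PistonSearchQuerySet(SearchQuerySet):
    ...
    def __getitem__(self, k):
        ...
        # Serialize each search result as it is read from the index.
        return [IndexSerializer(i) for i in
                super(PistonSearchQuerySet, self).__getitem__(k)]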
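The pattern on this slide, a result-set subclass that serializes items as they are read, can be sketched without Haystack or Piston. `ResultList`, `SerializingResultList`, and `serialize` below are hypothetical stand-ins; the slide elides its own handling of `k`, so the slice/index branching here is one possible approach, not necessarily what Merlin does.

```python
class ResultList:
    """Stand-in for a lazy result set that supports indexing and slicing."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, k):
        return self._items[k]


def serialize(item):
    # Hypothetical serializer: wrap a raw result in an API-friendly dict.
    return {"title": item}


class SerializingResultList(ResultList):
    """Serialize results on access, for both single indexes and slices."""

    def __getitem__(self, k):
        result = super().__getitem__(k)
        if isinstance(k, slice):
            return [serialize(i) for i in result]
        return serialize(result)


results = SerializingResultList(["NOVA", "Frontline", "Nature"])
page = results[0:2]   # a slice yields a list of serialized items
single = results[2]   # an integer index yields one serialized item
```

Overriding `__getitem__` means pagination code that slices the result set gets serialized output without knowing about the serializer.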
Feed ingestor - Celery
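from datetime import timedelta

from celery.decorators import task, periodic_task

@periodic_task(run_every=timedelta(seconds=300))  # runs every five minutes
def update_webobject_states():
    ...
    solr_visible = WebObject.children.filter(visible=True)
    solr_visible = solr_visible.exclude(
        flag__api_visible=True, available__isnull=True)
    ...
    # QuerySet.update() issues a bulk UPDATE; it does not call save()
    # or send per-object save signals.
    updated = solr_visible.update(visible=False, is_indexed=False)
    ...
    # Project-specific signal announcing the bulk change; the sender is the task name.
    signals.bulk_update.send('tasks.update_webobject_states')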
Merlin API - Lessons learned
• Memcached was not necessary
• Denormalized search data via Solr index is much faster
than querying database
• Asynchronous task delegation is awesome
• Celery prone to memory leaks
• App server array for easy horizontal scaling
– Even if not autoscaling, increase min servers
• Never trust data you don’t control (validate!)
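For the last lesson, a minimal sketch of validating an ingested record before it is stored. The field names, the 200-character limit, and the `validate_record` helper are assumptions chosen for illustration, not rules from the Merlin codebase.

```python
from urllib.parse import urlparse


def validate_record(record):
    """Return a list of problems found in one ingested record (empty if valid)."""
    problems = []
    title = record.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("missing title")
    elif len(title) > 200:  # hypothetical length limit
        problems.append("title too long")
    url = record.get("url")
    parsed = urlparse(url) if isinstance(url, str) else None
    # Accept only absolute http(s) URLs with a host.
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("invalid url")
    return problems


good = {"title": "First story", "url": "http://example.org/1"}
bad = {"title": "  ", "url": "javascript:alert(1)"}
```

Returning a list of problems rather than raising on the first one makes it easier to log everything wrong with a feed item in a single pass.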
Resources
• http://lucene.apache.org/solr/
• http://haystacksearch.org/
• http://celeryproject.org/
• http://celeryproject.org/docs/django-celery/
• http://aws.amazon.com/
PBS Developer Community
• Dedicated to making open.PBS the industry
standard in open development communities.
http://open.pbs.org/
https://github.com/pbs
open@pbs.org
Questions?
Drew Engelson
drew@engelson.net
http://tomatohater.com
Edgar Roman
emroman@pbs.org
