Architecture of PBS.org
 DCPython - June 7, 2011
PBS is…

• PBS is a national federation of independently owned and
  operated public television stations and producers
   – Each with their own management and development resources


• 1500+ highly trafficked websites:
   –   http://www.pbs.org/
   –   http://www.pbs.org/nova/
   –   http://pbskids.org/
   –   http://pbskids.org/sesame/
   –   http://video.pbs.org/

• Enterprise services/APIs
PBS is not!



• Radio is easy… We do television!




• Or any of the other ~200 local stations.
What we do

• Technology leadership within public
  broadcasting community
• Distribution of national programming content
• Services to local stations
• Core application development. Yeah!!!
A few of our sites
History of PBS.org




• Early 1990s: Hand-rolled static HTML
• Late 1990s: Hand-crafted static HTML + CGI!
• Most of the 2000s: Zope/Plone CMS-generated static HTML
• 2008-10: Django-generated static HTML
• Launched Oct 2010: Django all the way
COVE API

• Contains the metadata for all PBS videos online
  including pointers to streaming video
• Needed to be:
   – Secure
   – Fast
   – Scalable
COVE API – Technology Stack

•   Amazon Elastic Compute Cloud (EC2)
•   Amazon Relational Database Service (RDS)
•   Linux
•   Python
•   Django
•   Piston for REST API
COVE API - Architecture

[Diagram, top to bottom]
• Internet
• Elastic Load Balancer
• Auto Scale Array: App Server 1 … App Server N
• HA Proxy
• RDS Master and multiple RDS Slaves; S3 backups
• App Sync Server
COVE API – Management Tools

• Amazon Web Service Console
• RightScale
• Splunk
COVE API – Interesting Stuff

• Easy to load test
  – Duplicate environment for several days
• Easy to scale
  – Autoscale array grows automatically
• Easy to upgrade
  – Each server built from vanilla base
COVE API – Lessons learned
• Use normalized data for administration and
  denormalized data for the API
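
A minimal sketch of that lesson, not the COVE implementation: normalized records (as an admin database might hold them) are flattened into one self-contained document per video for the API. The record layout, field names, and the `denormalize` helper are all hypothetical.

```python
# Hypothetical normalized records: programs and videos live in separate tables.
PROGRAMS = {1: {"title": "NOVA"}}
VIDEOS = [
    {"id": 10, "program_id": 1, "title": "Episode A",
     "stream_url": "http://example.org/streams/a"},
]


def denormalize(video, programs):
    """Flatten one video row plus its program into a single API document."""
    program = programs[video["program_id"]]
    return {
        "id": video["id"],
        "title": video["title"],
        "program_title": program["title"],  # copied in, so no join at read time
        "stream_url": video["stream_url"],
    }


# Built once when data changes; API reads then need no joins.
API_DOCS = [denormalize(video, PROGRAMS) for video in VIDEOS]
```

The trade-off is that the flattened documents must be rebuilt whenever the normalized source changes.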
COVE API – Lessons learned
• Piston is fine, but lacks flexibility without
  significant customization
   – TastyPie?
• JSON is probably good enough
• Don’t get fancy with your endpoints
• Stick to REST principles
• Don’t get fancy with your authentication
   – Use OAuth2 or simple token
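
As an illustration of the "simple token" option, here is a sketch using only the Python standard library. The in-memory `API_TOKENS` store and both function names are assumptions for the example, not part of COVE or Piston.

```python
import hmac
import secrets

# Hypothetical in-memory token store; a real service would persist this.
API_TOKENS = {}


def issue_token(client_id):
    """Create and remember a random token for a client."""
    token = secrets.token_hex(16)
    API_TOKENS[client_id] = token
    return token


def is_authorized(client_id, presented_token):
    """Check a presented token using a constant-time comparison."""
    expected = API_TOKENS.get(client_id)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode("utf-8"),
                               presented_token.encode("utf-8"))
```

A request handler would call `is_authorized` with the client id and token taken from the request before doing any other work.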
PBS.org and Merlin API

• PBS.org
   – Slim, fast layer
   – Pulls data from Merlin API
   – Uses memcache extensively
   – Currently Django, but could be anything (Flask?)

• Merlin API
  – Aggregate content from distributed CMSes
  – Expose via standardized API
  – Power PBS.org and more
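
The "slim layer in front of an API, with heavy caching" idea can be sketched as a cache-aside lookup. This is an assumption-laden illustration: `TTLCache` is a stand-in for memcached, and `fetch` stands in for a call to the Merlin API; neither name comes from the talk.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcached, with per-entry expiry."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)


def cached_fetch(cache, key, fetch):
    """Return the cached value, calling fetch(key) only on a miss."""
    value = cache.get(key)
    if value is None:
        value = fetch(key)
        cache.set(key, value)
    return value
```

With this shape, the page layer only reaches the backing API on a cache miss or after an entry expires.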
Merlin API – Technology stack

•   Python
•   Django
•   MySQL
•   Piston
•   Solr
•   Celery
•   RabbitMQ
•   Amazon Web Services (“cloud”)
      – EC2
      – RDS - Relational Database Service
      – ELB - Elastic Load Balancing
      – CloudFront CDN
      – S3 storage
Data flow




[Diagram labels: RSS Feed Ingestor, Standardized API]
Merlin API architecture

• API endpoint – Django Piston
• Search service – django-haystack
• Indexing service – Solr
• Data layer – MySQL (RDS)
• Administration – Django admin
• Feed ingestion – Celery
Merlin API server topology

[Diagram, top to bottom]
• Internet
• Elastic Load Balancer
• Autoscaling array of app servers (App #N); Celery; Solr index; master DB (RDS)
• S3 backups
Merlin API – Management Tools

• Amazon Web Service Console
• RightScale
• Splunk
API - Piston/Haystack/Solr
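from haystack.query import SearchQuerySet
from piston.handler import BaseHandler


class WebObjectIndexHandler(BaseHandler):
    ...
    def get_queryset(self):
        ...
        # Return search results instead of a database queryset.
        return PistonSearchQuerySet().models(*models)


class PistonSearchQuerySet(SearchQuerySet):
    ...
    def __getitem__(self, k):
        ...
        # Serialize each search hit before Piston emits it
        # (IndexSerializer is defined outside this excerpt).
        return [IndexSerializer(i) for i in
                super(PistonSearchQuerySet, self).__getitem__(k)]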
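
The `__getitem__` override above can be illustrated without Haystack or Piston. The sketch below is a plain-Python analogue, assuming a hypothetical `SerializingResults` wrapper: indexing or slicing returns serialized items rather than raw ones.

```python
class SerializingResults:
    """Wrap a result sequence so item access returns serialized values."""

    def __init__(self, items, serialize):
        self._items = list(items)
        self._serialize = serialize

    def __len__(self):
        return len(self._items)

    def __getitem__(self, k):
        selected = self._items[k]
        if isinstance(k, slice):
            # A slice yields several raw items: serialize each one.
            return [self._serialize(item) for item in selected]
        return self._serialize(selected)


results = SerializingResults([1, 2, 3], lambda pk: {"id": pk})
page = results[0:2]  # a paginator would typically slice like this
```

Because pagination code usually slices the result set, hooking `__getitem__` lets serialization happen only for the items actually returned.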
Feed ingestor - Celery
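from datetime import timedelta

from celery.decorators import task, periodic_task  # 2011-era Celery import path

# WebObject and signals come from the project's own modules (imports not shown).


# Scheduled to run every five minutes.
@periodic_task(run_every=timedelta(seconds=300))
def update_webobject_states():
    ...
    # Start from objects currently marked visible.
    solr_visible = WebObject.children.filter(visible=True)
    # Drop those matching both conditions: flag api_visible set, no `available` value.
    solr_visible = solr_visible.exclude(
        flag__api_visible=True, available__isnull=True)
    ...
    # Hide the remainder and clear their indexed flag in one bulk UPDATE.
    updated = solr_visible.update(visible=False, is_indexed=False)
    ...
    signals.bulk_update.send('tasks.update_webobject_states')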
Merlin API - Lessons learned

• Memcached was not necessary
• Denormalized search data via Solr index is much faster
  than querying database
• Asynchronous task delegation is awesome
• Celery prone to memory leaks
• App server array for easy horizontal scaling
   – Even if not autoscaling, increase min servers
• Never trust data you don’t control (validate!)
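
For the "validate!" point, here is a small sketch of checking an ingested feed item before storing it. The required fields and the `validate_item` helper are assumptions for illustration; the talk does not show Merlin's actual validation rules.

```python
from urllib.parse import urlparse

# Hypothetical minimum set of fields an ingested item must carry.
REQUIRED_FIELDS = ("title", "link")


def validate_item(item):
    """Return a list of problems; an empty list means the item is acceptable."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = item.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append("missing or empty field: " + field)
    link = item.get("link")
    if isinstance(link, str) and link.strip():
        parsed = urlparse(link.strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("link is not an absolute http(s) URL")
    return errors
```

An ingestion task could skip or quarantine any item for which `validate_item` returns a non-empty list, rather than writing it to the database.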
Resources

•   http://lucene.apache.org/solr/
•   http://haystacksearch.org/
•   http://celeryproject.org/
•   http://celeryproject.org/docs/django-celery/
•   http://aws.amazon.com/
PBS Developer Community

• Dedicated to making open.PBS the industry
  standard in open development communities.

                  http://open.pbs.org/
                  https://github.com/pbs

                  open@pbs.org
Questions?

  Drew Engelson
  drew@engelson.net
  http://tomatohater.com



  Edgar Roman
  emroman@pbs.org
