Inside Wordnik's Architecture

          Tony Tam
          @fehguy
Who is Wordnik?

• Founded in 2008 by Erin McKean
• "Understand meaning of words
  automatically"
• Patented "Free-Range Definition"
  technology
• Constructed largest (known) English Word
  Graph
               We do Discovery
It's all about Data!
Data?

• Word Graph is built by data
• Runtime answers needed fast
• 80 S reads!
• 50M+ Nodes!
• 80M+ Edges!
What we do with Data

• Update the Graph constantly
• Augment our NLP pipeline
• "Reality-based Annotation" with
  current, real-world data
Language is NOT static

Next??? Twitter? Tumblr? WordPress?
Is a 20-year-old corpus good enough?
How we do it

• Amazon EC2-based deployment
• Efficiency through constraint-based
  architecture
  •   Small is Big!
• Horizontal scaling by adding servers!
  •   Yea, we can always go vertical
• Blah, blah, more details!
Micro Services

• Services are stand-alone building blocks
• Increase capacity through a "more like this"
  button
Micro Services

• Big application => micro services

Monolithic application

"Isn't this just SOA?"
Not PO-SOA
• This is different
  •   No proprietary message bus
  •   Decoupled objects
  •   Dedicated storage***
• Speak REST
  •   Develop your services in…
      •   Java
      •   Scala
      •   Ruby
      •   PHP
Speak REST?

• Sounds good but…
 •   REST semantics vary wildly
 •   HATEOAS vs. practical REST?
/api/pet.json/1?delete (GET)
/api/pet.json/1 (DELETE)
/api/pet.json/1 (POST empty)
All valid!


So…
• Peer Review!
• Better Docs!
• API Council!
• API Styleguide!
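To make the "all valid" problem concrete, here is a minimal sketch that maps the three delete styles from the "Speak REST?" slide onto one operation. The function name `normalize` and the rule set are assumptions for illustration only; this is not Wordnik's code or any library's API.

```python
from urllib.parse import urlsplit


def normalize(method, url, body=b""):
    """Map a request onto an (operation, resource) pair.

    Hypothetical rules covering the three delete styles on the slide.
    """
    parts = urlsplit(url)
    if method == "DELETE":
        return ("delete", parts.path)
    if method == "GET" and parts.query == "delete":
        return ("delete", parts.path)
    if method == "POST" and not body:
        return ("delete", parts.path)
    # Anything else keeps its HTTP method as the operation name.
    return (method.lower(), parts.path)


# All three styles collapse to the same logical operation.
styles = [
    normalize("GET", "/api/pet.json/1?delete"),
    normalize("DELETE", "/api/pet.json/1"),
    normalize("POST", "/api/pet.json/1", b""),
]
```

Every consumer would need to know rules like these for every provider, which is the argument for a shared styleguide and a declared contract.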
SOA Creates New Challenges
• It's communication (not easy)
• Need a consumer & provider contract
• Driving force to create Swagger
What is Swagger?

• Swagger is…
  •   Spec for declaring and documenting an API
  •   A framework for auto-generating the spec
  •   A library for client library generation
  •   A JSON-based test framework
• It's open source!
  •   http://swagger.wordnik.com
How?

• Swagger Codegen
  •   Creates a client based on your Swagger Spec
scala src/main/scala/Codegen.scala ${swagger-spec-url}
  •   Shown on the slide: Scala, Ruby
In the Wordnik Workflow
• Jenkins will…
 •   Build a service library
 •   Build a stand-alone application distro
 •   Build an installable image (RPM)
 •   Build a compatible client library
• Consumers will…
 •   Declare dependency on a service version
 •   Use a client for that version
 •   Be given a list of compatible services, by
     cluster, version
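The consumer side of this workflow could look roughly like the sketch below: a consumer declares the service version it depends on and is handed the compatible instances for its cluster. The `ServiceInstance` type, the URLs, and the "same major version means compatible" rule are all assumptions made for illustration; the slides do not say how Wordnik decides compatibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    cluster: str
    version: str  # e.g. "2.1.0"
    url: str      # hypothetical endpoint


def compatible(instances, cluster, wanted_major):
    """Return instances in `cluster` whose major version matches (assumed rule)."""
    return [
        inst for inst in instances
        if inst.cluster == cluster
        and inst.version.split(".")[0] == str(wanted_major)
    ]


# Hypothetical registry contents.
registry = [
    ServiceInstance("prod", "2.1.0", "http://words-1.example/api"),
    ServiceInstance("prod", "1.9.3", "http://words-2.example/api"),
    ServiceInstance("staging", "2.0.0", "http://words-3.example/api"),
]

prod_v2 = compatible(registry, "prod", 2)
```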
Back to Data

• Micro services have small(ish) databases
 •   Share nothing across services
 •   YES to replica sets
• Deployed to ephemeral storage
 •   (more in a bit)
 •   Small by design
• How to keep them small?
Keeping Databases Small

• Some easy tricks
 •   Schema-less => "schema per document"
 •   Keep field names short!
db.foo.save({user_name:"Tony"})
db.foo.save({un:"Tony"})
Repeat 10e9 times!
• Indexes
 •   They can get *huge*
 •   Make _id matter!
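A rough way to see why short field names matter: BSON stores each field name inside every document, so the longer name is paid for once per document. The sketch below computes the encoded size of a one-field document by hand from the BSON layout; it ignores the `_id` that `save()` would add (it is the same size in both cases), and the helper name is an assumption.

```python
def bson_size_single_string_field(field_name, value):
    """Size in bytes of a BSON document holding one UTF-8 string field."""
    name = field_name.encode("utf-8")
    val = value.encode("utf-8")
    element = (
        1                 # element type byte (0x02 = string)
        + len(name) + 1   # field name as a NUL-terminated cstring
        + 4               # int32 length prefix of the string value
        + len(val) + 1    # string bytes plus trailing NUL
    )
    return 4 + element + 1  # int32 document length + element + terminator


long_doc = bson_size_single_string_field("user_name", "Tony")
short_doc = bson_size_single_string_field("un", "Tony")
saved_per_doc = long_doc - short_doc

documents = int(10e9)  # the "10e9 times" from the slide
saved_total_bytes = saved_per_doc * documents
```

Under these assumptions the shorter name saves 7 bytes per document, which adds up to about 70 GB over 10e9 documents, before counting replicas and backups.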
Keeping Databases Small
• Don't make _id just an "auto increment"
 •   You're stuck with it! Be smart
 •   User collection? Try _id: username
 •   Email collection? Try _id: email
 •   Date-driven collection? How about _id: "20120502"
     •   db.logins.find({_id:/^201205/})
• Be lazy until you can't anymore!
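The date-string `_id` works because an anchored prefix match on a sorted index can be answered as a range scan rather than a full scan. The sketch below imitates that with a sorted Python list standing in for the `_id` index; it illustrates the idea only and is not how MongoDB is implemented.

```python
from bisect import bisect_left


def prefix_range(sorted_ids, prefix):
    """Return all ids starting with `prefix` using two binary searches."""
    lo = bisect_left(sorted_ids, prefix)
    # The highest code point sorts after any real continuation of the prefix.
    hi = bisect_left(sorted_ids, prefix + "\U0010ffff")
    return sorted_ids[lo:hi]


# Stand-in for the _id index of a logins collection (made-up dates).
login_ids = ["20120430", "20120501", "20120502", "20120531", "20120601"]

may_2012 = prefix_range(login_ids, "201205")
```

Here `may_2012` plays the role of `db.logins.find({_id:/^201205/})`: only the entries between the two search positions are touched.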
Keeping Databases Small

• DAO or die!
 •   Fancy index scheme => control access to collections
 •   [Diagram: one access path marked "NO!!!!", the other marked "Yes"]
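One reading of "DAO or die" is that a clever `_id` scheme is only safe if a single data-access object owns it. The sketch below is a hypothetical `LoginDao` that hides the date-string encoding from callers; an in-memory dict stands in for the collection, and none of these names come from the slides.

```python
from datetime import date


class LoginDao:
    """Hypothetical DAO: callers pass dates, never the _id encoding."""

    def __init__(self):
        self._docs = {}  # stands in for a collection keyed by _id

    @staticmethod
    def _id_for(day):
        # The only place that knows _id is a "YYYYMMDD" string.
        return day.strftime("%Y%m%d")

    def record(self, day, count):
        key = self._id_for(day)
        self._docs[key] = {"_id": key, "count": count}

    def for_month(self, year, month):
        prefix = f"{year:04d}{month:02d}"
        return [
            doc for key, doc in sorted(self._docs.items())
            if key.startswith(prefix)
        ]


dao = LoginDao()
dao.record(date(2012, 5, 2), 17)
dao.record(date(2012, 6, 1), 3)
may_logins = dao.for_month(2012, 5)
```

If the encoding ever has to change, only `_id_for` and `for_month` need to move; callers keep passing dates.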
Keeping Databases Small

• If/when you need to shard…
 •   Don't make your clients do this!
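Assuming the slide means that shard routing should live behind the service rather than in every client, a minimal sketch looks like this. The class name, the hash-based routing, and the in-memory shards are illustrative assumptions; this is application-level routing, not MongoDB's built-in sharding.

```python
import hashlib


class ShardedUserDao:
    """Hypothetical DAO that hides shard selection from its callers."""

    def __init__(self, shard_count):
        # Each dict stands in for one shard's collection.
        self._shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, username):
        digest = hashlib.sha256(username.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def save(self, username, doc):
        self._shard_for(username)[username] = doc

    def find(self, username):
        return self._shard_for(username).get(username)

    def total(self):
        return sum(len(shard) for shard in self._shards)


users = ShardedUserDao(shard_count=3)
for name in ["tony", "erin", "fehguy"]:
    users.save(name, {"_id": name})
```

Callers only ever see `save` and `find`, so the number of shards can change without touching client code.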
Keeping Databases Small

• Again, why keep them small?
• Starting a new replica
 •   Initial sync
 •   Index rebuilding
• Backups
• Index Compaction
• Speed
• TCO
• Everything is easier
• This can take DAYS
Ephemeral Storage?

• Every EC2 instance type has some
  (except micro)
• Only available via EC2 API
• Less prone to issues than EBS
• Faster ***
• Included in cost of server
• But dies on host reboot!
Keeping Data Safe
Which Zone? Which Region?
• Arbiter handles external connectivity issue detection
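The arbiter's value can be shown with a little vote counting: a replica set elects a primary only when a strict majority of voting members is reachable, and an arbiter votes without holding data. The sketch below uses made-up member and zone names and deliberately simplifies by assuming that all members outside the failed zones can reach each other.

```python
def can_elect_primary(members, down_zones):
    """members: list of (name, zone) voting members; True if a majority survives."""
    reachable = [name for name, zone in members if zone not in down_zones]
    return 2 * len(reachable) > len(members)


# Hypothetical layout: two data-bearing members plus an arbiter, one per zone.
with_arbiter = [("data-a", "zone-1"), ("data-b", "zone-2"), ("arbiter", "zone-3")]
without_arbiter = [("data-a", "zone-1"), ("data-b", "zone-2")]

survives_zone_loss = can_elect_primary(with_arbiter, {"zone-1"})
stuck_without_arbiter = not can_elect_primary(without_arbiter, {"zone-1"})
```

With three voters in three zones, losing any single zone still leaves two of three votes; with only two voters, losing one zone leaves the survivor unable to become primary.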
How does this really stack up?

• Tuned indexes & access, split with services
  •   Was: 3 DAS Devices w/18 TB disk
  •   Now: 21 M1.large + M1.xlarge instances
      •   3 Zones, 2 regions
• The Gory Details
blog.wordnik.com/with-software-small-is-the-new-big
As for Services

• ~1,000 requests/sec via Swagger-enabled
  micro services
• Direct to Consumer via SwaggerSocket
What's Next

• Migrating all services to SwaggerSocket
 •   OSS WebSocket subprotocol
https://github.com/wordnik/swaggersocket
 •   25%-100% speed increase (sync & async)
• Discovery via Wordnik
If you're Interested…
See more:
developer.wordnik.com
swagger.wordnik.com
github.com/wordnik

            Questions?

Editor's Notes

  • #39-#45: list.foldLeft(0)((x, y) => x + y)