Open Government
Data & MongoDB
        Luigi Montanez
 luigi@sunlightfoundation.com
Question? @LuigiMontanez
Open Data + Open Source
  = Open Government
MongoDB enables
   open data
Opening Up Data
✴   Gather data from disparate sources
     ✴   Data dumps (SQL, fixed-width columns)
     ✴   Web scraping
     ✴   Text/PDF parsing
✴   Serve the data through RESTful JSON APIs
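Parsing a fixed-width dump into JSON-ready records takes only a few lines. This is a minimal JavaScript sketch (JavaScript to match the shell snippets later in the deck); the column layout, the `parseFixedWidth` helper and the second sample row are hypothetical, since real dumps document their own column positions.

```javascript
// Hypothetical column layout: [fieldName, start, end) character offsets.
const LAYOUT = [
  ["last_name", 0, 12],
  ["first_name", 12, 22],
  ["state", 22, 24],
  ["district", 24, 27],
];

// Slice every non-empty line at the column offsets and trim the padding,
// producing one plain object (i.e. one JSON document) per line.
function parseFixedWidth(text, layout) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const record = {};
      for (const [name, start, end] of layout) {
        record[name] = line.slice(start, end).trim();
      }
      return record;
    });
}

// Two sample rows padded to the hypothetical widths above
// (the second row is placeholder data).
const dump = [
  "Lee".padEnd(12) + "Barbara".padEnd(10) + "CA" + "9".padEnd(3),
  "Doe".padEnd(12) + "Jane".padEnd(10) + "NY" + "1".padEnd(3),
].join("\n");

const records = parseFixedWidth(dump, LAYOUT);
console.log(JSON.stringify(records, null, 2));
```

Keeping the layout as data rather than hard-coding slices means a new dump format only needs a new layout table.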
JSON
✴   Tree structure, not tabular
✴   Still relational
✴   JSON for data, XML for documents
✴   Closely resembles native data structures
✴   No manual parsing needed
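To illustrate the last two points: parsing JSON text yields nested native objects directly. A small JavaScript sketch; the document shape is made up, loosely modeled on the legislator records later in the deck.

```javascript
// JSON text shaped loosely like the legislator records later in the deck
// (abbreviated illustration, not a real API response).
const body =
  '{"legislator": {"last_name": "Lee", "state": "CA",' +
  ' "earmarks": {"total_number": 28, "fiscal_year": 2010}}}';

// One call turns the text into nested native objects; no hand-written parser.
const doc = JSON.parse(body);

// The tree is navigated with ordinary property access.
console.log(doc.legislator.earmarks.total_number); // 28
console.log(typeof doc.legislator.earmarks); // object

// Serializing back is one call too; re-parsing that output
// produces an equivalent structure.
const text = JSON.stringify(doc);
console.log(JSON.stringify(JSON.parse(text)) === text); // true
```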
Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project
App design
    drives
schema design
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com"
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": "................."
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [...]
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
    {
      "name": "Barack Obama",
      "type": "politician"
    },
    ...
  ]
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
    {
      "name": "Barack Obama",
      "type": "politician",
      "breakdown": {"indiv": "33", "pac": "67"},
      "top_industries": ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"]
    },
    ...
  ]
}
Natural Schemas
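The Poligraft slides above grow a single document in stages. A minimal JavaScript sketch of that progression, assuming hypothetical helpers (`withArticle`, `withEntities`, `withContributions`) that stand in for the real fetching, entity extraction and contribution lookups:

```javascript
// Hypothetical stage helpers: each returns a copy of the document
// with more fields, mirroring the slide-by-slide progression.
function withArticle(doc, article) {
  return { ...doc, slug: article.slug, source_url: article.url, content: article.content };
}

function withEntities(doc, found) {
  return { ...doc, entities: found.map((e) => ({ name: e.name, type: e.type })) };
}

// Merge per-entity contribution data (keyed by entity name) into each entity.
function withContributions(doc, lookup) {
  return {
    ...doc,
    entities: doc.entities.map((e) => ({ ...e, ...(lookup[e.name] || {}) })),
  };
}

let doc = {
  title: "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
};
doc = withArticle(doc, {
  slug: "EOsc",
  url: "http://www.politico.com/news/stories/0810/40534.html",
  content: "(article text)",
});
doc = withEntities(doc, [{ name: "Barack Obama", type: "politician" }]);
doc = withContributions(doc, {
  "Barack Obama": { breakdown: { indiv: "33", pac: "67" } },
});

console.log(JSON.stringify(doc, null, 2));
```

Each stage only adds keys, so code written against the earlier shape keeps working as the document grows.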
Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project
Real-Time Congress API
                 Credit: vgm8383 on Flickr
Android App: “Congress”
Politiwidgets
Requirements
✴   Aggregate many kinds of data:
      Biographical, Bills, Votes, Earmarks,
      Video Clips, Floor Updates, Legislative
      Documents, Committee Schedules,
      Contributions, Interest Group Ratings
✴   Lightweight responses
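One way to picture the aggregation requirement is to fold per-source records into a single document per legislator, keyed by the `bioguide_id` that appears in the record below. This is a sketch, not the Real-Time Congress API's own code; the source arrays and the `aggregate` helper are hypothetical.

```javascript
// Hypothetical per-source records, each keyed by bioguide_id.
const bios = [{ bioguide_id: "L000551", first_name: "Barbara", last_name: "Lee", state: "CA" }];
const earmarks = [{ bioguide_id: "L000551", total_number: 28, fiscal_year: 2010 }];

// Start from the biographical records, then nest every other source
// under its own key on the matching legislator document.
function aggregate(bioRecords, extras) {
  const byId = new Map();
  for (const bio of bioRecords) byId.set(bio.bioguide_id, { ...bio });
  for (const [section, records] of Object.entries(extras)) {
    for (const { bioguide_id, ...rest } of records) {
      const doc = byId.get(bioguide_id);
      if (doc) doc[section] = rest; // records for unknown IDs are skipped in this sketch
    }
  }
  return [...byId.values()];
}

const legislators = aggregate(bios, { earmarks });
console.log(JSON.stringify(legislators, null, 2));
```

Adding another source (votes, ratings) is one more entry in the `extras` object; the stored document simply gains a key.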
{legislator: {
    in_office: true,
    title: "Rep",
    nickname: "",
    district: "9",
    bioguide_id: "L000551",
    govtrack_id: "400237",
    phone: "202-225-2661",
    website: "http://lee.house.gov/index.html",
    twitter_id: "",
    last_name: "Lee",
    name_suffix: "",
    last_updated: "2010/04/13 00:00:14 +0000",
    party: "D",
    chamber: "house",
    state: "CA",
    youtube_url: "http://www.youtube.com/RepLee",
    first_name: "Barbara",
    gender: "F",
    congress_office: "2444 Rayburn House Office Building",
    earmarks: {
          average_number: 20,
          total_amount: 10000000,
          average_amount: 22994535,
          total_number: 28,
          last_updated: "2010-03-18",
          fiscal_year: 2010
    },
    ...
}}
// limit selection to a subset of fields
// (_id is returned as well unless the projection sets '_id': 0)
db.people.find( { 'first_name' : 'john' },
                { 'last_name' : 1,
                  'address' : 1 } );

// use dot-notation to dig into an embedded object
db.people.find( { 'state': 'CA' },
                { 'address.zip_code': 1 } );
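The `?sections=` parameter on the following slides maps naturally onto the projection argument of `find()` shown above. A minimal sketch, assuming the parameter is a comma-separated list of field names or dot-paths; `sectionsToProjection` and the `legislators` collection name are hypothetical.

```javascript
// Hypothetical helper: turn "a,b,c.d" into a MongoDB projection document.
function sectionsToProjection(sections) {
  // Exclude _id, which MongoDB otherwise returns by default;
  // _id is the one field that may be excluded inside an inclusion projection.
  const projection = { _id: 0 };
  for (const raw of sections.split(",")) {
    const field = raw.trim();
    if (field !== "") projection[field] = 1; // dot-paths pass through unchanged
  }
  return projection;
}

const projection = sectionsToProjection(
  "last_name,first_name,state,earmarks.total_amount,earmarks.total_number"
);
console.log(projection);
// In the mongo shell this object would be the second argument, e.g.
// db.legislators.find({ bioguide_id: "L000551" }, projection)
```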
?sections=last_name,first_name,state,earmarks

  {legislator: {
      last_name: "Lee",
      first_name: "Barbara",
      state: "CA",
      earmarks: {
            average_number: 20,
            total_amount: 10000000,
            average_amount: 22994535,
            total_number: 28,
            last_updated: "2010-03-18",
            fiscal_year: 2010
      }
  }}
?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number

         {legislator: {
             last_name: "Lee",
             first_name: "Barbara",
             state: "CA",
             earmarks: {
                   total_amount: 10000000,
                   total_number: 28
             }
         }}
Partial responses
 make payloads
     smaller
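To see the payload effect concretely, here is an in-memory sketch that copies only the requested dot-paths from the legislator record (a subset of its fields) and compares serialized sizes. `pick` is a hypothetical helper; it is neither MongoDB's projection engine nor the API's implementation.

```javascript
// Copy only the requested dot-paths from doc into a new object.
function pick(doc, paths) {
  const out = {};
  for (const path of paths) {
    const keys = path.split(".");
    let src = doc;
    let dst = out;
    for (let i = 0; i < keys.length; i++) {
      if (src === null || typeof src !== "object" || !(keys[i] in src)) break;
      if (i === keys.length - 1) {
        dst[keys[i]] = src[keys[i]]; // leaf: copy the value
      } else {
        dst[keys[i]] = dst[keys[i]] || {}; // create the nested container once
        dst = dst[keys[i]];
        src = src[keys[i]];
      }
    }
  }
  return out;
}

const legislator = {
  in_office: true, title: "Rep", district: "9", bioguide_id: "L000551",
  phone: "202-225-2661", last_name: "Lee", party: "D", chamber: "house",
  state: "CA", first_name: "Barbara",
  earmarks: {
    average_number: 20, total_amount: 10000000, average_amount: 22994535,
    total_number: 28, last_updated: "2010-03-18", fiscal_year: 2010,
  },
};

const sections = "last_name,first_name,state,earmarks.total_amount,earmarks.total_number";
const partial = pick(legislator, sections.split(","));
const fullSize = JSON.stringify({ legislator }).length;
const partialSize = JSON.stringify({ legislator: partial }).length;

console.log(partial);
console.log(`full: ${fullSize} chars, partial: ${partialSize} chars`);
```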
Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project
50 States =
50 Formats
Schemalessness
allows for granular
      control
Custom Fields
✴   Traditional RDBMS
     ✴   Update the schema for new fields, run a
         migration, feel icky
     ✴   Create a custom_fields table
✴   MongoDB
     ✴   Just store it
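A minimal sketch of "just store it": bills from two states share a set of core fields, and each carries an extra field only that state publishes. The bills, field names and helpers below are hypothetical, invented for illustration.

```javascript
// Hypothetical scraped bills; the last field on each is state-specific.
const bills = [
  { state: "ca", bill_id: "AB 1", title: "Example bill A", chamber: "lower", urgency_clause: true },
  { state: "tx", bill_id: "HB 1", title: "Example bill B", chamber: "lower", companion_bill: "SB 1" },
];

const CORE_FIELDS = ["state", "bill_id", "title", "chamber"];

// Anything outside the core set is a "custom field". In a document store it
// is saved with the record: no schema change, no migration, no side table.
function customFields(doc) {
  return Object.keys(doc).filter((key) => !CORE_FIELDS.includes(key));
}

// Filtering on a custom field simply skips documents that lack it.
// The equivalent MongoDB filter is { companion_bill: { $exists: true } }.
function withField(docs, field) {
  return docs.filter((doc) => Object.prototype.hasOwnProperty.call(doc, field));
}

for (const bill of bills) {
  console.log(bill.state, customFields(bill));
}
console.log(withField(bills, "companion_bill").map((b) => b.bill_id)); // [ 'HB 1' ]
```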
Speaking JSON
   natively
Source → Scraped JSON → Transform (Python) → PostgreSQL
Source → Scraped JSON → MongoDB
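The transform step in the first pipeline is where a tree gets split into tables. The deck's transform was written in Python; this JavaScript sketch only illustrates the idea, using a hypothetical scraped bill and a hypothetical two-table layout (`bills`, `sponsors`).

```javascript
// A scraped bill as a scraper might emit it (hypothetical shape and values).
const scraped = {
  state: "tx",
  bill_id: "HB 1",
  title: "Example bill",
  sponsors: [
    { name: "Sponsor One", type: "primary" },
    { name: "Sponsor Two", type: "cosponsor" },
  ],
};

// Relational path: flatten the tree into one row list per table.
function toRows(bill) {
  const billRow = { state: bill.state, bill_id: bill.bill_id, title: bill.title };
  const sponsorRows = bill.sponsors.map((s) => ({
    state: bill.state, // (state, bill_id) together point back to the bills row
    bill_id: bill.bill_id,
    name: s.name,
    type: s.type,
  }));
  return { bills: [billRow], sponsors: sponsorRows };
}

const rows = toRows(scraped);
console.log(rows);
```

In the second pipeline this step disappears: the scraped object is stored as-is, e.g. `db.bills.insertOne(scraped)` in the mongo shell, with `bills` a hypothetical collection name.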
Three Projects
✴   Poligraft
✴   Real Time Congress API
✴   Open State Project
Developer Happiness
Thanks!
sunlightlabs.com
@LuigiMontanez