Mongo for Aadhaar

Search data store for the world's largest
biometric identity system

Regunath Balasubramanian Shashikant Soni
regunathb@gmail.com soni.shashikant@gmail.com
twitter @regunathb

CONFIDENTIAL: For limited circulation only Slide 1

India
● 1.2 billion residents
● 640,000 villages, ~60% live on under $2/day
● ~75% literacy, <3% pay income tax, <20% with banking access
● ~800 million mobile, ~200-300 million migrant workers

● Govt. spends about $25-40B on direct subsidies
● Residents have no standard identity document
● Most programs are plagued by ghost and duplicate identities, causing leakage of 30-40%

Slide 2

Aadhaar
● Create a common ‘national identity’ for every ‘resident’
  ● Biometric-backed identity to eliminate duplicates
  ● ‘Verifiable online identity’ for portability
● Applications ecosystem using open APIs
  ● Aadhaar-enabled bank account and payment platform
  ● Aadhaar-enabled electronic, paperless KYC (Know Your Customer)

Slide 3

Search Requirements
● Multi-attribute query like:
  name contains ‘regunath’ AND city = ‘bangalore’ AND
  address contains ‘J P Nagar’ AND YearOfBirth = ……

● Search across 1.2B resident records, including photo and history
  ● 35 KB - average record size
● Response times in milliseconds
● Open scale-out
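
The multi-attribute query above could be written as a Lucene/Solr-style query string. The sketch below is one possible way to assemble it, not the project's actual code. The field names (`name`, `city`, `address`) and the helper `build_query` are hypothetical, and it assumes "contains" maps to a phrase match on an analyzed text field.

```python
def _escape(text):
    # Escape backslashes and double quotes, the characters that matter inside a quoted phrase.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(contains=None, equals=None):
    """Build a Lucene/Solr-style query string from attribute filters.

    contains: dict of field -> text expected somewhere in the field
    equals:   dict of field -> exact value
    Field names are hypothetical; the real schema is not shown in the slides.
    """
    clauses = []
    for field, text in (contains or {}).items():
        # Quote the text so multi-word input is matched as a phrase.
        clauses.append('%s:"%s"' % (field, _escape(text)))
    for field, value in (equals or {}).items():
        clauses.append('%s:"%s"' % (field, _escape(str(value))))
    # All clauses must match, mirroring the AND-ed conditions on the slide.
    return " AND ".join(clauses)


query = build_query(
    contains={"name": "regunath", "address": "J P Nagar"},
    equals={"city": "bangalore"},
)
print(query)
```

The year-of-birth clause is left out because its value is elided on the slide; it would be one more entry in `equals`.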

Slide 4

Why MongoDB
● Auto-sharding
● Replication
● Failover
… Essentially an AP (slaveOk) data store in CAP parlance

● Evolving schema
● Map-Reduce for analysis
● Full text search
  ● Compound or multi-key indexes

Slide 5

Design

[Diagram: Client App → Search API]
  1. Solr indexes: queried with Name=‘abcde’, Address=‘some place’, Year=1980
  2. MongoDB: returns the document { _id:123456789, name: ‘abcde’, year:1980, ….. }

● Read/Search
  ● Sharded Solr indexes for search
  ● Keyed document read from MongoDB
● Write
  ● Eventual consistency (across data sources) driven by the application
  ● Composite MongoDB-Solr app persistence handler
Slide 6
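
The read and write paths above can be sketched with in-memory stand-ins for the two data sources. This is an illustration under assumptions, not the actual handler. The class names are hypothetical, the two stores are plain dictionaries rather than MongoDB or Solr clients, and writing to the document store before the index is an assumed ordering.

```python
class DocumentStore:
    """In-memory stand-in for the MongoDB collection, keyed by _id."""

    def __init__(self):
        self._docs = {}

    def save(self, doc):
        self._docs[doc["_id"]] = dict(doc)

    def get(self, doc_id):
        return self._docs.get(doc_id)


class SearchIndex:
    """In-memory stand-in for the Solr indexes; keeps only searchable fields."""

    def __init__(self, fields):
        self._fields = fields
        self._entries = {}

    def index(self, doc):
        self._entries[doc["_id"]] = {f: doc.get(f) for f in self._fields}

    def search(self, **criteria):
        # Return ids of entries whose fields equal every given criterion.
        return [doc_id for doc_id, entry in self._entries.items()
                if all(entry.get(k) == v for k, v in criteria.items())]


class CompositeHandler:
    """Hypothetical composite persistence handler spanning both stores."""

    def __init__(self, store, index):
        self.store = store
        self.index = index
        self.pending = []  # ids saved to the store but not yet indexed

    def write(self, doc):
        # Assumed ordering: store first, so a search hit always resolves to a document.
        self.store.save(doc)
        try:
            self.index.index(doc)
        except Exception:
            # The index catches up later; this is the eventual-consistency window.
            self.pending.append(doc["_id"])

    def search(self, **criteria):
        # Step 1: the index returns matching ids. Step 2: keyed reads fetch the documents.
        return [self.store.get(doc_id) for doc_id in self.index.search(**criteria)]


handler = CompositeHandler(DocumentStore(), SearchIndex(("name", "address", "year")))
handler.write({"_id": 123456789, "name": "abcde", "address": "some place", "year": 1980})
hits = handler.search(name="abcde", year=1980)
print(hits)
```

Keeping a list of ids that still need indexing is one simple way for the application to drive consistency between the two sources; a real handler would also need a retry job to drain it.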

Implementation and Deployment
● Start - 4M records in 2 shards
  Current - 250M records in 8 shards (8 x ~2 TB x 3 replicas)
● Performance, Reliability & Durability
  ● SlaveOk
  ● getLastError, Write Concern: availability vs durability
    ● j = journaling
    ● w = nodes-to-write
● Replica-sets / Shards – how?
[Diagram: three columns, each with one member of RS 1, one member of RS 2, one config server (Config 1, Config 2, Config 3) and one router; legend: Primary, Secondary, Arbiter]
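
The `j` and `w` options listed under getLastError control how far a write must get before it is acknowledged. As a minimal sketch, the helper below builds the command document that would be sent after a write. The function name `get_last_error_command` and the timeout value are hypothetical; `w`, `j` and `wtimeout` are the option names of the getLastError command.

```python
def get_last_error_command(w=1, j=False, wtimeout_ms=None):
    """Build a getLastError command document.

    w:           number of nodes that must have the write (nodes-to-write)
    j:           if True, wait until the write reaches the journal
    wtimeout_ms: optional limit on how long to wait for w nodes
    """
    # The command name goes first; Python dicts keep insertion order.
    cmd = {"getLastError": 1, "w": w}
    if j:
        cmd["j"] = True
    if wtimeout_ms is not None:
        cmd["wtimeout"] = wtimeout_ms
    return cmd


# Favors availability and latency: acknowledged by the primary only.
fast = get_last_error_command(w=1)

# Favors durability: journaled and present on two nodes, bounded by a timeout.
durable = get_last_error_command(w=2, j=True, wtimeout_ms=5000)

print(fast)
print(durable)
```

Higher `w` and `j` make a write survive more failures but add latency and can block if replicas are unavailable, which is the availability vs durability trade-off named on the slide.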
Slide 7

Monitoring and Troubleshooting
● Monitoring tools evaluated
  ● MMS (MongoDB Monitoring Service)
  ● Munin
● Manual approach - daily ritual
  ● RS, DB, config, router - health and stats
● Problem analysis stats
  ● mongostat, iostat, currentOp, logs
  ● Client connections
● Stats for planning storage and shard addition
  ● Data file size
  ● Shard data distribution
  ● Replication
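
One way to read the shard data distribution stat is to compare each shard's document count with an even share. The sketch below assumes the per-shard counts have already been collected; the function `skew_report` and the sample numbers are hypothetical, not figures from this deployment.

```python
def skew_report(doc_counts):
    """Return each shard's load relative to an even share.

    doc_counts: dict of shard name -> document count
    A value of 1.0 means the shard holds exactly its even share.
    """
    shards = len(doc_counts)
    total = sum(doc_counts.values())
    if shards == 0 or total == 0:
        return {}
    # count / (total / shards), rearranged to avoid an intermediate division.
    return {name: count * shards / total for name, count in doc_counts.items()}


# Hypothetical counts, for illustration only.
sample = {"shard1": 30, "shard2": 30, "shard3": 60}
report = skew_report(sample)
heaviest = max(report, key=report.get)
print(report, heaviest)
```

A shard that stays well above 1.0 over several daily checks is a hint to look at the shard key or at balancing before adding capacity.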
Slide 8

Key Learnings on MongoDB
● Indexing 32 fields
  ● Compound indexes
  ● Multi-key indexes
    ● {…"indexes" : [{ "email":"john.doe@email.com", "phone":"123456789" }] }
    ● db.coll.find({ "indexes.email" : "john.doe@email.com" })
  ● Indexes use b-tree
  ● Many fields to index
  ● Performs well up to 1-2M documents
  ● Best if index fits in memory
● Data replication, RS failover
  ● Rollback when RS goes out of sync
    ● Manual restore (physical data copy)
    ● Restarting a very stale node
Slide 9
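
The multi-key example on Slide 9 queries `"indexes.email"` against an array of subdocuments. The plain-Python sketch below emulates how that dot-notation equality match reaches into the array. It only illustrates the query semantics and is not MongoDB code; the function `matches` is hypothetical, and numeric array positions in the path are not handled.

```python
def matches(doc, dotted_path, value):
    """Emulate a dot-notation equality match that descends through arrays."""

    def walk(node, keys):
        if not keys:
            # At the end of the path: an array matches if any element equals the value.
            if isinstance(node, list):
                return value in node
            return node == value
        if isinstance(node, list):
            # An array matches if any of its elements matches the rest of the path.
            return any(walk(item, keys) for item in node)
        if isinstance(node, dict) and keys[0] in node:
            return walk(node[keys[0]], keys[1:])
        return False

    return walk(doc, dotted_path.split("."))


# Document in the shape shown on the slide (the _id value is illustrative).
doc = {
    "_id": 1,
    "indexes": [{"email": "john.doe@email.com", "phone": "123456789"}],
}

found = matches(doc, "indexes.email", "john.doe@email.com")
missing = matches(doc, "indexes.email", "jane.doe@email.com")
print(found, missing)
```

A multi-key index stores one index entry per array element, so the same lookup can be served from the index instead of scanning documents this way.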

Questions?

Regunath Balasubramanian Shashikant Soni
regunathb@gmail.com soni.shashikant@gmail.com
twitter @regunathb

Slide 10
