Replication Online

Open source, high performance database

Replication

Summer 2012


Why Have Replication?


• High Availability (auto-failover)

• Read Scaling (extra copies to read from)

• Backups
– Online, Delayed Copy (fat finger)
– Point in Time (PiT) backups

• Use (hidden) replica for secondary workload
– Analytics
– Data-processing
– Integration with external systems

• Planned downtime
– Hardware upgrade
– O/S or file-system tuning
– Relocation of data to new file-system / storage
– Software upgrade

• Unplanned downtime
– Hardware failure
– Data center failure
– Region outage
– Human error
– Application corruption

• A cluster of N servers
• All writes to primary
• Reads can be to primary (default) or a secondary
• Any node can be primary, but only one at a time
• Consensus election of primary
• Automatic failover
• Automatic recovery
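The routing rule on this slide can be sketched as follows. Writes always go to the primary. Reads go to the primary unless a secondary is requested. This is a minimal model and not a driver API. The `Member` class, the `route` function and the host names are hypothetical. Only the read preference mode names "primary" and "secondary" come from MongoDB. A real driver chooses among eligible secondaries (for example by latency), whereas this sketch simply takes the first one.

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str   # hypothetical host name
    state: str  # "PRIMARY" or "SECONDARY"


def route(members, op, read_preference="primary"):
    """Return the host that should serve `op` ("write" or "read")."""
    primary = next((m for m in members if m.state == "PRIMARY"), None)
    # All writes, and reads with the default preference, go to the primary.
    if op == "write" or read_preference == "primary":
        if primary is None:
            raise RuntimeError("no primary available")
        return primary.host
    if read_preference == "secondary":
        secondaries = [m for m in members if m.state == "SECONDARY"]
        if not secondaries:
            raise RuntimeError("no secondary available")
        return secondaries[0].host  # simplification: first secondary wins
    raise ValueError(f"unknown read preference: {read_preference}")


members = [
    Member("node1:27017", "SECONDARY"),
    Member("node2:27017", "PRIMARY"),
    Member("node3:27017", "SECONDARY"),
]
print(route(members, "write"))             # node2:27017
print(route(members, "read"))              # node2:27017
print(route(members, "read", "secondary"))  # node1:27017
```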

[Diagram: Member 1, Member 2, Member 3]

• Replica Set is made up of 2 or more nodes

[Diagram: Member 1, Member 2 (Primary), Member 3]

• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY

[Diagram: Member 2 DOWN; Members 1 and 3 negotiate a new master]

• PRIMARY may fail
• Automatic election of new PRIMARY if majority exists

[Diagram: Member 2 DOWN; a new Primary is elected from Members 1 and 3]

• New PRIMARY elected
• Replica Set re-established

[Diagram: Members 1 and 3 (one of them Primary); Member 2 Recovering]

• Automatic recovery

[Diagram: Member 1, Member 2, Member 3; new Primary in place and Member 2 rejoined]

• Replica Set re-established

Understanding automatic failover


[Diagram: Primary, Secondary, Secondary]

As long as a partition can see a majority (>50%) of the cluster, it will elect a primary.
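The majority rule above can be written as a one-line check. It uses integer arithmetic, so "more than 50%" holds exactly without rounding. This is a minimal sketch. It assumes every member carries one vote, and the function name is hypothetical.

```python
def has_majority(visible: int, total: int) -> bool:
    """True if a partition that can see `visible` of `total` voting members
    holds a strict majority (more than 50%) and can therefore elect a primary."""
    return 2 * visible > total


# A three-member replica set losing one member at a time.
for visible in (3, 2, 1):
    outcome = "can elect a primary" if has_majority(visible, 3) else "read-only"
    print(f"{visible} of 3 visible: {outcome}")
```

With 2 of 3 visible (66%) a primary is elected. With 1 of 3 (33%) the partition stays read-only. Exactly half is never enough, because the rule requires strictly more than 50%.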


[Diagram: Primary, Secondary, Failed Node]

66% of the cluster (2 of 3 members) visible. A primary is elected.

[Diagram: Secondary, Failed Node, Failed Node]

33% of the cluster (1 of 3 members) visible. Read-only mode.

[Diagram: Primary, Secondary, Secondary]

[Diagram: Primary, Secondary, Secondary; one node failed]

66% of the cluster (2 of 3 members) visible. A primary is elected.

[Diagram: Primary, Secondary, Secondary; two nodes failed]

Read-only mode.

[Diagram: Primary, Secondary, Secondary, Secondary]

[Diagram: Primary, Secondary, Secondary, Secondary; two nodes failed]

Read-only mode.

[Diagram: Primary, Secondary, Secondary, Secondary; two nodes failed]

50% of the cluster (2 of 4 members) visible. Read-only mode.
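The 50% case shows why member counts matter. The sketch below tabulates the majority size and the number of failures each set size tolerates. It assumes one vote per member, and the helper names are hypothetical.

```python
def majority(total: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return total // 2 + 1


def failures_tolerated(total: int) -> int:
    """Members that can fail while the survivors still form a majority."""
    return total - majority(total)


for n in range(2, 8):
    print(f"{n} members: majority is {majority(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

A 4-member set needs 3 votes, so like a 3-member set it tolerates only one failure. A 2/2 split leaves neither side with a primary. A 2-member set tolerates no failures at all.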


Avoid single points of failure


[Diagram: Primary, Secondary, Secondary in a single rack]

• Top-of-rack switch fails
• Rack falls over

[Diagram: Primary, Secondary, Secondary in a single building]

• Loss of internet
• Building burns down

[Diagram: San Francisco: Primary, Secondary; Dallas: Secondary]

[Diagram: San Francisco: Primary (priority 1), Secondary (priority 1); Dallas: Secondary (priority 0)]

Dallas is the disaster recovery data center. With priority 0, its member will never become primary automatically.
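The effect of priority 0 can be illustrated with a simplified election model. The reachable members must form a strict majority, and only members with priority above 0 may win. This is an assumption-laden sketch, not MongoDB's election algorithm: real elections weigh more factors, such as how up to date each member's data is. The function and member names are hypothetical.

```python
from typing import Dict, Optional, Set


def elect(priorities: Dict[str, int], reachable: Set[str]) -> Optional[str]:
    """Pick a primary in a simplified, priority-aware election model.

    `priorities` maps each member name to its priority. If the reachable
    members are not a strict majority, no primary is elected (read-only).
    Priority-0 members are never eligible. Among eligible members the highest
    priority wins, and ties go to the member listed first.
    """
    if 2 * len(reachable) <= len(priorities):
        return None  # no majority: the partition stays read-only
    eligible = [m for m in priorities if m in reachable and priorities[m] > 0]
    if not eligible:
        return None
    return max(eligible, key=lambda m: priorities[m])


# Hypothetical member names for the San Francisco / Dallas layout.
members = {"sf-1": 1, "sf-2": 1, "dallas-1": 0}
print(elect(members, {"sf-1", "sf-2", "dallas-1"}))  # sf-1
print(elect(members, {"sf-2", "dallas-1"}))          # sf-2
print(elect(members, {"dallas-1"}))                  # None
```

The Dallas member helps form a majority, but it is never returned as the new primary.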

[Diagram: Primary and two Secondaries spread across San Francisco, New York and Dallas]
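A quick way to check a layout against single points of failure is to ask whether losing any one site still leaves a majority. A site here can be a rack, a building or a data center. This is a minimal sketch that assumes one vote per member. The member names and the function are hypothetical.

```python
from collections import Counter
from typing import Dict


def survives_any_site_loss(site_of: Dict[str, str]) -> bool:
    """True if losing any one site still leaves a strict majority of members."""
    total = len(site_of)
    members_per_site = Counter(site_of.values())
    return all(2 * (total - lost) > total for lost in members_per_site.values())


# Hypothetical member names; each value is the member's data center.
two_sites = {"a": "San Francisco", "b": "San Francisco", "c": "Dallas"}
three_sites = {"a": "San Francisco", "b": "New York", "c": "Dallas"}
print(survives_any_site_loss(two_sites))    # False: losing San Francisco leaves 1 of 3
print(survives_any_site_loss(three_sites))  # True: losing any site leaves 2 of 3
```

With two sites, losing the site that holds two members leaves the other site in read-only mode. Spreading the members over three sites removes that single point of failure.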


[Diagram: Primary, Secondary, Arbiter]

Is this a good idea?
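An arbiter votes in elections but stores no data. A Primary/Secondary/Arbiter set therefore has three votes but only two copies of the data. The sketch below counts both. The `SetMember` class and `summarize` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SetMember:
    name: str
    votes: bool = True       # counts toward the election majority
    holds_data: bool = True  # keeps a full copy of the data (False for an arbiter)


def summarize(members: List[SetMember]) -> Tuple[int, int]:
    """Return (voting members, data-bearing members)."""
    voters = sum(m.votes for m in members)
    copies = sum(m.holds_data for m in members)
    return voters, copies


psa = [SetMember("primary"), SetMember("secondary"),
       SetMember("arbiter", holds_data=False)]
pss = [SetMember("primary"), SetMember("secondary-1"), SetMember("secondary-2")]
print(summarize(psa))  # (3, 2)
print(summarize(pss))  # (3, 3)

# Lose the only secondary in the Primary/Secondary/Arbiter set.
survivors = [m for m in psa if m.name != "secondary"]
print(summarize(survivors))  # (2, 1): still a 2-of-3 majority, but one data copy
```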


[Diagram, step 1: Primary, Secondary, Arbiter]

[Diagram, steps 1 and 2: Primary, Secondary, Arbiter]

[Diagram, steps 1 to 3: Primary, Secondary, Arbiter; in step 3 a new Secondary does a full sync from the Primary]

Uh oh. A full sync is going to use a lot of resources on the primary, so I may have downtime or degraded performance.

[Diagram, step 1: Primary, Secondary, Secondary]

[Diagram, steps 1 and 2: Primary, Secondary, Secondary]

[Diagram, steps 1 to 3: Primary, Secondary, Secondary; in step 3 a new Secondary does a full sync from an existing Secondary]

A sync can happen from a secondary, which will not impact traffic on the primary.
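The difference between the two topologies comes down to which members can serve as a full-sync source. The sketch below lists candidates for a rebuilding member and prefers secondaries over the primary. It is a simplified model only. MongoDB applies its own sync-source selection rules, which this does not reproduce, and the function, dictionary keys and member names are hypothetical.

```python
from typing import Dict, List


def sync_sources(members: List[Dict[str, object]], target: str) -> List[str]:
    """Names of members a rebuilding `target` could copy a full data set from.

    Simplified model: any other healthy, data-bearing member qualifies, and
    arbiters never do because they hold no data. Secondaries are ordered
    before the primary so the primary is used only as a last resort.
    """
    candidates = [m for m in members
                  if m["name"] != target and m["healthy"] and m["role"] != "ARBITER"]
    candidates.sort(key=lambda m: m["role"] == "PRIMARY")  # False sorts before True
    return [m["name"] for m in candidates]


psa = [
    {"name": "p", "role": "PRIMARY", "healthy": True},
    {"name": "s", "role": "SECONDARY", "healthy": True},
    {"name": "a", "role": "ARBITER", "healthy": True},
]
pss = [
    {"name": "p", "role": "PRIMARY", "healthy": True},
    {"name": "s1", "role": "SECONDARY", "healthy": True},
    {"name": "s2", "role": "SECONDARY", "healthy": True},
]
print(sync_sources(psa, "s"))   # ['p']: only the primary holds data
print(sync_sources(pss, "s2"))  # ['s1', 'p']: a secondary can take the load
```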


• Avoid single points of failure
– Separate racks
– Separate data centers
• Avoid long recovery downtime
– Use journaling
– Use 3+ replicas
• Keep your actives close
– Use priority to control where failovers happen


Q&A after this session

