Planning LAMP infrastructure

Designing, Scoping, and Conﬁguring
Scalable LAMP
Infrastructure

Presented 2010-05-19 by David Strauss

Wed 2010-06-09

About me
‣ Founded Four Kitchens in 2006 while at UT Austin

Wed 2010-06-09

About me
‣ In 2008, launched Pressﬂow,
which now powers the largest Drupal sites

Wed 2010-06-09

About me
‣ Worked with some of the largest sites in the world:
Lifetime Digital, Mansueto Ventures, Wikipedia, The
Internet Archive, and The Economist

Wed 2010-06-09

About me
‣ Engineered the LAMP stack, deployment tools, and
management tools for Yale University, multiple NBC-
Universal properties, and Drupal.org

Wed 2010-06-09

About me
‣ Engineered development workﬂows for Examiner.com

Wed 2010-06-09

About me
‣ Engineered development workﬂows for Examiner.com
‣ Contributor to Drupal, Bazaar, Ubuntu, BCFG2,
Varnish, and other open-source projects

Wed 2010-06-09

Some assumptions

David Strauss

Wed 2010-06-09

Some assumptions
‣ You have more than one web server

David Strauss

Wed 2010-06-09

Some assumptions
‣ You have root access

David Strauss

Wed 2010-06-09

Some assumptions
‣ You deploy to Linux
(though PHP on Windows is more sane than ever)

David Strauss

Wed 2010-06-09

Some assumptions
‣ Database and web servers occupy separate boxes

David Strauss

Wed 2010-06-09

Some assumptions
‣ Database and web servers occupy separate boxes
‣ Your application behaves more or less
like Drupal, WordPress, or MediaWiki

David Strauss

Wed 2010-06-09

Understanding
Load Distribution

David Strauss

Wed 2010-06-09

Predicting peak traffic
Traffic over the day can be highly irregular. To plan
for peak loads, design as if all traffic were as heavy
as the peak hour of load in a typical month — and
then plan for some growth.

David Strauss

Wed 2010-06-09

Analyzing hit distribution

David Strauss

Wed 2010-06-09


100%

David Strauss

Wed 2010-06-09


nt
n te
Co
tic
Sta

100%

David Strauss

Wed 2010-06-09

30%
nt
n te
Co
tic
Sta

100%

David Strauss

Wed 2010-06-09

30%
nt
n te
Co
tic
Sta

100%
Dy Pag
na es
m
ic

David Strauss

Wed 2010-06-09

30%
nt
n te
Co
tic
Sta

100%
Dy Pag
na es
m
ic

70%

David Strauss

Wed 2010-06-09

30%
nt
n te
Co
tic
Sta

100%
Dy Pag
na es
m
ic

70%
Auth
enticat
ed

David Strauss

Wed 2010-06-09

30%
nt
n te
Co
tic
Sta

100%
Dy Pag
na es
m
ic

70%
Auth
enticat
ed
20%
David Strauss

Wed 2010-06-09

30%
nt
n te
Co
tic
Sta

100%
s
ou
m
ny
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

30%
nt
n te
Co 50%
tic
Sta

100%
s
ou
m
ny
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

30% an
H um
nt
n te
Co 50%
tic
Sta

100%
s
ou
m
ny
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50%
tic
Sta

100%
s
ou
m
ny
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50%
tic
Sta

W wl
C

eb er
ra
100%
s
ou
m
ny
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50%
tic
Sta

W wl
C

eb er
ra
100%
s
ou
m
ny

10%
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50%
tic
Sta e nt
m
at

W wl
Tre

C

eb er
ra
l
100% cia

s
ou o Sp
e
m
N
ny

10%
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50% 3%
tic
Sta e nt
m
at

W wl
Tre

C

eb er
ra
l
100% cia

s
ou o Sp
e
m
N
ny

10%
Dy Pag

no
na es

A
m
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50% 3%
tic
Sta e nt
m
at

W wl
Tre

C

eb er
ra
l
100% cia

s
ou o Sp
e
m
N
ny

10%
Dy Pag

no

“Pay
na es

W
A

Byp all”
m

ass
ic

70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

40%
30% an
H um
nt
n te
Co 50% 3%
tic
Sta e nt
m
at

W wl
Tre

C

eb er
ra
l
100% cia

s
ou o Sp
e
m
N
ny

10%
Dy Pag

no

“Pay
na es

W
A

Byp all”
m

ass
ic

7%
70%
Auth
en ticat
ed
20%
David Strauss

Wed 2010-06-09

Throughput vs. Delivery Methods
Yellow
Green Red
(Dynamic,
(Static) (Dynamic)
Cacheable)
2
Content Delivery
Network
●●●●●●●●●● ✖ ✖
Reverse Proxy
Cache
●●●●●●●● ●●●●●●● ✖
5000 req/s

1
PHP + APC +
●●●● ●●● ●●●
memcached

1
PHP + APC ●●●● ●● ●●

1
PHP (No APC) ●●●● ● ●
10 req/s

1 Delivered by Apache without PHP
More dots = More throughput 2 Some actually can do this.
David Strauss

Wed 2010-06-09

Objective

Deliver hits using the
fastest, most scalable
method available

David Strauss

Wed 2010-06-09

Layering: Less Traﬃc at Each Step

David Strauss

Wed 2010-06-09


Traﬃc

David Strauss

Wed 2010-06-09


Traﬃc

CDN

David Strauss

Wed 2010-06-09


Your Datacenter

Traﬃc

CDN

David Strauss

Wed 2010-06-09


Your Datacenter

Traﬃc

DNS Round Robin

CDN

David Strauss

Wed 2010-06-09


Your Datacenter

Load
Traﬃc Balancer

DNS Round Robin

CDN

David Strauss

Wed 2010-06-09


Your Datacenter

Load Reverse
Traﬃc Proxy
Balancer Cache

DNS Round Robin

CDN

David Strauss

Wed 2010-06-09


Your Datacenter

Load Reverse
Traﬃc Proxy Application
Balancer Cache Server

DNS Round Robin

CDN

David Strauss

Wed 2010-06-09


Your Datacenter

Load Reverse
Traﬃc Proxy Application
Balancer Cache Server

DNS Round Robin

CDN Database

David Strauss

Wed 2010-06-09

Oﬄoad from the master database
Your master database is the
single greatest limitation on
scalability.

David Strauss

Wed 2010-06-09

scalability.

Application
Server

Master
Database

David Strauss

Wed 2010-06-09

scalability.

Application
Server

Master
Memory
Cache
Database

David Strauss

Wed 2010-06-09

scalability.

Application Slave
Server Database

Master
Memory
Cache
Database

David Strauss

Wed 2010-06-09

Search Your master database is the
scalability.

Application Slave
Server Database

Master
Memory
Cache
Database

David Strauss

Wed 2010-06-09

Tools to use

David Strauss

Wed 2010-06-09

Tools to use
‣ Apache Solr or Sphinx for search
‣ Solr can be fronted with Varnish or another
proxy cache if queries are repetitive.

David Strauss

Wed 2010-06-09

Tools to use
‣ Varnish, nginx, Squid, or Traﬃc Server
for reverse proxy caching

David Strauss

Wed 2010-06-09

Tools to use
‣ Varnish, nginx, Squid, or Traﬃc Server
for reverse proxy caching
‣ Any third-party service for CDN

David Strauss

Wed 2010-06-09

Do the math
‣ All non-CDN traﬃc travels through your load
balancers and reverse proxy caches. Even traﬃc
passed through to application servers must run
through the initial layers.

David Strauss

Wed 2010-06-09

Do the math

Internal
Traﬃc

David Strauss

Wed 2010-06-09

Do the math

Internal Load
Traﬃc Balancer

David Strauss

Wed 2010-06-09

Do the math

Internal Load Reverse
Proxy
Traﬃc Balancer Cache

David Strauss

Wed 2010-06-09

Do the math

Application
Proxy
Traﬃc Balancer Cache Server

David Strauss

Wed 2010-06-09

Do the math

Application
Proxy
Traﬃc Balancer Cache Server

What hit rate is each layer getting?
How many servers share the load?

David Strauss

Wed 2010-06-09

Get a management/monitoring box

David Strauss

Wed 2010-06-09


Management

David Strauss

Wed 2010-06-09


Application
Management
Server

David Strauss

Wed 2010-06-09


Application
Management
Server

Reverse
Proxy
Cache

David Strauss

Wed 2010-06-09


Application
Database Management
Server

Reverse
Proxy
Cache

David Strauss

Wed 2010-06-09

Load
Balancer

Application
Database Management
Server

Reverse
Proxy
Cache

David Strauss

Wed 2010-06-09

Load (maybe even two
Balancer and have them
specialize or be
redundant)

Application
Database Management
Server

Reverse
Proxy
Cache

David Strauss

Wed 2010-06-09

Planning + Scoping

David Strauss

Wed 2010-06-09

Infrastructure goals

David Strauss

Wed 2010-06-09

‣ Redundancy: tolerate failure

David Strauss

Wed 2010-06-09

‣ Scalability: engage more users

David Strauss

Wed 2010-06-09

‣ Performance: ensure each user’s experience is fast

David Strauss

Wed 2010-06-09

‣ Performance: ensure each user’s experience is fast
‣ Manageability: stay sane in the process

David Strauss

Wed 2010-06-09

Redundancy

David Strauss

Wed 2010-06-09

Redundancy
‣ When one server fails, the website should
be able to recover without taking too long.

David Strauss

Wed 2010-06-09

Redundancy
‣ This requires at least N+1, putting a ﬂoor
on system requirements even for small sites.

David Strauss

Wed 2010-06-09

Redundancy
‣ How long can your site be down?

David Strauss

Wed 2010-06-09

Redundancy
‣ Automatic versus manual failover

David Strauss

Wed 2010-06-09

Redundancy
‣ Automatic versus manual failover
‣ Warning: over-automation can reduce uptime

David Strauss

Wed 2010-06-09

Performance

David Strauss

Wed 2010-06-09

Performance
‣ Find the “sweet spot” for hardware. This is the
best price/performance point.

David Strauss

Wed 2010-06-09

Performance
‣ Avoid overspending on any type of component

David Strauss

Wed 2010-06-09

Performance
‣ Yet, avoid creating bottlenecks

David Strauss

Wed 2010-06-09

Performance
‣ Swapping memory to disk is very dangerous

David Strauss

Wed 2010-06-09

Performance
‣ Swapping memory to disk is very dangerous
‣ Don’t skimp on RAM

David Strauss

Wed 2010-06-09

Relative importance
Processors/Cores Memory Disk Speed

Reverse Proxy
Cache ●● ●●● ●●

Web Server ●●●●● ●● ●

Database Server ●●● ●●●● ●●●●

Monitoring ● ● ●

David Strauss

Wed 2010-06-09

All of your servers

David Strauss

Wed 2010-06-09

All of your servers
‣ 64-bit: no excuse to use anything less in 2010

David Strauss

Wed 2010-06-09

All of your servers
‣ RHEL/CentOS and Ubuntu have the broadest
adoption for large-scale LAMP

David Strauss

Wed 2010-06-09

All of your servers
‣ But pick one, and stick with it for development,
staging, and production

David Strauss

Wed 2010-06-09

All of your servers
‣ But pick one, and stick with it for development,
staging, and production
‣ Some disk redundancy: rebuilding a server
is time-consuming unless you’re very automated

David Strauss

Wed 2010-06-09

Reverse proxy caches

David Strauss

Wed 2010-06-09

‣ Varnish and nginx have modern architecture and
broad adoption
‣ Sites often front Varnish with nginx
for gzip and/or SSL

David Strauss

Wed 2010-06-09

broad adoption
for gzip and/or SSL
‣ Squid and Traﬃc Server are clunky
but reliable alternatives

David Strauss

Wed 2010-06-09

broad adoption
for gzip and/or SSL
CPU

Save Your
Money

David Strauss

Wed 2010-06-09

broad adoption
for gzip and/or SSL
CPU Memory

Save Your
Money + 1 GB base system
+ 3 GB for caching

David Strauss

Wed 2010-06-09

broad adoption
for gzip and/or SSL
CPU Memory Disk

Save Your
+ 3 GB for caching + Slow
+ Small
+ Redundant

David Strauss

Wed 2010-06-09

broad adoption
for gzip and/or SSL
CPU Memory Disk

Save Your
+ 3 GB for caching + Slow
+ Small
+ Redundant

David Strauss
= 5000 req/s
Wed 2010-06-09

Web servers

David Strauss

Wed 2010-06-09

Web servers
‣ Apache 2.2 + mod_php + memcached

David Strauss

Wed 2010-06-09

Web servers
‣ FastCGI is a bad idea
‣ Memory improvements are redundant w/ Varnish
‣ Higher latency + less eﬃcient with APC opcode

David Strauss

Wed 2010-06-09

Web servers
‣
Memory improvements are redundant w/ Varnish
‣ Check the memory your app takes per process

David Strauss

Wed 2010-06-09

Web servers
‣
‣ Tune MaxClients to around 25 × cores

David Strauss

Wed 2010-06-09

Web servers
‣
CPU
Max out
cores
(but prefer fast
cores to density)

David Strauss

Wed 2010-06-09

Web servers
‣
CPU Memory
Max out
cores
(but prefer fast
cores to density)
+ 1 GB base system
+ 1 GB memcached
+ 25 × cores × per-
process app memory

David Strauss

Wed 2010-06-09

Web servers
‣
CPU Memory Disk
Max out
cores
(but prefer fast
cores to density)
+ 1 GB base system
+ 1 GB memcached
process app memory
+ Slow
+ Small
+ Redundant

David Strauss

Wed 2010-06-09

Web servers
‣
CPU Memory Disk
Max out
cores
(but prefer fast
cores to density)
+ 1 GB base system
+ 1 GB memcached
process app memory
+ Slow
+ Small
+ Redundant

David Strauss = 100 req/s
Wed 2010-06-09

Database servers

David Strauss

Wed 2010-06-09

Database servers
‣ Insist on MySQL 5.1+ and InnoDB

David Strauss

Wed 2010-06-09

Database servers
‣ Consider Percona builds and (eventually) MariaDB

David Strauss

Wed 2010-06-09

Database servers
‣ Every Apache process generally needs at least one
connection available, and leave some headroom

David Strauss

Wed 2010-06-09

Database servers
‣ Tune the InnoDB buﬀer pool to at least half of RAM

David Strauss

Wed 2010-06-09

Database servers
CPU
No more
than 8-12
cores

David Strauss

Wed 2010-06-09

Database servers
CPU Memory
No more
than 8-12
cores
+ As much as you can
aﬀord (even RAM not
used by MySQL caches
disk content)

David Strauss

Wed 2010-06-09

Database servers
CPU Memory Disk
No more
than 8-12
cores
disk content)
+ Fast
+ Large
+ Redundant

David Strauss

Wed 2010-06-09

Database servers
CPU Memory Disk
No more
than 8-12
cores
disk content)
+ Fast
+ Large
+ Redundant

David Strauss = 3000 queries/s
Wed 2010-06-09

Management server

David Strauss

Wed 2010-06-09

Management server
‣ Nagios: service outage monitoring

David Strauss

Wed 2010-06-09

Management server
‣ Cacti: trend monitoring

David Strauss

Wed 2010-06-09

Management server
‣ Hudson: builds, deployment, and automation

David Strauss

Wed 2010-06-09

Management server
‣ Yum/Apt repo: cluster package distribution

David Strauss

Wed 2010-06-09

Management server
‣ Puppet/BCFG2/Chef: conﬁguration management

David Strauss

Wed 2010-06-09

Management server
CPU

Save Your
Money

David Strauss

Wed 2010-06-09

Management server
CPU Memory

Save Your
Money + Save Your Money

David Strauss

Wed 2010-06-09

Management server
CPU Memory Disk

Save Your
+ Slow
+ Large
+ Redundant

David Strauss

Wed 2010-06-09

Management server
CPU Memory Disk

Save Your
+ Slow
+ Large
+ Redundant

= good enough
David Strauss

Wed 2010-06-09

Assembling the numbers

David Strauss

Wed 2010-06-09

‣ Start with an architecture providing redundancy.
‣ Two servers, each running the whole stack

David Strauss

Wed 2010-06-09

‣ Increase the number of proxy caches based on
anonymous and search engine traﬃc.

David Strauss

Wed 2010-06-09

‣ Increase the number of web servers based on
authenticated traﬃc.

David Strauss

Wed 2010-06-09

‣ Increase the number of web servers based on
authenticated traﬃc.
‣ Databases are harder to predict, but large sites
should run them on at least two separate boxes
with replication.
David Strauss

Wed 2010-06-09

Extreme measures
for performance
and scalability

David Strauss

Wed 2010-06-09

When caching and search
oﬄoading isn’t enough

David Strauss

Wed 2010-06-09

‣ Some sites have intense custom page needs
‣ High proportion of authenticated users
‣ Lots of targeted content for anonymous users

David Strauss

Wed 2010-06-09

‣ Too much data to process real-time on an RDBMS

David Strauss

Wed 2010-06-09

‣ Too much data to process real-time on an RDBMS
‣ Data is so volatile that maintaing standard caches
outweighs the overhead of regeneration

David Strauss

Wed 2010-06-09

Non-relational/NoSQL tools

David Strauss

Wed 2010-06-09

‣ Most web applications can run well
on less-than-ACID persistence engines

David Strauss

Wed 2010-06-09

‣ In some cases, like MongoDB, easier to use than
SQL in addition to being higher performance

David Strauss

Wed 2010-06-09

‣ Interested? You’ve already missed the tutorial.

David Strauss

Wed 2010-06-09

‣ In other cases, like Cassandra, considerably harder
to use than SQL but massively scalable

David Strauss

Wed 2010-06-09

‣ Current Erlang-based systems are neat but slow

David Strauss

Wed 2010-06-09

‣ Current Erlang-based systems are neat but slow
‣ Many require a special PHP extension,
at least for ideal performance
David Strauss

Wed 2010-06-09

Oﬄine processing

David Strauss

Wed 2010-06-09

Oﬄine processing
‣ Gearman
‣ Primarily asynchronous job manager

David Strauss

Wed 2010-06-09

Oﬄine processing
‣ Gearman
‣ Hadoop
‣ MapReduce framework

David Strauss

Wed 2010-06-09

Oﬄine processing
‣ Gearman
‣ Hadoop
‣ MapReduce framework
‣ Traditional message queues
‣ ActiveMQ + Stomp is easy from PHP
‣ Allows you to build your own job manager

David Strauss

Wed 2010-06-09

Edge-side includes

Wed 2010-06-09

Edge-side includes

ESI Processor
(Varnish, Akamai, other)

Wed 2010-06-09

Edge-side includes
<html>
<body>
<esi:include href=“http://drupal.org/block/views/3” />
</body>
</html>

ESI Processor
(Varnish, Akamai, other)

Wed 2010-06-09

Edge-side includes
<html>
<body>
</body>
</html>

<div>
ESI Processor My block HTML.
(Varnish, Akamai, other) </div>

Wed 2010-06-09

Edge-side includes
<html>
<body>
</body>
</html>

<div>

<html>
<body>
<div>
My block HTML.
</div>
</body>
</html>

Wed 2010-06-09

Edge-side includes
<html>
<body>
‣ Blocks of HTML are
</body>
integrated into the
</html>
page at the edge
layer.
<div>

<html>
<body>
<div>
My block HTML.
</div>
</body>
</html>

Wed 2010-06-09

Edge-side includes
<html>
<body>
</body>
integrated into the
</html>
page at the edge
layer.
<div>
ESI Processor My block HTML. ‣ Non-primary page
content often
occupies >50% of
<html> PHP execution time.
<body>
<div>
My block HTML.
</div>
</body>
</html>

Wed 2010-06-09

Edge-side includes
<html>
<body>
</body>
integrated into the
</html>
page at the edge
layer.
<div>
ESI Processor My block HTML. ‣ Non-primary page
content often
occupies >50% of
<html> PHP execution time.
<body>
<div>
My block HTML. ‣ Decouples block
</div>
</body> and page cache
</html>
lifetimes

Wed 2010-06-09

HipHop PHP

David Strauss

Wed 2010-06-09

HipHop PHP
‣ Compiles PHP to a C++-based binary
‣ Integrated HTTP server

David Strauss

Wed 2010-06-09

HipHop PHP
‣ Supports a subset of PHP and extensions

David Strauss

Wed 2010-06-09

HipHop PHP
‣ Requires an organizational commitment to
building, testing, and deploying on HipHop

David Strauss

Wed 2010-06-09

HipHop PHP
‣ Requires an organizational commitment to
building, testing, and deploying on HipHop
‣ Scott MacVicar has a presentation on HipHop later
today at 16:00.

David Strauss

Wed 2010-06-09

Cluster Problems

Credits

Wed 2010-06-09

Server failure

David Strauss

Wed 2010-06-09

Server failure
‣ Load balancers can remove broken or overloaded
application reverse proxy caches.

David Strauss

Wed 2010-06-09

Server failure
‣ Reverse proxy caches like Varnish can
automatically use only functional application
servers.

David Strauss

Wed 2010-06-09

Server failure
servers.
‣ Memcached clients automatically handle failure.

David Strauss

Wed 2010-06-09

Server failure
servers.
‣ Virtual service IP management tools like
heartbeat2 can manage which MySQL servers
receive connections to automate failover.

David Strauss

Wed 2010-06-09

Server failure
servers.
‣ Virtual service IP management tools like
heartbeat2 can manage which MySQL servers
receive connections to automate failover.
‣ Conclusion: Each layer intelligently monitors and
uses the servers beneath it.
David Strauss

Wed 2010-06-09

Cluster coherency

David Strauss

Wed 2010-06-09

Cluster coherency
‣ Systems that run properly on single boxes may
lose coherency when run on a networked cluster.

David Strauss

Wed 2010-06-09

Cluster coherency
‣ Some caches, like APC’s object cache, have no
ability to handle network-level coherency. (APC’s
opcode cache is safe to use on clusters, though.)

David Strauss

Wed 2010-06-09

Cluster coherency
‣ memcached, if misconfigured, can hash values
inconsistently across the cluster, resulting in
different servers using different memcached
instances for the same keys.

David Strauss

Wed 2010-06-09

Cluster coherency
‣ memcached, if misconfigured, can hash values
inconsistently across the cluster, resulting in
different servers using different memcached
instances for the same keys.
‣ Session coherency issues can be helped with load
balancer affinity or storage in memcached
David Strauss

Wed 2010-06-09

Cache regeneration races

David Strauss

Wed 2010-06-09

‣ Downside to network cache coherency:
synched expiration

David Strauss

Wed 2010-06-09

synched expiration
‣ Requires a locking framework (like ZooKeeper)

David Strauss

Wed 2010-06-09

synched expiration

Old Cached Item

David Strauss

Wed 2010-06-09

synched expiration

Old Cached Item

Time

David Strauss

Wed 2010-06-09

synched expiration

Old Cached Item

Expiration
Time

David Strauss

Wed 2010-06-09

synched expiration
All servers regenerating the item.

Old Cached Item

Expiration
{
Time

David Strauss

Wed 2010-06-09

synched expiration
All servers regenerating the item.

Old Cached Item

Expiration
{ New Cached Item

Time

David Strauss

Wed 2010-06-09

Broken replication

David Strauss

Wed 2010-06-09

Broken replication
‣ MySQL slave servers get out of synch, fall further
behind

David Strauss

Wed 2010-06-09

Broken replication
behind
‣ No (sane) method of automated recovery

David Strauss

Wed 2010-06-09

Broken replication
behind
‣ Only solvable with good monitoring and recovery
procedures

David Strauss

Wed 2010-06-09

Broken replication
behind
‣ Only solvable with good monitoring and recovery
procedures
‣ Can automate DB slave blacklisting from use,
but requires cluster management tools

David Strauss

Wed 2010-06-09

All content in this presentation, except where noted otherwise, is Creative Commons Attribution-
ShareAlike 3.0 licensed and copyright 2009 Four Kitchen Studios, LLC.

Wed 2010-06-09

DrupalCamp Stockholm
Presentation Ended Here

David Strauss

Wed 2010-06-09

Managing the Cluster

Credits

Wed 2010-06-09

The problem
Software and
Conﬁguration

Applicati Applicati Applicati Applicati Applicati
on Server on Server on Server on Server on Server

Objectives:
Fast, atomic deployment and rollback
Minimize single points of failure and contention
Restart services
Integrate with version control systems
Credits

Wed 2010-06-09

Manual updates and deployment

Human Human Human Human Human


Why not: slow deployment,
non-atomic/diﬃcult rollbacks

Credits

Wed 2010-06-09

Shared storage

NFS

Why not: single point of contention and failure

Credits

Wed 2010-06-09

rsync
Synchronized
with rsync


Why not: non-atomic, does not manage services

Credits

Wed 2010-06-09

Capistrano
Deployed with
Capistrano


Capistrano provides near-atomic deployment,
service restarts, automated rollback, test automation,
and version control integration (tagged releases).

Credits

Wed 2010-06-09

Multistage deployment
Deployments
Deployed with Deployed with
Capistrano can be staged. Capistrano
cap staging deploy
cap production deploy

Development
Integration Deployed with Staging
Capistrano


Credits

Wed 2010-06-09

But your application isn’t the only
thing to manage.

Credits

Wed 2010-06-09

Beneath the application
Reverse
Cluster-level
Proxy configuration
Database
Cache


Cluster management applies to package
management, updates, and software
configuration.
cfengine and bcfg2 are popular
cluster-level system configuration tools.
Credits

Wed 2010-06-09

System configuration management
‣ Deploys and updates packages, cluster-wide or
selectively.
‣ Manages arbitrary text configuration files
‣ Analyzes inconsistent configurations (and
converges them)
‣ Manages device classes (app. servers, database
servers, etc.)
‣ Allows confident configuration testing on a
staging server.

Credits

Wed 2010-06-09

All on the management box

{
Developme
nt
Integration

Staging

Manageme
nt
Deploymen
t Tools

Monitoring

Credits

Wed 2010-06-09

Monitoring

Credits

Wed 2010-06-09

Types of monitoring
Failure Capacity/Load

Analyzing Analyzing Trends
Downtime
Predicting Load
Viewing Failover
Checking Results
Troubleshooting of Conﬁguration
and Software
Notiﬁcation Changes

David Strauss

Wed 2010-06-09

Everyone needs both.

Credits

Wed 2010-06-09

What to use

Failure/Uptime Capacity/Load

Nagios Cacti

Hyperic Munin

David Strauss

Wed 2010-06-09

Nagios
‣ Highly recommended.
‣ Used by Four Kitchens and Tag1 Consulting for
client work, Drupal.org, Wikipedia, etc.
‣ Easy to install on CentOS 5 using EPEL packages.
‣ Easy to install nrpe agents to monitor diverse
services.
‣ Can notify administrators on failure.
‣ We use this on Drupal.org

David Strauss

Wed 2010-06-09

Cacti
‣ Highly annoying to set up.
‣ One instance generally collects all statistics.
(No “agents” on the systems being monitored.)
‣ Provides ﬂexible graphs that can be customized
on demand.

Credits

Wed 2010-06-09

Munin
‣ Fairly easy to set up.
‣ One instance generally collects all statistics.
(No “agents” on the systems being monitored.)
‣ Provides static graphs that cannot be
customized.

Credits

Wed 2010-06-09

Pressﬂow
Make Drupal sites scale by upgrading core
with a compatible, powerful replacement.

David Strauss

Wed 2010-06-09

Common large-site issues
‣ Drupal core requires patching to effectively
support the advanced scalability techniques
discussed here.
‣ Patches often conflict and have to be reapplied
with each Drupal upgrade.
‣ The original patches are often unmaintained.
‣ Sites stagnate, running old, insecure versions of
Drupal core because updating is too difficult.

David Strauss

Wed 2010-06-09

What is Pressflow?
‣ Pressflow is a derivative of Drupal core that
integrates the most popular performance and
scalability enhancements.
‣ Pressflow is completely compatible with existing
Drupal 5 and 6 modules, both standard and
custom.
‣ Pressflow installs as a drop-in replacement for
standard Drupal.
‣ Pressflow is free as long as the matching version of
Drupal is also supported by the community.
David Strauss

Wed 2010-06-09

What are the enhancements?
‣ Reverse proxy support
‣ Database replication support
‣ Lower database and session management load
‣ More eﬃcient queries
‣ Testing and optimization by Four Kitchens
with standard high-performance software
and hardware conﬁguration
‣ Industry-leading scalability support
by Four Kitchens and Tag1 Consulting
David Strauss

Wed 2010-06-09

Four Kitchens + Tag1
‣ Provide the development, support, scalability, and
performance services behind Pressﬂow
‣ Comprise most members of the Drupal.org
infrastructure team
‣ Have the most experience scaling Drupal sites
of all sizes and all types

David Strauss

Wed 2010-06-09

Ready to scale?
‣ Learn more about Pressﬂow:
‣ Pick up pamphlets in the lobby
‣ Request Pressﬂow releases at fourkitchens.com
‣ Get the help you need to make it happen:
‣ Talk to me (David) or Todd here at DrupalCamp
‣ Email shout@fourkitchens.com

David Strauss

Wed 2010-06-09

Planning LAMP infrastructure

More Related Content

More from David Timothy Strauss (10)

Recently uploaded (20)

Planning LAMP infrastructure