Uptime: Building Resilient Services in Go
Blake Caldwell
http://blakecaldwell.net
https://twitter.com/blakecaldwell
Fog Creek Software
Git & Hg Source Code Hosting
2014 Google I/O:
What's Go?
Solution Found!
*Now to find the problem
Kiln's SSH
Reverse Proxy
2015-GopherCon-Talk-Uptime.pdf
Why Rewrite?
Tons of Concurrency
1. Accepts SSH connection (1 goroutine)
2. Authenticates via public/private key
3. Connects to backend server
4. Proxies STDIN, STDOUT, and STDERR (3 goroutines)
How'd It Go?
KilnProxy: From Python to Go
Clone times halved!
Resiliency:
The Process
Careful Coding:
Error Handling & Clean-up
Handle All Errors
resourceA, err := OpenResourceA()
if err != nil {
    return nil, err
}
defer resourceA.Close()
What About nil?
Back To Our Example:
resourceA, err := OpenResourceA()
if err != nil {
    return nil, err
}
defer resourceA.Close()
Be Careful!
// can return nil, and that's not an error!
resourceA, err := OpenResourceA()
if err != nil {
    return nil, err
}
defer resourceA.Close() // panic on nil????
Not Necessarily.
One Solution:
// can return nil, and that's not an error!
resourceA, err := OpenResourceA()
if err != nil {
    return nil, err
}
defer func() {
    if resourceA != nil {
        resourceA.Close()
    }
}()
Gross.
Make deferred Methods nil-Safe
// Don't forget: resource might be nil!
func (resource *Resource) Close() {
    if resource != nil {
        // .. clean up
    }
}
Much Better!
// can return nil, and that's not an error!
resourceA, err := OpenResourceA()
if err != nil {
    return nil, err
}
defer resourceA.Close() // will never panic!
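Put together, the nil-safe pattern might look like the following runnable sketch. The `Resource` fields, the `available` parameter, and the `use` helper are assumptions added for illustration; only `OpenResourceA` and `Close` come from the slides.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for whatever OpenResourceA returns.
type Resource struct {
	closed bool
}

// Close is safe to call on a nil *Resource: it simply does nothing.
func (r *Resource) Close() {
	if r == nil {
		return
	}
	r.closed = true // .. clean up
}

// OpenResourceA may return (nil, nil): no resource, and that is not an error.
// The available flag exists only to drive the example.
func OpenResourceA(available bool) (*Resource, error) {
	if !available {
		return nil, nil
	}
	return &Resource{}, nil
}

// use reports whether a resource was actually opened.
func use(available bool) (bool, error) {
	resourceA, err := OpenResourceA(available)
	if err != nil {
		return false, err
	}
	defer resourceA.Close() // no panic, even when resourceA is nil
	return resourceA != nil, nil
}

func main() {
	fmt.Println(use(false))
	fmt.Println(use(true))
}
```

This works because `resourceA` is a concrete pointer type. If the function returned a nil interface value instead, calling a method on it would still panic.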
Careful Coding:
Channels
Channel Axioms 1
1. A send to a nil channel blocks forever
2. A receive from a nil channel blocks forever
3. A send to a closed channel panics
4. A receive from a closed channel returns the zero value immediately
1 http://dave.cheney.net/2014/03/19/channel-axioms
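The four axioms can be checked without hanging the program. In this small sketch the helper names are made up for the example, and the nil-channel case relies on the fact that a nil channel is never ready inside a `select`.

```go
package main

import "fmt"

// recvClosed shows axiom 4: a receive from a closed channel returns the
// zero value immediately, with ok == false.
func recvClosed() (int, bool) {
	ch := make(chan int)
	close(ch)
	v, ok := <-ch
	return v, ok
}

// sendClosedPanics shows axiom 3: a send to a closed channel panics.
func sendClosedPanics() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1
	return false
}

// nilChannelNeverReady shows axioms 1 and 2: sends and receives on a nil
// channel never proceed, so only the default case can run.
func nilChannelNeverReady() bool {
	var ch chan int
	select {
	case ch <- 1:
		return false
	case <-ch:
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(recvClosed())           // 0 false
	fmt.Println(sendClosedPanics())     // true
	fmt.Println(nilChannelNeverReady()) // true
}
```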
Careful Coding:
Panics!
You Can Recover From Panics
• ... but you shouldn't always do so!
• Only recover if you're sure it's okay
• Panic recovery applies only to the current goroutine
• At the very least, log the stack trace
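A minimal sketch of these points: the deferred `recover` has to live inside the goroutine that might panic, and the stack trace is logged with `runtime/debug.Stack`. The `runRecovered` helper is a hypothetical name, not something from the talk.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// runRecovered runs fn in its own goroutine, recovers a panic raised inside
// that goroutine, logs the stack trace, and returns the recovered value
// (nil if fn did not panic).
func runRecovered(name string, fn func()) (recovered interface{}) {
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last, after the recover below
		defer func() {
			if recovered = recover(); recovered != nil {
				log.Printf("%s: recovered from panic: %v\n%s", name, recovered, debug.Stack())
			}
		}()
		fn()
	}()
	<-done
	return recovered
}

func main() {
	fmt.Println(runRecovered("worker", func() { panic("boom") }))
	fmt.Println(runRecovered("worker", func() {}))
}
```

A `recover` in `main` or in the caller would not help here: an unrecovered panic in any goroutine takes down the whole process.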
Careful Coding:
Avoid Race Conditions!
Go's Race Detector:
• Reports when variable access is not synchronized
• Prints a report with full stack traces for the conflicting read and write goroutines
• Should be used in unit tests, development, and testing environments
Race Detector Output:
==================
WARNING: DATA RACE
Read by goroutine 5:
main.func·001()
race.go:14 +0x169
Previous write by goroutine 1:
main.main()
race.go:15 +0x174
Goroutine 5 (running) created at:
time.goFunc()
src/pkg/time/sleep.go:122 +0x56
timerproc()
src/pkg/runtime/ztime_linux_amd64.c:181 +0x189
==================
Enable Race Detection:
$ go test -race mypkg     # to test the package
$ go run -race mysrc.go   # to run the source file
$ go build -race mycmd    # to build the command
$ go install -race mypkg  # to install the package
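As an illustration of what the detector is looking for, here is a small counter guarded by a mutex. The `Counter` type is an assumption made for this example. Removing the `Lock`/`Unlock` calls and running with `-race` should produce a report like the one above.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is safe for concurrent use because every access to n holds mu.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Value returns the current count.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently increments one shared counter from several goroutines.
func countConcurrently(workers, perWorker int) int {
	var (
		c  Counter
		wg sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(8, 1000)) // 8000
}
```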
Careful Coding:
Implement Timeouts
Network Timeouts:
• network dial timeout
• network connection inactivity timeout
• total connection timeout
TEST ALL THE THINGS!
Know
Your Service
Know Your Service:
How Does It Use Memory?
Profile It!
http://github.com/fogcreek/profiler
What to Watch
• How much memory does the service use when idle?
• How much memory per connection?
• Does the system reclaim memory that's no longer used?
• What's the garbage collector doing? GODEBUG=gctrace=1
• Where is memory allocated? (PPROF)
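Several of these questions can be answered roughly with `runtime.ReadMemStats`, even without the profiler linked above. The sketch below is not that profiler: the `memSnapshot` type is hypothetical, and the 8 MB buffer just simulates a connection's working memory.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot keeps the handful of runtime.MemStats fields that map onto
// the questions above.
type memSnapshot struct {
	HeapAllocKB    uint64 // live heap objects
	HeapIdleKB     uint64 // heap spans not currently in use
	HeapReleasedKB uint64 // memory returned to the operating system
	NumGC          uint32 // completed garbage collection cycles
}

func takeSnapshot() memSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memSnapshot{
		HeapAllocKB:    m.HeapAlloc / 1024,
		HeapIdleKB:     m.HeapIdle / 1024,
		HeapReleasedKB: m.HeapReleased / 1024,
		NumGC:          m.NumGC,
	}
}

// held keeps the simulated per-connection buffer reachable.
var held []byte

func main() {
	fmt.Printf("idle:       %+v\n", takeSnapshot())

	held = make([]byte, 8<<20) // pretend one connection needs 8 MB
	fmt.Printf("connected:  %+v\n", takeSnapshot())

	held = nil   // the connection closes
	runtime.GC() // force a collection so the effect is visible immediately
	fmt.Printf("after drop: %+v\n", takeSnapshot())
}
```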
Know Your Service:
PPROF
PPROF Inspects Your Running Process
• Blocking profile
• Goroutine count and full stacktraces
• Heap profile
• Stacktraces that lead to thread creations
Enabling PPROF:
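import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers
)

func main() {
    // ListenAndServe blocks, so serve the profiling endpoints in a goroutine.
    go func() {
        log.Println(http.ListenAndServe(":6060", nil))
    }()
    // ...
}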
PPROF Main Page
Don't leak
goroutines!
Use PPROF To Tell You:
• How many goroutines when nobody is connected?
• How many goroutines per connection?
• Are all goroutines cleaned up after all connections close?
PPROF: Goroutine Page
PPROF:
From the Command Line
What Are Your Goroutines Doing?
$ go tool pprof ./server http://localhost:6060/debug/pprof/goroutine
(pprof) top5
11 of 11 total ( 100%)
Showing top 5 nodes out of 49 (cum >= 1)
      flat  flat%   sum%        cum   cum%
         9 81.82% 81.82%          9 81.82%  runtime.gopark
         1  9.09% 90.91%          1  9.09%  runtime.notetsleepg
         1  9.09%   100%          1  9.09%  runtime/pprof.writeRuntimeProfile
         0     0%   100%          1  9.09%  bufio.(*Reader).ReadLine
         0     0%   100%          1  9.09%  bufio.(*Reader).ReadSlice
(pprof) web
Who's Allocating Heap Memory?
$ go tool pprof ./server http://localhost:6060/debug/pprof/heap
(pprof) top5
2362.41kB of 2362.41kB total ( 100%)
Dropped 28 nodes (cum <= 11.81kB)
      flat  flat%   sum%        cum   cum%
 1850.27kB 78.32% 78.32%  1850.27kB 78.32%  github.com/wblakecaldwell/profiler.func·002
  512.14kB 21.68%   100%   512.14kB 21.68%  mcommoninit
         0     0%   100%  1850.27kB 78.32%  runtime.goexit
         0     0%   100%   512.14kB 21.68%  runtime.rt0_go
         0     0%   100%   512.14kB 21.68%  runtime.schedinit
(pprof) web
Know Your Service:
Watch It Run
/info Endpoint
{
  Version: "1.0.275-b244a2b9b8-20150202.163449",
  StartTimeEpochSecs: 1430515329,
  CurrentTimeEpochSecs: 1431117131,
  Uptime: "167h10m2s"
}
Managing Service Version
Version: "1.0.275-b244a2b9b8-20150202.163449"
Which is:
<major>.<minor>.<commit#>-<Git SHA>-<date>.<time>
Managing Service Version
Version is stored in a global variable, set by your build script
In code:
var ServiceVersion string
Build script:
$ go build -ldflags \
    "-X main.ServiceVersion=1.0.275-b244a2b9b8-20150202.163449" \
    kilnproxy
Keep Good Logs!
• Create a semi-unique string per request
• Use this request string as a prefix in all log entries
• Always log at least the start and end of a request
Who's Currently
Connected?
/connections Endpoint
{
  "CurrentUserCount":1,
  "CurrentlyAuthenticatedUsers":
  [
    {"Account":"aviato",
     "Name":"Erlich Bachman",
     "PublicKeyName":"Build Server",
     "SessionKey":"106abc0c",
     "SessionDuration":"25m4s"
    }
  ]
}
Drain
and Die
Game
Day.
KilnProxy Using 40MB
more memory than normal!!!
Profiler Tells Me:
This memory is still in use
/connections Tells Me:
Initech is connected 10 times
{
  "CurrentUserCount":25,
  "CurrentlyAuthenticatedUsers":
  [
    {"Account":"initech", "Name":"Peter Gibbons", ...
    {"Account":"initech", "Name":"Peter Gibbons", ...
    {"Account":"initech", "Name":"Peter Gibbons", ...
    {"Account":"initech", "Name":"Peter Gibbons", ...
    {"Account":"initech", "Name":"Peter Gibbons", ...
    ...
  ]
}
Dev Profiling Told Me:
Each connection uses 4MB of memory
Dev PPROF Told Me:
Most of that 4MB is SSH internals
Wolfram Alpha Tells Me:
4MB x 10 = 40MB
We Contact Initech.
Timeouts Make Sure That:
Their connections will be closed
Prod Profiling Told Me:
This memory will be reclaimed
So....
Uptime:
Preserved.
Thanks!
Blake Caldwell
http://blakecaldwell.net
https://twitter.com/blakecaldwell
Memory Profiler: http://github.com/fogcreek/profiler
Credit: Images
• Thinking Man: خالد منتصف - License: CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)
• Race car: Newlin Keen, Kollins Communications, Inc
