Go Internals
Eyal Post - Go Israel Meetup - May 2017
...you shouldn’t really care about
About Me
Eyal Post
Architect @Gett
(… Yes, I get quite a few jokes about HTTP POST vs GET... )
I like to peek under the hood
Embedding
https://www.slideshare.net/EyalPost/go-and-object-oriented-programming
Embedding under the hood
type Base struct {
	someVar int
}
func (b Base) Print() {
	fmt.Println("Called from Base: ", b.someVar)
}
type Sub struct {
	AnotherVar int
	Base
}
Embedding under the hood
func Test() {
	b := Base{}
	b.Print()
	s := Sub{}
	s.Print()
}
Embedding under the hood
func Test() {
	b := CreateBase()
	b.Print()
	s := CreateSub()
	s.Print()
}
func CreateBase() Base {
	return Base{}
}
func CreateSub() Sub {
	return Sub{}
}
TEXT github.com/eyalpost/GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go
embed.go:25 0x1087200 65488b0c25a0080000 GS MOVQ GS:0x8a0, CX
embed.go:25 0x1087209 483b6110 CMPQ 0x10(CX), SP
embed.go:25 0x108720d 7650 JBE 0x108725f
embed.go:25 0x108720f 4883ec30 SUBQ $0x30, SP
embed.go:25 0x1087213 48896c2428 MOVQ BP, 0x28(SP)
embed.go:25 0x1087218 488d6c2428 LEAQ 0x28(SP), BP
embed.go:26 0x108721d e88effffff CALL GoMeetup201705/embed.CreateBase(SB)
embed.go:26 0x1087222 488b0424 MOVQ 0(SP), AX
embed.go:26 0x1087226 4889442410 MOVQ AX, 0x10(SP)
embed.go:27 0x108722b 48890424 MOVQ AX, 0(SP)
embed.go:27 0x108722f e81cfeffff CALL GoMeetup201705/embed.Base.Print(SB)
embed.go:29 0x1087234 e897ffffff CALL GoMeetup201705/embed.CreateSub(SB)
embed.go:29 0x1087239 488b0424 MOVQ 0(SP), AX
embed.go:29 0x108723d 488b4c2408 MOVQ 0x8(SP), CX
embed.go:29 0x1087242 4889442418 MOVQ AX, 0x18(SP)
embed.go:29 0x1087247 48894c2420 MOVQ CX, 0x20(SP)
embed.go:30 0x108724c 48890c24 MOVQ CX, 0(SP)
embed.go:30 0x1087250 e8fbfdffff CALL GoMeetup201705/embed.Base.Print(SB)
embed.go:31 0x1087255 488b6c2428 MOVQ 0x28(SP), BP
embed.go:31 0x108725a 4883c430 ADDQ $0x30, SP
embed.go:31 0x108725e c3 RET
embed.go:25 0x108725f e8bc2cfcff CALL runtime.morestack_noctxt(SB)
embed.go:25 0x1087264 eb9a JMP GoMeetup201705/embed.Test(SB)
Both calls go to Base.Print ?? Is there no Sub.Print?
> go tool objdump -s embed GoMeetup201705 | grep TEXT
TEXT GoMeetup201705/embed.Base.Print(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.CreateBase(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.CreateSub(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.init(SB) GoMeetup201705/embed/embed.go
Embedding under the hood
func Test() {
	b := CreateBase()
	b.Print()
	s := CreateSub()
	s.Print()
	fmt.Println(reflect.TypeOf(s).Method(0).Name)
}
What will this show?
> go tool objdump -s embed GoMeetup201705 | grep TEXT
TEXT GoMeetup201705/embed.Base.Print(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.CreateBase(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.CreateSub(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.init(SB) GoMeetup201705/embed/embed.go
TEXT GoMeetup201705/embed.(*Base).Print(SB) <autogenerated>
TEXT GoMeetup201705/embed.(*Sub).Print(SB) <autogenerated>
TEXT GoMeetup201705/embed.Sub.Print(SB) <autogenerated>
Once reflection is used, the Go compiler generates the promoted methods
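To make the promotion visible from Go code rather than from the disassembly, here is a minimal sketch that lists the method sets of Sub and *Sub through reflection. The helper name methodNames is an assumption made for this illustration; Base and Sub are the types from the slides.

```go
package main

import (
	"fmt"
	"reflect"
)

type Base struct {
	someVar int
}

func (b Base) Print() {
	fmt.Println("Called from Base: ", b.someVar)
}

type Sub struct {
	AnotherVar int
	Base
}

// methodNames (hypothetical helper) lists the exported methods in the
// method set of v's dynamic type, as seen by the reflect package.
func methodNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumMethod())
	for i := 0; i < t.NumMethod(); i++ {
		names = append(names, t.Method(i).Name)
	}
	return names
}

func main() {
	fmt.Println(methodNames(Sub{}))  // Print is promoted from the embedded Base
	fmt.Println(methodNames(&Sub{})) // *Sub's method set contains it as well
}
```

Both lines print [Print], which matches the Sub.Print and (*Sub).Print wrappers that show up in the symbol list once reflection is involved.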
TEXT GoMeetup201705/embed.Sub.Print(SB) <autogenerated>
<autogenerated>:3 0x109e320 65488b0c25a0080 GS MOVQ GS:0x8a0, CX
<autogenerated>:3 0x109e329 483b6110 CMPQ 0x10(CX), SP
<autogenerated>:3 0x109e32d 763c JBE 0x109e36b
<autogenerated>:3 0x109e32f 4883ec10 SUBQ $0x10, SP
<autogenerated>:3 0x109e333 48896c2408 MOVQ BP, 0x8(SP)
<autogenerated>:3 0x109e338 488d6c2408 LEAQ 0x8(SP), BP
<autogenerated>:3 0x109e33d 488b5920 MOVQ 0x20(CX), BX
<autogenerated>:3 0x109e341 4885db TESTQ BX, BX
<autogenerated>:3 0x109e344 740d JE 0x109e353
<autogenerated>:3 0x109e346 488d7c2418 LEAQ 0x18(SP), DI
<autogenerated>:3 0x109e34b 48393b CMPQ DI, 0(BX)
<autogenerated>:3 0x109e34e 7503 JNE 0x109e353
<autogenerated>:3 0x109e350 488923 MOVQ SP, 0(BX)
<autogenerated>:3 0x109e353 488b442420 MOVQ 0x20(SP), AX
<autogenerated>:3 0x109e358 48890424 MOVQ AX, 0(SP)
<autogenerated>:3 0x109e35c e8cff8ffff CALL GoMeetup201705/embed. Base.Print(SB)
<autogenerated>:3 0x109e361 488b6c2408 MOVQ 0x8(SP), BP
<autogenerated>:3 0x109e366 4883c410 ADDQ $0x10, SP
<autogenerated>:3 0x109e36a c3 RET
<autogenerated>:3 0x109e36b e890c3faff CALL runtime.morestack_noctxt(SB)
<autogenerated>:3 0x109e370 ebae JMP GoMeetup201705/embed.Sub.Print(SB)
But actually...
When I first tried to disassemble these functions, this is what I got:
embed.go:26 0x1087140 65488b0c25a0080000 GS MOVQ GS:0x8a0, CX
embed.go:26 0x1087149 483b6110 CMPQ 0x10(CX), SP
embed.go:26 0x108714d 7638 JBE 0x1087187
embed.go:26 0x108714f 4883ec10 SUBQ $0x10, SP
embed.go:26 0x1087153 48896c2408 MOVQ BP, 0x8(SP)
embed.go:26 0x1087158 488d6c2408 LEAQ 0x8(SP), BP
embed.go:27 0x108715d 488b0554470300 MOVQ 0x34754(IP), AX
embed.go:28 0x1087164 48890424 MOVQ AX, 0(SP)
embed.go:28 0x1087168 e8e3feffff CALL GoMeetup201705/embed.Base.Print(SB)
embed.go:30 0x108716d 488b05244a0300 MOVQ 0x34a24(IP), AX
embed.go:31 0x1087174 48890424 MOVQ AX, 0(SP)
embed.go:31 0x1087178 e8d3feffff CALL GoMeetup201705/embed.Base.Print(SB)
embed.go:32 0x108717d 488b6c2408 MOVQ 0x8(SP), BP
embed.go:32 0x1087182 4883c410 ADDQ $0x10, SP
embed.go:32 0x1087186 c3 RET
embed.go:26 0x1087187 e8942dfcff CALL runtime.morestack_noctxt(SB)
embed.go:26 0x108718c ebb2 JMP GoMeetup201705/embed.Test(SB)
No calls to CreateBase or CreateSub..
Inlining
The code was inlined by the compiler
Use go build -gcflags "-N -l" to disable inlining and optimizations
But.. wasn’t inlining removed?
And planned to be brought back in Go 1.9?
Mid-stack inlining was removed
https://golang.org/s/go19inliningtalk
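Besides switching inlining off for the whole build, it can be controlled per function. The sketch below assumes two made-up functions, square and squareNoInline; the //go:noinline directive is a real compiler directive, and building with go build -gcflags=-m should report which of the two the compiler considers inlinable.

```go
package main

import "fmt"

// square is a small leaf function, so the compiler is free to inline it.
func square(x int) int {
	return x * x
}

// squareNoInline does the same work, but the directive below asks the
// compiler to always emit a real call for it.
//
//go:noinline
func squareNoInline(x int) int {
	return x * x
}

func main() {
	fmt.Println(square(7), squareNoInline(7))
}
```

The directive is handy when you want to keep one function visible in a disassembly or a benchmark without disabling optimizations everywhere.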
Inlining
Mid-Stack inlining
func main() {
	Func1()
}
func Func1() {
	i := Func2()
	fmt.Println("In Func1", i)
}
func Func2() int {
	i := Func3(50)
	fmt.Println("In Func2", i)
	return i + 1
}
func Func3(j int) int {
	i := j
	if j > 10 {
		i = i * j
	}
	return i
}
Mid-Stack inlining
main()
  Func1()            <- Mid stack
    Func2()          <- Mid stack
      Func3()        <- Leaf
      fmt.Println()
    fmt.Println()
Leaf functions are still inlined
Inlining
Why was mid-stack inlining removed?
Mid-Stack inlining
Let’s say Func2 panics
And we’re printing a stack trace
We might expect to get a stack trace like this:
panic(0x497480, 0xc0420401d0)
/go/src/runtime/panic.go:489 +0x2dd
main.Func2(0x0)
/gocode/src/play/main.go:27 +0x4f
main.Func1()
/gocode/src/play/main.go:22 +0x3b
main.main()
/gocode/src/play/main.go:17 +0x47
Mid-Stack inlining
But if Func1 and Func2 were inlined into their callers, we would probably get a trace in which their frames are missing
Mid-Stack inlining
Let’s say the leaf function, Func3, panics now
But if leaf functions are still inlined... wouldn’t we get the same problem?
Let’s check...
panic(0x497480, 0xc0420401d0)
/go/src/runtime/panic.go:489 +0x2dd
main.Func3(0x32, 0x0)
/gocode/src/play/main.go:36 +0x8f
main.Func2(0x0)
/gocode/src/play/main.go:27 +0x4f
main.Func1()
/gocode/src/play/main.go:22 +0x3b
main.main()
/gocode/src/play/main.go:17 +0x47
Stack trace looks good. How come?
Ok, that was a trick question :-)
Func3 called panic() itself, and a function that calls panic() is not treated as an inlinable leaf, so it was not inlined
Mid-Stack inlining
Let’s generate a panic without calling panic()
panic(0x49ca80, 0x5086d0)
/go/src/runtime/panic.go:489 +0x2dd
main.Func2(0xc042035ed0)
/gocode/src/play/main.go:27 +0x11
main.Func1()
/gocode/src/play/main.go:22 +0x2d
main.main()
/gocode/src/play/main.go:17 +0x45
And indeed we’re missing Func3 from the stack trace
So inlined leaf functions are missing from stack traces
Why does Go still inline them?
Inlining
Happens less frequently?
Considered less confusing?
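The slides do not show how the panic was generated without calling panic(). One way to do it, assumed here purely for illustration, is to let the runtime raise it, for example with an integer division by zero. The helper tryFunc3 is also made up for this sketch; it only recovers the panic so the example terminates normally. Newer Go releases reconstruct inlined frames in tracebacks, so the missing Func3 frame may not reproduce there.

```go
package main

import "fmt"

// Func3 never calls panic(); when j is 0 the runtime raises the panic
// for it (integer divide by zero), so Func3 remains a leaf function.
func Func3(j int) int {
	return 100 / j
}

// tryFunc3 (hypothetical helper) runs Func3 and converts a panic into
// an error instead of letting it crash the program.
func tryFunc3(j int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return Func3(j), nil
}

func main() {
	fmt.Println(tryFunc3(50))
	fmt.Println(tryFunc3(0))
}
```

To look at the raw stack trace instead, call Func3(0) directly from main and let the program crash.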
Speaking of stack traces..
Stacks
...I noticed these in many functions:
Stacks
TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go
tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX    <- Place pointer in register
tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP    <- Compare to SP
tester.go:7 0x108717d 762f JBE 0x10871ae    <- If below or equal, jump to 0x10871ae
tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP
tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP)
tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP
tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB)
tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX
tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP)
tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP)
tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB)
tester.go:10 0x10871a4 488b6c2410 MOVQ 0x10(SP), BP
tester.go:10 0x10871a9 4883c418 ADDQ $0x18, SP
tester.go:10 0x10871ad c3 RET
tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB)    <- Call ‘morestack’
tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB)    <- And rerun the code
More stack?
Stacks
Goroutines are “lightweight threads”
What makes them lightweight?
The fact that they’re “cheap” to create
Stack sizes
A thread needs a stack
This is where it places:
local variables, parameters and return values
Stack sizes
If stack size is too small
It will limit the “depth” of your function calls
Stack sizes
If stack size is too large
The overhead of each thread is higher
Stack sizes
Go takes a different approach
Stack sizes
Goroutine stacks start very small
(2 KB as of Go 1.4, vs. 1 MB or more for a typical OS thread)
Stack sizes
But they grow as needed
(and also shrink when the space is no longer needed)
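A small experiment makes the growth tangible: a recursion that is far too deep for a stack of a few kilobytes still completes on a freshly created goroutine. This is only a sketch; the names depth and deepCall, the 128-byte padding and the depth of 100000 are assumptions picked for the illustration.

```go
package main

import "fmt"

// depth recurses n levels. Each frame carries a small buffer, so the
// total stack needed is far larger than a goroutine's initial stack.
func depth(n int) int {
	var pad [128]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return 0
	}
	return depth(n-1) + int(pad[n%len(pad)])
}

// deepCall runs the recursion on a new goroutine and returns its result.
func deepCall(n int) int {
	result := make(chan int)
	go func() { result <- depth(n) }()
	return <-result
}

func main() {
	fmt.Println(deepCall(100000))
}
```

The program prints 100000. The runtime enlarges the goroutine's stack behind the scenes, and nothing in the source has to ask for it.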
Stack sizes
This means we need to check the stack size on each function call
Stack sizes
TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go
tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX
tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP
tester.go:7 0x108717d 762f JBE 0x10871ae
tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP
tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP)
tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP
tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB)
tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX
tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP)
tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP)
tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB)
tester.go:10 0x10871a4 488b6c2410 MOVQ 0x10(SP), BP
tester.go:10 0x10871a9 4883c418 ADDQ $0x18, SP
tester.go:10 0x10871ad c3 RET
tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB)
tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB)
Some functions do not contain the “morestack” logic
(e.g. small “leaf” functions, since they cannot require space beyond the existing stack)
Stack sizes
So what else makes Goroutines light?
Goroutines
Let’s look at some code…
func startRace() {
	numOfGoRoutines := 1
	progress := int64(0)
	done := int64(0)
	goRoutineStarted := sync.WaitGroup{}
	goRoutineStarted.Add(numOfGoRoutines)
	goRoutineFinished := sync.WaitGroup{}
	goRoutineFinished.Add(numOfGoRoutines)
	// Create goroutines
	for i := 0; i < numOfGoRoutines; i++ {
		go func(id int) {
			fmt.Println("Go routine started:", id)
			goRoutineStarted.Done()
			for atomic.LoadInt64(&done) == 0 {
				atomic.AddInt64(&progress, 1) // Atomic equivalent of progress++
			}
			goRoutineFinished.Done()
		}(i)
	}
	goRoutineStarted.Wait()     // Main waits for goroutines to start
	atomic.StoreInt64(&done, 1) // Main signals goroutines to stop
	goRoutineFinished.Wait()    // And waits for them
	fmt.Println("All done. Progress: ", progress)
}
How much progress will the goroutines make before they are stopped?
Goroutines
1 goroutine
Go routine started: 0
All done. Progress: 2195
Total Time 82.036µs
2 goroutines
Go routine started: 1
Go routine started: 0
All done. Progress: 18353
Total Time 1.375069ms
3 goroutines
Go routine started: 2
Go routine started: 0
Go routine started: 1
All done. Progress: 558205
Total Time 10.147941ms
4 goroutines
?
Go routine started: 0
Go routine started: 1
Go routine started: 3
Go routine started: 2
Runs infinitely using 100% cpu*
*On a 4 cpu machine
Why?
Scheduler
Goroutines are scheduled by the Go runtime
(Threads by the OS scheduler)
Goroutines are scheduled “cooperatively”
(Threads are scheduled “preemptively”)
Cooperative Scheduling
Goroutines need to check periodically whether their time is “up”
When does it happen?
I/O
Channel reads
Sleep
Function calls: CMPQ 0x10(CX), SP
Remember this one?
Morestack is also used to switch goroutines
Stacks
TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go
tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX
tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP
tester.go:7 0x108717d 762f JBE 0x10871ae
tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP
tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP)
tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP
tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB)
tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX
tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP)
tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP)
tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB)
tester.go:10 0x10871a4 488b6c2410 MOVQ 0x10(SP), BP
tester.go:10 0x10871a9 4883c418 ADDQ $0x18, SP
tester.go:10 0x10871ad c3 RET
tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB)
tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB)
Let’s see how it works
Cooperative Scheduling
func startRace() {
	numOfGoRoutines := 1
	progress := int64(0)
	done := int64(0)
	goRoutineStarted := sync.WaitGroup{}
	goRoutineStarted.Add(numOfGoRoutines)
	goRoutineFinished := sync.WaitGroup{}
	goRoutineFinished.Add(numOfGoRoutines)
	for i := 0; i < numOfGoRoutines; i++ {
		go func(id int) {
			fmt.Println("Go routine started:", id)
			goRoutineStarted.Done()
			for atomic.LoadInt64(&done) == 0 {
				atomic.AddInt64(&progress, 1)
				doSomething()
			}
			goRoutineFinished.Done()
		}(i)
	}
	goRoutineStarted.Wait()
	atomic.StoreInt64(&done, 1)
	goRoutineFinished.Wait()
	fmt.Println("All done. Progress: ", progress)
}
func doSomething() {
	doNothing()
}
func doNothing() {
}
Goroutines
4 goroutines
Go routine started: 3
Go routine started: 0
Go routine started: 1
Go routine started: 2
All done. Progress: 2053541
Total Time 29.64113ms
Probably not something you should think about, but still…
It will also be interesting to see how this is affected by mid-stack inlining
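If relying on a function call's stack check feels too implicit, the loop can yield explicitly. The sketch below is a variation on the experiment, not the code from the talk: the function spin is made up for this illustration, and runtime.Gosched() is the standard library call that hands the processor back to the scheduler. Go 1.14 and later can also preempt tight loops asynchronously, so the hang described in the talk may not reproduce on newer releases.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spin starts n goroutines that increment a counter until told to stop.
// runtime.Gosched() is an explicit yield point, so every iteration gives
// the scheduler a chance to run another goroutine.
func spin(n int) int64 {
	var progress, done int64
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			for atomic.LoadInt64(&done) == 0 {
				atomic.AddInt64(&progress, 1)
				runtime.Gosched()
			}
		}()
	}
	started.Wait()              // wait for all goroutines to start
	atomic.StoreInt64(&done, 1) // signal them to stop
	finished.Wait()             // and wait for them to finish
	return atomic.LoadInt64(&progress)
}

func main() {
	fmt.Println("progress:", spin(runtime.NumCPU()))
}
```

The reported progress varies from run to run; the point is only that the program terminates even when there are as many spinning goroutines as CPUs.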
Speaking of atomic...
Atomic vs Mutex
A friend did a benchmark of atomic vs mutex
Atomic vs Mutex
go func() {
	for j := 0; j < n; j++ {
		m.Lock()
		a += 1
		m.Unlock()
	}
}()
VS
go func() {
	for j := 0; j < n; j++ {
		atomic.AddInt64(&a, 1)
	}
}()
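The two fragments above leave out the surrounding setup. A self-contained version might look like the sketch below; the function names countWithMutex and countWithAtomic and the goroutine counts are assumptions for this illustration, not part of the linked benchmark.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithMutex increments a shared counter under a mutex.
func countWithMutex(goroutines, n int) int64 {
	var (
		m  sync.Mutex
		wg sync.WaitGroup
		a  int64
	)
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				m.Lock()
				a += 1
				m.Unlock()
			}
		}()
	}
	wg.Wait()
	return a
}

// countWithAtomic does the same work with a single atomic add.
func countWithAtomic(goroutines, n int) int64 {
	var (
		wg sync.WaitGroup
		a  int64
	)
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				atomic.AddInt64(&a, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&a)
}

func main() {
	fmt.Println(countWithMutex(8, 1000), countWithAtomic(8, 1000))
}
```

Both variants produce the same count; wrapping each one in a testing.B benchmark is what turns this into a timing comparison.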
Atomic vs Mutex
Atomic is indeed faster..
But this also caught our attention
*You can find the complete benchmark here:
https://github.com/BorisBorshevsky/GolangDemos/tree/master/demos/sync-pack/atomic
Mutexes
Let’s look at a simplified version of the test
func mutexRun(n int) {
	m := &sync.Mutex{}
	wg := sync.WaitGroup{}
	wg.Add(n)
	for i := 0; i < n; i++ { // Create n goroutines
		go func() {
			for j := 0; j < 1000; j++ { // Loop 1,000 times
				m.Lock() // Just Lock & Unlock
				m.Unlock()
			}
			wg.Done()
		}()
	}
	wg.Wait()
}
Mutexes
Create benchmark tests
func benchmarkMutex(c int, b *testing.B) {
	for n := 0; n < b.N; n++ {
		mutexRun(c)
	}
}
func BenchmarkMutex10(b *testing.B) { // 10 goroutines
	benchmarkMutex(10, b)
}
func BenchmarkMutex100(b *testing.B) { // VS 100 goroutines
	benchmarkMutex(100, b)
}
func BenchmarkMutex1000(b *testing.B) { // VS 1,000 goroutines
	benchmarkMutex(1000, b)
}
func BenchmarkMutex10000(b *testing.B) { // VS 10,000 goroutines
	benchmarkMutex(10000, b)
}
Mutexes
go test -bench=. -benchmem
Benchmark (# of goroutines)   # of loops   Avg time per op     Allocated bytes per op   # of allocations per op
BenchmarkMutex10-8            2000         1105564 ns/op       64 B/op                  2 allocs/op
BenchmarkMutex100-8           100          11280653 ns/op      787 B/op                 5 allocs/op
BenchmarkMutex1000-8          10           110106350 ns/op     4372 B/op                41 allocs/op
BenchmarkMutex10000-8         1            1106063700 ns/op    4981360 B/op             16107 allocs/op
PASS
ok play/mutex 5.836s
Mutexes
BenchmarkMutex10000-8 1 1106063700 ns/op 4981360 B/op 16107 allocs/op
Why so many allocations?
To understand why, let’s first see how a mutex works
Mutexes
Ever implemented a Redis lock?
Mutexes
The basic* idea goes like this:
Have a key in Redis which represents the lock (e.g. MyLock1)
Call “SETNX MyLock1 <SomeValue>” (Set If Not Exists)
If SETNX returns 1, the lock is acquired
Otherwise sleep and try again until the lock is acquired
*This is a simplified and incomplete implementation. See https://redis.io/commands/setnx for details on how to implement a Redis lock correctly
Mutexes
The closest to SETNX in Go is CAS:
var MyLock1 int64
success := atomic.CompareAndSwapInt64(&MyLock1, 0, 1)
I.e. the following, but atomically:
if MyLock1 == 0 {
	MyLock1 = 1
}
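To see the "only one caller wins" property in action, the sketch below lets several goroutines attempt the same compare-and-swap once and counts the successes. The function raceForLock and the variable names are made up for this illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// raceForLock lets n goroutines try the same compare-and-swap once and
// reports how many of them managed to flip the value from 0 to 1.
func raceForLock(n int) int64 {
	var myLock1, winners int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			if atomic.CompareAndSwapInt64(&myLock1, 0, 1) {
				atomic.AddInt64(&winners, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&winners)
}

func main() {
	fmt.Println("goroutines that acquired the lock:", raceForLock(100))
}
```

However many goroutines take part, exactly one swap succeeds, which is the same guarantee SETNX gives for the Redis key.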
Mutexes
Simple mutex implementation:
type MyMutex struct {
	state int32
}
// To Lock: repeatedly try to set 1
func (m *MyMutex) Lock() {
	for !atomic.CompareAndSwapInt32(&m.state, 0, 1) {
		... // What should we do here? Sleep? Nothing?
	}
}
// To Unlock: set to 0
func (m *MyMutex) Unlock() {
	atomic.StoreInt32(&m.state, 0)
}
Mutexes
If we sleep, we lose precious time
If we don’t sleep (spin), we consume CPU
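One middle ground between sleeping and pure spinning is to yield to the scheduler on every failed attempt. The sketch below is an assumption for illustration only, not how sync.Mutex works: the type SpinLock and the helper countWithSpinLock are made-up names, and runtime.Gosched() stands in for the elided loop body.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// SpinLock is a toy lock built on compare-and-swap.
type SpinLock struct {
	state int32
}

// Lock retries the swap until it succeeds, yielding between attempts
// so other goroutines (including the lock holder) can run.
func (l *SpinLock) Lock() {
	for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
		runtime.Gosched()
	}
}

// Unlock releases the lock by storing 0.
func (l *SpinLock) Unlock() {
	atomic.StoreInt32(&l.state, 0)
}

// countWithSpinLock increments a plain int under the lock.
func countWithSpinLock(goroutines, n int) int {
	var l SpinLock
	var wg sync.WaitGroup
	counter := 0
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				l.Lock()
				counter++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithSpinLock(8, 1000))
}
```

This keeps the counter correct, but under heavy contention the goroutines still burn scheduler time on failed attempts, which is the trade-off the slide describes.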
Mutexes
So how are mutexes implemented?
Mutexes
First, try fast lock
func (m *MyMutex) Lock() {
	if atomic.CompareAndSwapInt32(&m.state, 0, 1) {
		return
	}
	…
}
Mutexes
Next, try spinning
func (m *MyMutex) Lock() {
	…
	// canSpin: we’re using multiple cores, there’s at least one other
	// running goroutine, and we didn’t spin too long (iter < X)
	for canSpin(iter) {
		if atomic.CompareAndSwapInt32(&m.state, 0, 1) {
			return
		}
		iter++
		doSpin()
	}
	…
}
Mutexes
Last, add this goroutine to the “waitlist” for this mutex
(this is where allocations come from)
and go to sleep (park) until woken up by an Unlock
func (m *MyMutex) Lock() {
    …
    runtime_SemacquireMutex(…)
}
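The three phases (fast path, bounded spin, park) can be put together in a user-level sketch. It is only an approximation: the real `sync.Mutex` parks through a runtime semaphore, whereas this version uses a buffered channel as a wake-up signal, and the spin limit `maxSpins` is an arbitrary assumption. `SketchMutex`, `NewSketchMutex` and `countWithSketchMutex` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

const maxSpins = 4 // hypothetical spin budget, chosen for illustration

// SketchMutex imitates the fast-path / spin / park structure.
type SketchMutex struct {
	state int32         // 0 = unlocked, 1 = locked
	wake  chan struct{} // holds at most one pending wake-up token
}

func NewSketchMutex() *SketchMutex {
	return &SketchMutex{wake: make(chan struct{}, 1)}
}

func (m *SketchMutex) tryLock() bool {
	return atomic.CompareAndSwapInt32(&m.state, 0, 1)
}

func (m *SketchMutex) Lock() {
	// Phase 1: fast path, a single CAS when there is no contention.
	if m.tryLock() {
		return
	}
	// Phase 2: retry a few times, yielding between attempts.
	for i := 0; i < maxSpins; i++ {
		runtime.Gosched()
		if m.tryLock() {
			return
		}
	}
	// Phase 3: block until an Unlock posts a token, then retry.
	for !m.tryLock() {
		<-m.wake
	}
}

func (m *SketchMutex) Unlock() {
	atomic.StoreInt32(&m.state, 0)
	select {
	case m.wake <- struct{}{}: // wake one parked waiter, if any
	default: // a token is already pending, nothing to add
	}
}

// countWithSketchMutex increments a plain int under the lock.
func countWithSketchMutex(goroutines, iterations int) int {
	m := NewSketchMutex()
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				m.Lock()
				counter++
				m.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println("counter:", countWithSketchMutex(8, 1000))
}
```

Every Unlock leaves a token unless one is already pending, so a parked goroutine always gets another chance to retry after the lock is released.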
Mutexes
Mutexes are very fast when:
- There’s no contention
- Time spent within the lock is very short
Mutex performance degrades when there is a lot of contention
Mutexes
You can profile contention with
go test -mutexprofile=mutex.out
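Outside of `go test`, the same data can be collected from inside a program. The sketch below turns on mutex profiling with `runtime.SetMutexProfileFraction`, creates some contention, and dumps the "mutex" profile as text. The helper names `contend` and `mutexProfileText`, and the goroutine and iteration counts, are assumptions for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// contend makes several goroutines fight over a single mutex.
func contend(goroutines, iterations int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

// mutexProfileText records contention while contend runs and
// returns the mutex profile in its human-readable text form.
func mutexProfileText() string {
	// A rate of 1 asks the runtime to report every contention event.
	prev := runtime.SetMutexProfileFraction(1)
	defer runtime.SetMutexProfileFraction(prev)

	contend(8, 10000)

	var buf bytes.Buffer
	if err := pprof.Lookup("mutex").WriteTo(&buf, 1); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	fmt.Print(mutexProfileText())
}
```

How many contention events show up depends on the machine and the scheduler, so the contents of the profile vary from run to run.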
Summary
- Embedding
- Mid-stack inlining
- Dynamic stack size
- Cooperative scheduling
- Mutexes
Thank you!
Eyal Post

Go internals (Go Israel Meetup)

  • 1. Go Internals Eyal Post - Go Israel Meetup - May 2017 ...you shouldn’t really care about
  • 2. About Me Eyal Post Architect @Gett (… Yes, I get quite a few jokes about HTTP POST vs GET... )
  • 3. I like to peek under the hood
  • 5. Embedding under the hood type Base struct { someVar int } func (b Base) Print() { fmt.Println("Called from Base: ", b.someVar) } type Sub struct { AnotherVar int Base }
  • 6. Embedding under the hood func Test() { b := Base{} b.Print() s := Sub{} s.Print() }
  • 7. Embedding under the hood func Test() { b := CreateBase() b.Print() s := CreateSub() s.Print() } func CreateBase() Base { return Base{} } func CreateSub() Sub { return Sub{} }
  • 8. TEXT github.com/eyalpost/GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go embed.go:25 0x1087200 65488b0c25a0080000 GS MOVQ GS:0x8a0, CX embed.go:25 0x1087209 483b6110 CMPQ 0x10(CX), SP embed.go:25 0x108720d 7650 JBE 0x108725f embed.go:25 0x108720f 4883ec30 SUBQ $0x30, SP embed.go:25 0x1087213 48896c2428 MOVQ BP, 0x28(SP) embed.go:25 0x1087218 488d6c2428 LEAQ 0x28(SP), BP embed.go:26 0x108721d e88effffff CALL GoMeetup201705/ embed.CreateBase(SB) embed.go:26 0x1087222 488b0424 MOVQ 0(SP), AX embed.go:26 0x1087226 4889442410 MOVQ AX, 0x10(SP) embed.go:27 0x108722b 48890424 MOVQ AX, 0(SP) embed.go:27 0x108722f e81cfeffff CALL GoMeetup201705/ embed.Base.Print(SB) embed.go:29 0x1087234 e897ffffff CALL GoMeetup201705/embed.CreateSub(SB) embed.go:29 0x1087239 488b0424 MOVQ 0(SP), AX embed.go:29 0x108723d 488b4c2408 MOVQ 0x8(SP), CX embed.go:29 0x1087242 4889442418 MOVQ AX, 0x18(SP) embed.go:29 0x1087247 48894c2420 MOVQ CX, 0x20(SP) embed.go:30 0x108724c 48890c24 MOVQ CX, 0(SP) embed.go:30 0x1087250 e8fbfdffff CALL GoMeetup201705/embed.Base.Print(SB) embed.go:31 0x1087255 488b6c2428 MOVQ 0x28(SP), BP embed.go:31 0x108725a 4883c430 ADDQ $0x30, SP embed.go:31 0x108725e c3 RET embed.go:25 0x108725f e8bc2cfcff CALL runtime.morestack_noctxt(SB) embed.go:25 0x1087264 eb9a JMP GoMeetup201705/embed.Test(SB)
  • 9. TEXT github.com/eyalpost/GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go embed.go:25 0x1087200 65488b0c25a0080000 GS MOVQ GS:0x8a0, CX embed.go:25 0x1087209 483b6110 CMPQ 0x10(CX), SP embed.go:25 0x108720d 7650 JBE 0x108725f embed.go:25 0x108720f 4883ec30 SUBQ $0x30, SP embed.go:25 0x1087213 48896c2428 MOVQ BP, 0x28(SP) embed.go:25 0x1087218 488d6c2428 LEAQ 0x28(SP), BP embed.go:26 0x108721d e88effffff CALL GoMeetup201705/embed.CreateBase(SB) embed.go:26 0x1087222 488b0424 MOVQ 0(SP), AX embed.go:26 0x1087226 4889442410 MOVQ AX, 0x10(SP) embed.go:27 0x108722b 48890424 MOVQ AX, 0(SP) embed.go:27 0x108722f e81cfeffff CALL GoMeetup201705/embed.Base.Print(SB) embed.go:29 0x1087234 e897ffffff CALL GoMeetup201705/ embed.CreateSub(SB) embed.go:29 0x1087239 488b0424 MOVQ 0(SP), AX embed.go:29 0x108723d 488b4c2408 MOVQ 0x8(SP), CX embed.go:29 0x1087242 4889442418 MOVQ AX, 0x18(SP) embed.go:29 0x1087247 48894c2420 MOVQ CX, 0x20(SP) embed.go:30 0x108724c 48890c24 MOVQ CX, 0(SP) embed.go:30 0x1087250 e8fbfdffff CALL GoMeetup201705/ embed.Base.Print(SB) embed.go:31 0x1087255 488b6c2428 MOVQ 0x28(SP), BP embed.go:31 0x108725a 4883c430 ADDQ $0x30, SP embed.go:31 0x108725e c3 RET embed.go:25 0x108725f e8bc2cfcff CALL runtime.morestack_noctxt(SB) embed.go:25 0x1087264 eb9a JMP GoMeetup201705/embed.Test(SB) Base.Print ??
  • 10. > go tool objdump -s embed GoMeetup201705 | grep TEXT TEXT GoMeetup201705/embed.Base.Print(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.CreateBase(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.CreateSub(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.init(SB) GoMeetup201705/embed/embed.go
  • 11. Embedding under the hood func Test() { b := CreateBase() b.Print() s := CreateSub() s.Print() fmt.Println(reflect.TypeOf(s).Method(0).Name) } What will this show?
  • 12. > go tool objdump -s embed GoMeetup201705 | grep TEXT TEXT GoMeetup201705/embed.Base.Print(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.CreateBase(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.CreateSub(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.Test(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.init(SB) GoMeetup201705/embed/embed.go TEXT GoMeetup201705/embed.(*Base).Print(SB) <autogenerated> TEXT GoMeetup201705/embed.(*Sub).Print(SB) <autogenerated> TEXT GoMeetup201705/embed.Sub.Print(SB) <autogenerated> Once reflection was used, the Go compiler generates the promoted methods
  • 13. TEXT GoMeetup201705/embed.Sub.Print(SB) <autogenerated> <autogenerated>:3 0x109e320 65488b0c25a0080 GS MOVQ GS:0x8a0, CX <autogenerated>:3 0x109e329 483b6110 CMPQ 0x10(CX), SP <autogenerated>:3 0x109e32d 763c JBE 0x109e36b <autogenerated>:3 0x109e32f 4883ec10 SUBQ $0x10, SP <autogenerated>:3 0x109e333 48896c2408 MOVQ BP, 0x8(SP) <autogenerated>:3 0x109e338 488d6c2408 LEAQ 0x8(SP), BP <autogenerated>:3 0x109e33d 488b5920 MOVQ 0x20(CX), BX <autogenerated>:3 0x109e341 4885db TESTQ BX, BX <autogenerated>:3 0x109e344 740d JE 0x109e353 <autogenerated>:3 0x109e346 488d7c2418 LEAQ 0x18(SP), DI <autogenerated>:3 0x109e34b 48393b CMPQ DI, 0(BX) <autogenerated>:3 0x109e34e 7503 JNE 0x109e353 <autogenerated>:3 0x109e350 488923 MOVQ SP, 0(BX) <autogenerated>:3 0x109e353 488b442420 MOVQ 0x20(SP), AX <autogenerated>:3 0x109e358 48890424 MOVQ AX, 0(SP) <autogenerated>:3 0x109e35c e8cff8ffff CALL GoMeetup201705/embed. Base.Print(SB) <autogenerated>:3 0x109e361 488b6c2408 MOVQ 0x8(SP), BP <autogenerated>:3 0x109e366 4883c410 ADDQ $0x10, SP <autogenerated>:3 0x109e36a c3 RET <autogenerated>:3 0x109e36b e890c3faff CALL runtime.morestack_noctxt(SB) <autogenerated>:3 0x109e370 ebae JMP GoMeetup201705/embed.Sub.Print(SB)
  • 14. But actually.. When I first tried to disassemble these functions This is what I got
  • 15. embed.go:26 0x1087140 65488b0c25a0080000 GS MOVQ GS:0x8a0, CX embed.go:26 0x1087149 483b6110 CMPQ 0x10(CX), SP embed.go:26 0x108714d 7638 JBE 0x1087187 embed.go:26 0x108714f 4883ec10 SUBQ $0x10, SP embed.go:26 0x1087153 48896c2408 MOVQ BP, 0x8(SP) embed.go:26 0x1087158 488d6c2408 LEAQ 0x8(SP), BP embed.go:27 0x108715d 488b0554470300 MOVQ 0x34754(IP), AX embed.go:28 0x1087164 48890424 MOVQ AX, 0(SP) embed.go:28 0x1087168 e8e3feffff CALL GoMeetup201705/embed.Base.Print(SB) embed.go:30 0x108716d 488b05244a0300 MOVQ 0x34a24(IP), AX embed.go:31 0x1087174 48890424 MOVQ AX, 0(SP) embed.go:31 0x1087178 e8d3feffff CALL GoMeetup201705/embed.Base.Print(SB) embed.go:32 0x108717d 488b6c2408 MOVQ 0x8(SP), BP embed.go:32 0x1087182 4883c410 ADDQ $0x10, SP embed.go:32 0x1087186 c3 RET embed.go:26 0x1087187 e8942dfcff CALL runtime.morestack_noctxt(SB) embed.go:26 0x108718c ebb2 JMP GoMeetup201705/embed.Test(SB) No calls to CreateBaseCreateSub..
  • 16. Inlining The code was inlined by the compiler use go build -gcflags "-N -l" to disable inlining and optimizations
  • 17. The code was inlined by the compiler But.. wasn’t inlining removed? And planned to be brought back at Go 1.9? Inlining
  • 18. Mid-stack inlining was removed https://golang.org/s/go19inliningtalk Inlining
  • 19. Mid-Stack inlining func main() { Func1() } func Func1() { i := Func2() fmt.Println("In Func1", i) } func Func2() int { i := Func3(50) fmt.Println("In Func2", i) return i + 1 } func Func3(j int) int { i := j if j > 10 { i = i * j } return i }
  • 23. Leafs are still inlined Why were mid-stack removed from inlining? Inlining
  • 26. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() We might expect to get a stack trace like this: panic(0x497480, 0xc0420401d0) /go/src/runtime/panic.go:489 +0x2dd main.Func2(0x0) /gocode/src/play/main.go:27 +0x4f main.Func1() /gocode/src/play/main.go:22 +0x3b main.main() /gocode/src/play/main.go:17 +0x47
  • 27. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() But if these were inlined panic(0x497480, 0xc0420401d0) /go/src/runtime/panic.go:489 +0x2dd main.Func2(0x0) /gocode/src/play/main.go:27 +0x4f main.Func1() /gocode/src/play/main.go:22 +0x3b main.main() /gocode/src/play/main.go:17 +0x47
  • 28. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() We’ll probably get something like this: panic(0x497480, 0xc0420401d0) /go/src/runtime/panic.go:489 +0x2dd main.Func2(0x0) /gocode/src/play/main.go:27 +0x4f main.Func1() /gocode/src/play/main.go:22 +0x3b main.main() /gocode/src/play/main.go:17 +0x47
  • 29. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() Let’s say this function panics now But if leaf are still inlined... wouldn’t we get the same problem?
  • 30. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() Let’s check... panic(0x497480, 0xc0420401d0) /go/src/runtime/panic.go:489 +0x2dd main.Func3(0x32, 0x0) /gocode/src/play/main.go:36 +0x8f main.Func2(0x0) /gocode/src/play/main.go:27 +0x4f main.Func1() /gocode/src/play/main.go:22 +0x3b main.main() /gocode/src/play/main.go:17 +0x47 Stack trace looks good. How come?
  • 31. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() Ok, that was a trick question :-) panic(0x497480, 0xc0420401d0) /go/src/runtime/panic.go:489 +0x2dd main.Func3(0x32, 0x0) /gocode/src/play/main.go:36 +0x8f main.Func2(0x0) /gocode/src/play/main.go:27 +0x4f main.Func1() /gocode/src/play/main.go:22 +0x3b main.main() /gocode/src/play/main.go:17 +0x47 panic()
  • 32. Mid-Stack inlining main() fmt.Println() Func1() fmt.Println() Func2() Func3() Let’s generate panic without calling panic panic(0x49ca80, 0x5086d0) /go/src/runtime/panic.go:489 +0x2dd main.Func2(0xc042035ed0) /gocode/src/play/main.go:27 +0x11 main.Func1() /gocode/src/play/main.go:22 +0x2d main.main() /gocode/src/play/main.go:17 +0x45 And indeed we’re missing Func3 from the stack trace
  • 33. So stack traces are missing on leaf stacks Why does Go still inline them? Inlining Happens less frequently? Considered less confusing?
  • 34. Speaking of stack traces.. Stacks
  • 35. ...I noticed these in many functions: Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB)
  • 36. Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB) Place pointer in register
  • 37. Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB) Compare to SP
  • 38. Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB) If below or equal
  • 39. Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB) jump to
  • 40. Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB) Call ‘morestack’
  • 41. Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB) And rerun the code
  • 43. Goroutines are “lightweight threads” What makes them lightweight? The fact that they’re “cheap” to create Stack sizes
  • 44. A thread needs a stack This is where it places: local variables, parameters and return values Stack sizes
  • 45. If stack size is too small It will limit the “depth” of your function calls Stack sizes
  • 46. If stack size is too high The overhead of each thread is higher Stack sizes
  • 47. Go takes a different approach Stack sizes
  • 48. Goroutine stacks start very small (4K vs 1M) Stack sizes
  • 49. But grow as needed (And also shrinks when it’s no longer needed) Stack sizes
  • 50. This means we need to check the stack size on each function call Stack sizes TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB)
  • 51. Some functions do not contain the “morestack” logic (E.g. Small “Leaf” functions since they can not require space beyond the existing stack) Stack sizes
  • 52. So what else makes Goroutines light? Goroutines
  • 53. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines Main goroutine Let’s look at some code…
  • 54. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines Create goroutines Main goroutine goroutine
  • 55. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines Main waits for goroutines to start Main goroutine goroutine
  • 56. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines Main signals goroutines to stop Main goroutine goroutine
  • 57. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines And waits for them Main goroutine goroutine
  • 58. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines How much progress will the Goroutine make before it stopped? (Atomic eqivalent of progress++) Main goroutine goroutine
  • 59. Go routine started: 0 All done. Progress: 2195 Total Time 82.036µs Goroutines 1 goroutine
  • 60. Go routine started: 1 Go routine started: 0 All done. Progress: 18353 Total Time 1.375069ms Goroutines 2 goroutines
  • 61. Go routine started: 2 Go routine started: 0 Go routine started: 1 All done. Progress: 558205 Total Time 10.147941ms Goroutines 3 goroutines
  • 63. Go routine started: 0 Go routine started: 1 Go routine started: 3 Go routine started: 2 Goroutines 4 goroutines
  • 64. Go routine started: 0 Go routine started: 1 Go routine started: 3 Go routine started: 2 Goroutines 4 goroutines Runs infinitely using 100% cpu* *On a 4 cpu machine
  • 65. Go routine started: 0 Go routine started: 1 Go routine started: 3 Go routine started: 2 Goroutines 4 goroutines Runs infinitely using 100% cpu* Why?
  • 66. Goroutines are scheduled by the Go runtime (Threads by the OS scheduler) Scheduler
  • 67. Goroutines are scheduled by the Go runtime (Threads by the OS scheduler) Goroutines are scheduled “cooperatively” (Threads are scheduled “preemptively”) Scheduler
  • 68. Goroutines need to check periodically whether their time is “up” When does it happen? Cooperative Scheduling
  • 69. I/O Channel reads Sleep CMPQ 0x10(CX), SP Cooperative Scheduling
  • 70. I/O Channel reads Sleep CMPQ 0x10(CX), SP Cooperative Scheduling Remember this one?
  • 71. Morestack is also used to switch goroutines Stacks TEXT embed/inherit.CallBasePrint(SB) /Users/eyalp/gocode/src/embed/inherit/tester.go tester.go:7 0x1087170 65488b0c25a008000 GS MOVQ GS:0x8a0, CX tester.go:7 0x1087179 483b6110 CMPQ 0x10(CX), SP tester.go:7 0x108717d 762f JBE 0x10871ae tester.go:7 0x108717f 4883ec18 SUBQ $0x18, SP tester.go:7 0x1087183 48896c2410 MOVQ BP, 0x10(SP) tester.go:7 0x1087188 488d6c2410 LEAQ 0x10(SP), BP tester.go:8 0x108718d e88effffff CALL embed/inherit.CreateBase(SB) tester.go:8 0x1087192 488b0424 MOVQ 0(SP), AX tester.go:8 0x1087196 4889442408 MOVQ AX, 0x8(SP) tester.go:9 0x108719b 48890424 MOVQ AX, 0(SP) tester.go:9 0x108719f e81cfeffff CALL embed/inherit.Base.Print(SB) tester.go:100x10871a4 488b6c2410 MOVQ 0x10(SP), BP tester.go:100x10871a9 4883c418 ADDQ $0x18, SP tester.go:100x10871ad c3 RET tester.go:7 0x10871ae e8fd2cfcff CALL runtime.morestack_noctxt(SB) tester.go:7 0x10871b3 ebbb JMP embed/inherit.CallBasePrint(SB)
  • 72. Let’s see how it works Cooperative Scheduling
  • 73. func startRace() { numOfGoRoutines := 1 progress := int64(0) done := int64(0) goRoutineStarted := sync.WaitGroup{} goRoutineStarted.Add(numOfGoRoutines) goRoutineFinished := sync.WaitGroup{} goRoutineFinished.Add(numOfGoRoutines) for i := 0; i < numOfGoRoutines; i++ { go func(id int) { fmt.Println("Go routine started:", id) goRoutineStarted.Done() for atomic.LoadInt64(&done) == 0 { atomic.AddInt64(&progress, 1) doSomething() } goRoutineFinished.Done() }(i) } goRoutineStarted.Wait() atomic.StoreInt64(&done, 1) goRoutineFinished.Wait() fmt.Println("All done. Progress: ", progress) } Goroutines func doSomething() { doNothing() } func doNothing() { }
  • 74. Goroutines (4 goroutines)
    Go routine started: 3
    Go routine started: 0
    Go routine started: 1
    Go routine started: 2
    All done. Progress: 2053541
    Total Time 29.64113ms
  • 75. Probably not something you should think about, but still… It will also be interesting to see how mid-stack inlining affects this.
  • 76. Speaking of atomic... Atomic vs Mutex: a friend did a benchmark of atomic vs mutex
    func startRace() {
        numOfGoRoutines := 1
        progress := int64(0)
        done := int64(0)
        goRoutineStarted := sync.WaitGroup{}
        goRoutineStarted.Add(numOfGoRoutines)
        goRoutineFinished := sync.WaitGroup{}
        goRoutineFinished.Add(numOfGoRoutines)
        for i := 0; i < numOfGoRoutines; i++ {
            go func(id int) {
                fmt.Println("Go routine started:", id)
                goRoutineStarted.Done()
                for atomic.LoadInt64(&done) == 0 {
                    atomic.AddInt64(&progress, 1)
                }
                goRoutineFinished.Done()
            }(i)
        }
        goRoutineStarted.Wait()
        atomic.StoreInt64(&done, 1)
        goRoutineFinished.Wait()
        fmt.Println("All done. Progress: ", progress)
    }
  • 77. Atomic vs Mutex
    go func() {
        for j := 0; j < n; j++ {
            m.Lock()
            a += 1
            m.Unlock()
        }
    }()
    VS
    go func() {
        for j := 0; j < n; j++ {
            atomic.AddInt64(&a, 1)
        }
    }()
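The two fragments above can be wrapped into a small self-contained program. This is a sketch rather than the friend's benchmark: the function names `countWithMutex` and `countWithAtomic` and the goroutine counts are assumptions chosen for illustration. Both variants must end with the same total; only the synchronization differs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithMutex increments a shared counter n times from each goroutine,
// guarding every increment with a sync.Mutex.
func countWithMutex(goroutines, n int) int64 {
	var (
		m  sync.Mutex
		a  int64
		wg sync.WaitGroup
	)
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				m.Lock()
				a++
				m.Unlock()
			}
		}()
	}
	wg.Wait()
	return a
}

// countWithAtomic does the same work with atomic.AddInt64 and no lock.
func countWithAtomic(goroutines, n int) int64 {
	var (
		a  int64
		wg sync.WaitGroup
	)
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				atomic.AddInt64(&a, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&a)
}

func main() {
	fmt.Println(countWithMutex(8, 1000))  // prints 8000
	fmt.Println(countWithAtomic(8, 1000)) // prints 8000
}
```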
  • 78. Atomic vs Mutex: Atomic is indeed faster. *You can find the complete benchmark here: https://github.com/BorisBorshevsky/GolangDemos/tree/master/demos/sync-pack/atomic
  • 79. Atomic vs Mutex *You can find the complete benchmark here: https://github.com/BorisBorshevsky/GolangDemos/tree/master/demos/sync-pack/atomic But this also caught our attention
  • 80-82. Mutexes: Let’s look at a simplified version of the test
    func mutexRun(n int) {
        m := &sync.Mutex{}
        wg := sync.WaitGroup{}
        wg.Add(n)
        for i := 0; i < n; i++ { // Create n goroutines
            go func() {
                for j := 0; j < 1000; j++ { // Loop 1,000 times
                    m.Lock() // Just Lock & Unlock
                    m.Unlock()
                }
                wg.Done()
            }()
        }
        wg.Wait()
    }
  • 83. Mutexes: Create benchmark tests (10 goroutines VS 100 goroutines VS 1,000 goroutines VS 10,000 goroutines)
    func benchmarkMutex(c int, b *testing.B) {
        for n := 0; n < b.N; n++ {
            mutexRun(c)
        }
    }
    func BenchmarkMutex10(b *testing.B) { benchmarkMutex(10, b) }
    func BenchmarkMutex100(b *testing.B) { benchmarkMutex(100, b) }
    func BenchmarkMutex1000(b *testing.B) { benchmarkMutex(1000, b) }
    func BenchmarkMutex10000(b *testing.B) { benchmarkMutex(10000, b) }
  • 84-87. Mutexes: go test -bench=. -benchmem (10 goroutines)
    BenchmarkMutex10-8    2000    1105564 ns/op    64 B/op    2 allocs/op
    Reading the columns: 2000 = # of loops; 1105564 ns/op = avg time per op; 64 B/op = allocated bytes per op; 2 allocs/op = # of allocations per op
  • 88-90. Mutexes: go test -bench=. -benchmem
    BenchmarkMutex10-8        2000       1105564 ns/op        64 B/op       2 allocs/op   (10 goroutines)
    BenchmarkMutex100-8        100      11280653 ns/op       787 B/op       5 allocs/op   (100 goroutines)
    BenchmarkMutex1000-8        10     110106350 ns/op      4372 B/op      41 allocs/op   (1,000 goroutines)
    BenchmarkMutex10000-8        1    1106063700 ns/op   4981360 B/op   16107 allocs/op   (10,000 goroutines)
    PASS
    ok play/mutex 5.836s
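A quick way to read these numbers is to normalize them. The sketch below (type and function names are hypothetical) divides each ns/op figure by the number of Lock/Unlock pairs in one op, which is goroutines × 1,000 in the simplified test, and divides allocs/op by the number of goroutines. The inputs are copied from the benchmark output above.

```go
package main

import "fmt"

// result holds one row of the benchmark output.
type result struct {
	goroutines  int
	nsPerOp     float64
	allocsPerOp float64
}

// perLockNanos is the average time of a single Lock/Unlock pair:
// each op runs goroutines * 1000 pairs.
func perLockNanos(r result) float64 {
	return r.nsPerOp / float64(r.goroutines*1000)
}

// allocsPerGoroutine spreads the allocations of one op over its goroutines.
func allocsPerGoroutine(r result) float64 {
	return r.allocsPerOp / float64(r.goroutines)
}

func main() {
	rows := []result{
		{10, 1105564, 2},
		{100, 11280653, 5},
		{1000, 110106350, 41},
		{10000, 1106063700, 16107},
	}
	for _, r := range rows {
		fmt.Printf("%5d goroutines: %.1f ns per lock, %.2f allocs per goroutine\n",
			r.goroutines, perLockNanos(r), allocsPerGoroutine(r))
	}
}
```

Time per Lock/Unlock pair stays at roughly 110 ns in every row, so the run time scales linearly with the amount of work. What changes at 10,000 goroutines is the allocation count, which goes from well under one per goroutine to about 1.6.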
  • 91. Mutexes: BenchmarkMutex10000-8 1 1106063700 ns/op 4981360 B/op 16107 allocs/op. Why do the allocations jump like this? To understand why, let’s first see how a mutex works.
  • 93-94. Mutexes: The basic* idea goes like this:
    - Have a key in Redis which represents the lock (e.g. MyLock1)
    - Call “SETNX MyLock1 <SomeValue>” (Set If Not Exists)
    - If SETNX returns 1, the lock is acquired
    - Otherwise sleep and try again until the lock is acquired
    *This is a simplified and incomplete implementation. See https://redis.io/commands/setnx for details on how to implement a Redis lock correctly
  • 95. Mutexes: The closest to SETNX in Go is CAS (compare-and-swap):
    var MyLock1 int64
    success := atomic.CompareAndSwapInt64(&MyLock1, 0, 1)
    I.e. the following, but done atomically:
    if MyLock1 == 0 { MyLock1 = 1 }
  • 96-98. Mutexes: Simple mutex implementation
    type MyMutex struct {
        state int32
    }
    func (m *MyMutex) Lock() {
        // To Lock: repeatedly try to set 1
        for !atomic.CompareAndSwapInt32(&m.state, 0, 1) {
            ... // What should we do here? Sleep? Nothing?
        }
    }
    func (m *MyMutex) Unlock() {
        // To Unlock: set to 0
        atomic.StoreInt32(&m.state, 0)
    }
  • 99. Mutexes: If we sleep, we lose precious time. If we don’t sleep (spin), we consume CPU.
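As an illustration of the trade-off, here is one possible way to fill in the loop body of the simple mutex: yield to the scheduler with `runtime.Gosched()` on each failed attempt. This is a sketch, not what `sync.Mutex` does; the names `SpinMutex` and `countWithSpinMutex` are hypothetical. It still burns CPU under contention, it just lets other goroutines run in between attempts.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// SpinMutex is the simple CAS-based mutex with one choice made for the
// loop body: yield instead of sleeping or busy-waiting.
type SpinMutex struct {
	state int32
}

// Lock repeatedly tries to set state from 0 to 1, yielding between tries.
func (m *SpinMutex) Lock() {
	for !atomic.CompareAndSwapInt32(&m.state, 0, 1) {
		runtime.Gosched() // let another goroutine (ideally the holder) run
	}
}

// Unlock sets state back to 0.
func (m *SpinMutex) Unlock() {
	atomic.StoreInt32(&m.state, 0)
}

// countWithSpinMutex checks mutual exclusion: a plain int is incremented
// under the lock from many goroutines and must not lose updates.
func countWithSpinMutex(goroutines, n int) int {
	var (
		m     SpinMutex
		total int
		wg    sync.WaitGroup
	)
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				m.Lock()
				total++
				m.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWithSpinMutex(8, 1000)) // prints 8000
}
```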
  • 100. Mutexes So how are mutexes implemented?
  • 101. Mutexes: First, try fast lock
    func (m *MyMutex) Lock() {
        if atomic.CompareAndSwapInt32(&m.state, 0, 1) {
            return
        }
        …
    }
  • 102. Mutexes: Next, try spinning
    func (m *MyMutex) Lock() {
        …
        for {
            // canSpin: we’re using multiple cores, there’s at least one other
            // running goroutine, and we didn’t spin too long (iter < X)
            if canSpin(iter) {
                if atomic.CompareAndSwapInt32(&m.state, 0, 1) {
                    break
                }
                iter++
                doSpin()
            }
        }
        …
    }
  • 103. Mutexes: Last, add this goroutine to the “waitlist” for this mutex (this is where allocations come from) and go to sleep (park) until woken up by an Unlock
    func (m *MyMutex) Lock() {
        …
        runtime_SemacquireMutex(…)
    }
  • 104. Mutexes: Mutexes are very fast when:
    - There’s no contention
    - Time spent within the lock is very short
    Mutex performance degrades when there is a lot of contention
  • 105. Mutexes You can profile contention with go test -mutexprofile=mutex.out
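Outside of `go test`, the same profile can be collected programmatically. The sketch below uses `runtime.SetMutexProfileFraction` and the `"mutex"` profile from `runtime/pprof`; the function name `mutexProfile` and the workload are assumptions for illustration. A rate of 1 asks the runtime to record every contention event, which is fine for a demo but probably too heavy for production.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// mutexProfile runs a contended Lock/Unlock workload with mutex profiling
// enabled and returns the profile in pprof's text (debug=1) format.
func mutexProfile(goroutines, n int) (string, error) {
	prev := runtime.SetMutexProfileFraction(1) // record every contention event
	defer runtime.SetMutexProfileFraction(prev)

	var (
		m  sync.Mutex
		wg sync.WaitGroup
	)
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				m.Lock()
				m.Unlock()
			}
		}()
	}
	wg.Wait()

	var buf bytes.Buffer
	err := pprof.Lookup("mutex").WriteTo(&buf, 1)
	return buf.String(), err
}

func main() {
	out, err := mutexProfile(100, 1000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

The file produced by `-mutexprofile=mutex.out` is in the binary pprof format and can be inspected with `go tool pprof`.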
  • 107. Summary - Embedding - Mid-stack inlining - Dynamic stack size - Cooperative scheduling - Mutexes