Being a Software Engineer at Facebook
13 November 2013
Food
Perks
Culture
Bootcamp
Scale
Every DAY:
∗400,000 net new users
∗350 million photos
∗5 billion shares
∗10 billion messages

In total:
∗1.2 billion users
∗150 billion friendships
∗250 billion photos
∗1 trillion likes
∗16% of all time spent on Internet
∗1 million users per engineer
UIs
This is your fault
Impact
Impact
Scope
∗ Machine learning
∗ Big Data
∗ Search and information retrieval
∗ Performance
∗ Hardware
∗ Network
∗ Human-computer interaction
∗ Web UI
∗ Mobile UI
∗ Static analysis
∗ Compilers
∗ Virtual machines
∗ Image processing
∗ Video processing
∗ Datacentre design
Fixing a bug

tl;dr: We've found the source of the file descriptor leaks and we have potential fixes to address the problem.
Taking a deeper look at the problem: to verify what the real fixes could be, we need to determine where the file descriptors are actually pointing. In adb shell you can do so by cd-ing into /proc/<pid>/fd and then running ls -l to show you the actual destination... A quick glance at the result set reveals that pipes and /dev/ashmem occupy the majority of the open fds. Since both items take up almost 50% of all the open fds, these two are ideal candidates for figuring out the fd leak.
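The tally described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the investigation: it assumes you have captured the output of ls -l on /proc/<pid>/fd as text, the function names are hypothetical, and the sample listing is made up.

```python
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("/dev/ashmem"):
        return "ashmem"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def tally_fd_targets(ls_output: str) -> Counter:
    """Count open fds by category from captured `ls -l` output."""
    counts = Counter()
    for line in ls_output.splitlines():
        # Symlink lines end with "<fd> -> <target>"; skip everything else,
        # such as the "total 0" header.
        if " -> " not in line:
            continue
        target = line.rsplit(" -> ", 1)[1].strip()
        counts[classify_fd_target(target)] += 1
    return counts


# Made-up sample listing, for illustration only (not data from the talk).
SAMPLE = """\
total 0
lrwx------ 1 u0_a1 u0_a1 64 2013-11-13 10:00 0 -> /dev/null
lr-x------ 1 u0_a1 u0_a1 64 2013-11-13 10:00 7 -> pipe:[12345]
l-wx------ 1 u0_a1 u0_a1 64 2013-11-13 10:00 8 -> pipe:[12345]
lrwx------ 1 u0_a1 u0_a1 64 2013-11-13 10:00 9 -> /dev/ashmem
lrwx------ 1 u0_a1 u0_a1 64 2013-11-13 10:00 10 -> socket:[6789]
"""

if __name__ == "__main__":
    for kind, count in tally_fd_targets(SAMPLE).most_common():
        print(kind, count)
```

Sorting the categories by count is what makes the dominant ones, here pipes and ashmem, stand out at a glance.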
In short, ashmem stands for Android Shared Memory, and the Android system uses it to facilitate memory sharing across processes. Each ashmem registers a shrinker, and the shrinker reclaims the memory when the device is in a low-memory state, much like the JVM does, but in native space. As for a pipe in the Unix world, it is an interprocess channel that holds two file descriptors, one for reading and one for writing.
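To make the "two file descriptors per pipe" point concrete, here is a minimal Python sketch. It only demonstrates the general Unix behaviour, not anything specific to FB4A, and the helper name is hypothetical. It also shows why a leaked pipe is costly: each one holds two slots in the process's fd table until both ends are closed.

```python
import os


def pipe_round_trip(payload: bytes) -> bytes:
    """Send a small payload through a pipe and read it back.

    The payload must fit in the pipe buffer, otherwise the write would
    block because nothing is reading concurrently.
    """
    read_fd, write_fd = os.pipe()  # one call, two descriptors
    try:
        os.write(write_fd, payload)
        os.close(write_fd)  # closing the write end lets the reader see EOF
        write_fd = -1
        chunks = []
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        # Close whatever is still open so the sketch itself does not leak.
        if write_fd != -1:
            os.close(write_fd)
        os.close(read_fd)
```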
My first task was to see why the number of pipe fds builds up while scrolling in newsfeed. When FB4A first starts we have around xx open pipes. Scrolling through a couple of pages grows the number… To isolate the problem from FB4A, I built the fbsimple app, which contains only the newsfeed module, and I observed the same behavior. To isolate the problem further, I then turned off image fetching/prefetching to see whether the problem is correlated with the image fetching pipeline. Surprisingly, I could still see the problem without image fetching, which convinced me that it affects more than just the image pipeline. My next experiment disabled newsfeed database caching, and the same problem persisted, which rules out db access as the main cause. The only thing left to do was to play around with the network executor.
By default, FB4A uses the HttpClient from Apache to execute all network requests. Early last month, we introduced the SPDY library okHttp as an experiment to replace Apache HttpClient. A quick test reveals that Apache HttpClient is indeed the culprit for the leaking pipes: with the same configuration, okHttp keeps the number of open pipe fds at around 20, versus 90 with Apache HttpClient. Not only is okHttp better at reusing network connections, it also has better fd management. A sanity check with okHttp enabled in FB4A reveals the same result.
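Each experiment above boils down to comparing fd counts before and after scrolling under a given configuration. A minimal sketch of that comparison follows, assuming the per-category counts have already been collected by some other means. The function name is hypothetical and the script was not part of the original investigation.

```python
from collections import Counter


def fd_growth(before: Counter, after: Counter) -> dict:
    """Return the categories whose open-fd count grew between two snapshots."""
    growth = {}
    for kind in after:
        delta = after[kind] - before.get(kind, 0)
        if delta > 0:
            growth[kind] = delta
    return growth
```

A category that keeps appearing in the result after repeated scrolling, regardless of which module is switched off, points at a component shared by all the experiments, which is how the network executor ended up as the last suspect.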
Ashmem debugging is rather straightforward: ashmem is allocated when image fetch is enabled. A deep dive reveals that the fd is only allocated after bitmap decoding has been called, and I suspected that ashmem has to do with purgeability. To verify, I disabled the image cache and, instead of relying on a disk file to decode the image, I passed in the HTTP content InputStream directly and used BitmapFactory.decodeStream to decode the image. With this I could confirm that we are no longer allocating ashmem for decoding, because the images are no longer purgeable and live in the Java heap space. However, we ran into the same memory problem with the byte decoding experiment, and big images would be black or partially decoded on FB4A. So instead of decoding every image with the stream-based approach, I made a quick prototype that renders big images (images from single-photo stories and multi-photo collages) with our existing solution and renders small images such as profile pictures with the stream approach, and the result looks promising. Scrolling through the list of 1000 people in the flyout no longer grows the number of open fds. I think this hybrid approach would work.
With the combined approach as stated above, FB4A now stays at around half the open fds.
Values
Questions?
∗ Engineer: tnicholas@fb.com
∗ Recruiter (£££): ruth@fb.com

