From prototype to production - The journey of re-designing SmartUp.io
About Me - whoami
Mate Lang
CTO @ SmartUp.io
● Extrovert geek and tech lover
● Joined SmartUp.io @ Q2 2016
● Adores Linux, OSS, is a general minimalist
● Mining the bits in the startup & product mines, but has been through the valleys of the outsourcing & consultancy scene as well
About SmartUp.io - Who we are
● Startup company with a mobile-first,
gamified, social micro-learning SaaS
○ Initial goal is to help entrepreneurs launch and
take their startup companies to success
without bothering VCs with the exact same
questions
○ Found out, we have a much wider use case:
reach, engage, train & inspire communities to
facilitate learning
● Got a lot of attention from clients and
investors
SmartUp.io - Motivation
● Deloitte predicts huge
disruption opportunity
in corporate learning
sector
Company’s average
net promoter score
for their LMS?
-8
What is this talk about
● How should I (the geek) attack the problem domain to be successful?
● How did we decompose and redesign a “complex” (a.k.a. messy) monolith into maintainable microservices?
● How did we cope with NFRs like security and performance?
Fact #1 - In software the only constant thing is change
Fact #2 - Geek is the new sexy ;)
Market driven evolution
● Initial product is simple media consumption
○ Users learn by consuming content and competing with each other
○ SmartUp content team writes content on protected administration webapp
○ Users are on the same platform, without isolation
● ACME Inc comes along with a proposal to use the platform internally for learning
○ Users are isolated per company, e.g. ACME Inc and FooBar Inc (multi-tenancy)
○ Each company needs to be able to write its own content on its “slice” of the platform
● Introduce Communities = isolated instances of our platform, suitable to serve the needs of business consumers
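One way to picture the isolation that Communities introduce is a store that scopes every read and write by a community id. The sketch below is hypothetical: `CommunityContentStore`, `publish` and `feedFor` are illustrative names, not SmartUp's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: content is always stored and read per community (tenant).
class CommunityContentStore {
    // communityId -> titles published inside that community
    private final Map<String, List<String>> contentByCommunity = new HashMap<>();

    // Publishing always names the community the content belongs to.
    void publish(String communityId, String title) {
        contentByCommunity.computeIfAbsent(communityId, id -> new ArrayList<>()).add(title);
    }

    // A member only ever sees the feed of their own community.
    List<String> feedFor(String communityId) {
        List<String> feed = contentByCommunity.get(communityId);
        if (feed == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(feed);
    }

    public static void main(String[] args) {
        CommunityContentStore store = new CommunityContentStore();
        store.publish("acme", "Onboarding basics");
        store.publish("foobar", "Sales playbook");
        System.out.println(store.feedFor("acme"));   // only ACME content
        System.out.println(store.feedFor("foobar")); // only FooBar content
    }
}
```

Because there is no method that reads across communities, a company cannot see another company's content by accident.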
Version 1 (Wondeer)
Market driven evolution
What we had vs. what we wanted to sell
But most importantly...
This is what we had under the hood
Plan is to rewrite from scratch
Team’s estimation of backlog = ~ 7 months
Actual delivery time = 10 months (not the estimated 7)
Version 2 (Brontocorn)
Cool :)
Let’s rewrite...
Where could it go
wrong?
Right from architecting
Understanding your domain
● Clients use a “smart hack” on V1 to obtain a predefined order of their published
learning material
● By default the platform facilitates “feed-like” behaviour
○ no explicit ordering
○ potentially infinite list of items
● The right question to ask as an engineer:
Why would our clients ever want that?
Understanding your domain
● Clients want a course-like structure
● That is substantially different from our individual publishing model
○ The content is designed in an ordered fashion from the beginning
○ It should be completed in an ordered fashion
○ Analytics should be able to correlate completion records
○ It deserves its own management
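To make the difference from a feed concrete, a course can be modelled as an explicitly ordered list whose items must be completed in that order. This is a minimal sketch under that assumption; `CourseProgress` and its methods are hypothetical names, not the platform's real model.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a course: an explicitly ordered list of items
// that must be completed in that order.
class CourseProgress {
    private final List<String> orderedItemIds;
    private final Set<String> completed = new HashSet<>();

    CourseProgress(List<String> orderedItemIds) {
        this.orderedItemIds = new ArrayList<>(orderedItemIds);
    }

    // The next item the learner is allowed to complete, or null when done.
    String nextItem() {
        for (String id : orderedItemIds) {
            if (!completed.contains(id)) {
                return id;
            }
        }
        return null;
    }

    // Completing out of order is rejected, unlike in a feed.
    boolean complete(String itemId) {
        if (!itemId.equals(nextItem())) {
            return false;
        }
        completed.add(itemId);
        return true;
    }

    boolean isFinished() {
        return nextItem() == null;
    }
}
```

Keeping the completion records inside one course object is also what lets analytics correlate them later.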
Let’s get geeky
Geek team shop list for the re-write
● Scalable and maintainable codebase on all platforms
● Needs to be microservice architecture, because everything is easier with them
(this is a fat lie)
● Automate everything that is possible to automate
● Detailed and helpful documentation
● Needs to ship in a continuous fashion with Docker because Docker is cool (it
actually is)
● Needs to have Infrastructure-as-Code
● Needs to use managed solutions
● Web client and mobile apps useable by your grandma
Tech stack
Microservices - the why
Google Trends - search for term “microservice”
● Haters gonna hate, but there is undeniable interest & adoption in
engineering-led companies
● Microservices take common clean code (SOLID) concerns to system level
○ SRP - a service should be concerned with a single coherent domain
○ OCP - behavior is extended by encapsulating a service in higher-level services. A change in a remote context should not force a change in a given service.
○ ISP - introduce edge services: specialized backends for clients
● The above seems like common sense, but engineers do fail in designing such
systems
Microservices - the reason we fail
“Often stepping back
you see more, don’t you?”
David Hockney
painter, draughtsman, printmaker, stage designer and photographer
Microservices - my fast service design test
● If you want to know whether you have (not) designed it correctly (false positives may appear), fill in the following test
Test for your service design
1. Imagine you have to open source your service. Are you able to do so without making code changes, while staying useful to an engineer outside your business domain? …………………………………
------------------------------ END OF TEST ------------------------------
Let’s see a concrete example
Meet the
“Leaderboard Service*”
*(the tech scene ran out of deity names to name services after,
so no Zeus or Hydra for you)
The leaderboard - proposition
● A service responsible for managing leaderboards
● A leaderboard is an ordered list of players, each associated with a score
● It needs to allow near real-time updates of such boards, based on individual score change events
● It lets the player check the “transaction history”, not just the aggregated state
● It allows fast retrieval of a certain segment of the board
○ Top X
○ Around X (X-10, X, X+10)
Something
like this...
But a bit more modern...
Like this
The leaderboard - collecting points
Card
Points
Card
The leaderboard - under the hood
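public interface LeaderboardService {
    // Creates a new leaderboard from the given request.
    LeaderboardCreationResponseDto createLeaderboard(LeaderboardCreationRequestDto requestDto);
    // Returns the top segment of the board, limited to `size` entries.
    LeaderboardDto retrieveTopNLeaderboard(String leaderboardId, long size);
    // Returns the segment of the board around the given player.
    LeaderboardDto retrieveAroundNLeaderboard(String leaderboardId, String playerId, long size);
    // Returns a single player of the board.
    PlayerDto retrievePlayer(String leaderboardId, String playerId);
    // Applies a score update to the player.
    void updatePlayerScore(String leaderboardId, String playerId, long score);
}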
● Note that this interface is agnostic of all SmartUp related logic
● Could be reused in any situation where you want to represent entities in a
sorted order by score
● No matter what the player entity is, it is created lazily when you first update its score
● Currently used to incentivize consumption, but can be applied to groups of
players, content creators, etc.
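The contract above can be illustrated with a small in-memory stand-in. This is a hedged sketch, not the production service: `InMemoryLeaderboard` is a hypothetical class, it handles a single board (no `leaderboardId`), returns plain player ids instead of DTOs, and assumes the score passed to `updatePlayerScore` is a delta.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-board sketch; the real service also takes a leaderboardId.
class InMemoryLeaderboard {
    private final Map<String, Long> scores = new HashMap<>();

    // Assumption: the score is a delta. Unknown players are created lazily.
    void updatePlayerScore(String playerId, long scoreDelta) {
        scores.merge(playerId, scoreDelta, Long::sum);
    }

    // Player ids ordered by score (highest first), ties broken by id.
    private List<String> ranking() {
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Long.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ids;
    }

    // The first `size` players of the ranking.
    List<String> topN(int size) {
        List<String> ids = ranking();
        return new ArrayList<>(ids.subList(0, Math.min(size, ids.size())));
    }

    // The player plus up to `size` neighbours on each side.
    List<String> aroundN(String playerId, int size) {
        List<String> ids = ranking();
        int index = ids.indexOf(playerId);
        if (index < 0) {
            return new ArrayList<>();
        }
        int from = Math.max(0, index - size);
        int to = Math.min(ids.size(), index + size + 1);
        return new ArrayList<>(ids.subList(from, to));
    }
}
```

Sorting on every read keeps the sketch short; a real read model would keep the ranking sorted incrementally so that segment lookups stay fast.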
The leaderboard - the gain
● Thanks to event sourcing we can always reconstruct state in case of failure
● Redis is in-memory. HA or not, should it ever go down, an update of state would effectively restore our read-optimized model
● The number of Score Processors scales with the data volume
● Correctness of concurrent state updates is guaranteed by DynamoDB Stream shards
● In case of processor failure, once the service is restored the unprocessed events get picked up, all within a couple of minutes (after up to a couple of weeks of lagging behind)
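The event-sourcing idea behind these gains can be sketched in a few lines: the append-only event log is the source of truth, and the read-optimized totals can be thrown away and rebuilt by replaying it. The names below (`ScoreEventLog`, `ScoreChanged`, `rebuildReadModel`) are hypothetical, and plain collections stand in for DynamoDB and Redis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of event sourcing for scores.
class ScoreEventLog {
    // One immutable "score changed" event.
    static final class ScoreChanged {
        final String playerId;
        final long delta;

        ScoreChanged(String playerId, long delta) {
            this.playerId = playerId;
            this.delta = delta;
        }
    }

    // Append-only log: the source of truth.
    private final List<ScoreChanged> events = new ArrayList<>();

    void append(String playerId, long delta) {
        events.add(new ScoreChanged(playerId, delta));
    }

    // The per-player "transaction history", in the order it happened.
    List<Long> historyOf(String playerId) {
        List<Long> history = new ArrayList<>();
        for (ScoreChanged event : events) {
            if (event.playerId.equals(playerId)) {
                history.add(event.delta);
            }
        }
        return history;
    }

    // Rebuilds the read model (player -> total) from scratch by replaying every event.
    Map<String, Long> rebuildReadModel() {
        Map<String, Long> totals = new HashMap<>();
        for (ScoreChanged event : events) {
            totals.merge(event.playerId, event.delta, Long::sum);
        }
        return totals;
    }
}
```

Losing the aggregated totals is then recoverable, because nothing in them exists that the log cannot reproduce.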
The leaderboard
QC: PASS
Another example...
Meet the Content Service
The Content Service - The proposition
A service that handles the creation and modification of versioned learning material.
Also enables versions to be instantiated and completed by consumers.
Encapsulates both structural and behavioural functionalities, like:
- Structural
  - Question text
  - Limited number of answers
  - Solution explanation
- Behavioural
  - Single-choice
  - Multiple-choice
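The split between structural and behavioural concerns might look roughly like this: the question carries its structure as data, while the way a submission is judged is a pluggable rule. Everything here (`QuestionSketch`, `AnswerRule`, the two rule constants) is an illustrative assumption, not the Content Service's real model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: structure (text, answers, explanation) is kept
// separate from behaviour (how a submission is judged).
class QuestionSketch {
    // Behaviour: decides whether the selected answer indexes are correct.
    interface AnswerRule {
        boolean isCorrect(Set<Integer> correct, Set<Integer> selected);
    }

    // Single-choice: exactly one answer selected, and it must be a correct one.
    static final AnswerRule SINGLE_CHOICE =
            (correct, selected) -> selected.size() == 1 && correct.containsAll(selected);

    // Multiple-choice: the selection must match the correct set exactly.
    static final AnswerRule MULTIPLE_CHOICE =
            (correct, selected) -> correct.equals(selected);

    final String text;
    final List<String> answers;
    final Set<Integer> correctIndexes;
    final String explanation;
    final AnswerRule rule;

    QuestionSketch(String text, List<String> answers, Set<Integer> correctIndexes,
                   String explanation, AnswerRule rule) {
        this.text = text;
        this.answers = new ArrayList<>(answers);
        this.correctIndexes = new HashSet<>(correctIndexes);
        this.explanation = explanation;
        this.rule = rule;
    }

    // Judges a submission using the question's behaviour.
    boolean submit(Integer... selectedIndexes) {
        Set<Integer> selected = new HashSet<>(Arrays.asList(selectedIndexes));
        return rule.isCorrect(correctIndexes, selected);
    }
}
```

A new behaviour is then just another `AnswerRule`, without touching the structural fields.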
The Content Service - Under the hood
The Content Service - the gain
● Because the Context is an abstract entity, we can support many use cases for consumption rules and stay resilient to change
○ Sharing of consumption records
○ You can redo a piece of content in certain circumstances (e.g. exam mode has a separate context)
● Feedback from our Head Of Content after going live
“Fast, smooth and easy. And really fast. And damn, this thing's fast...”
● The design enables easy clean-up, should we ever need it.
For now we store every change a content creator makes.
“Because I can” - Dr. Bob Kelso, Scrubs
Tuning your
microservices, cause
with high throughput
comes high latency
Tuning your microservices
● Simple operation: check the user's credentials and request a JWT token
● Initial results: not too bad
● P95 response time: 701 ms
Tuning your microservices
● Scale it up
○ 1 OAuth Service
○ 2 User Services
● P95 of ~35k ms (~35 seconds)
● Almost all requests take longer than 1 second to respond
● 40% of requests FAILED
THAT DOES NOT MAKE SENSE!
Tuning your microservices
● Found out there is no HTTP connection pooling -> TCP handshake penalty
● Update Spring Cloud to Edgware
● Set correct timeouts for Ribbon and Hystrix
● Reduce (yes, reduce) Tomcat resources
○ Max-Threads
■ The maximum number of request processing threads to be created
○ Max-Connections
■ The maximum number of connections that the server will accept and
process at any given time
○ Accept-Count
■ The maximum queue length for incoming connection requests when all
possible request processing threads are in use
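One possible reading of why reducing these limits helps: a bounded worker pool with a bounded waiting queue rejects excess work right away instead of letting it pile up behind slow calls. The sketch below is only an analogy built on `java.util.concurrent`, not Tomcat's implementation; `BoundedPoolSketch` and `countRejected` are hypothetical names, with `maxThreads` playing the role of Max-Threads and `queueLength` the role of Accept-Count.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Analogy only: a bounded worker pool with a bounded waiting queue.
// Work beyond both limits is rejected immediately instead of queueing without bound.
class BoundedPoolSketch {
    // Submits `requests` blocking tasks and returns how many were rejected.
    static int countRejected(int maxThreads, int queueLength, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueLength));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a slow downstream call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // both the threads and the queue are full
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        // 2 threads + 3 queue slots accept 5 of the 10 requests.
        System.out.println(countRejected(2, 3, 10));
    }
}
```

With limits sized to what the service can actually handle, callers get a fast failure they can retry or route elsewhere, rather than a request that waits until it times out.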
Tuning your microservices
● 0 Failed Requests
● Mean Req/s: ~50
● P95 latency: 148 ms
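For readers unfamiliar with the metric, P95 is the latency that 95% of requests stay at or below. The talk does not say how its load-testing tool computes it; the sketch below assumes the common nearest-rank definition, and `LatencyPercentile` is a hypothetical helper.

```java
import java.util.Arrays;

// Hypothetical helper computing a percentile with the nearest-rank method.
class LatencyPercentile {
    // Returns the smallest sample such that at least `percentile` percent
    // of all samples are less than or equal to it.
    static long percentile(long[] latenciesMs, double percentile) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = {120, 95, 148, 101, 99, 130, 110, 97, 105, 102};
        System.out.println("P95 = " + percentile(samples, 95) + " ms");
    }
}
```

Unlike a mean, this value is dominated by the slowest requests, which is why it reacts so strongly to connection and thread-pool problems.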
Tuning your microservices
Sometimes
If microservices have
not solved your
engineering problems...
Try hiring a
“DevOps Engineer”
As per Wikipedia
DevOps =
a software engineering practice that aims at
unifying software development (Dev) and
software operation (Ops).
Automate
EVERYTHING!
Brontocorn Release
● Went live October 3, 0600 RO time
● Development environment was used for previous 10 months
● No other environment due to cost reasons
● On the day of release we created
○ Staging (Acceptance Testing) environment
○ Production
45 minutes difference between
deployments (5 minutes active)
How is that possible?
Infrastructure-as-Code
Infrastructure as Code
Definition
Infrastructure as code (IaC) is the process of managing and provisioning computer data
centers through machine-readable definition files, rather than physical hardware
configuration or interactive configuration tools. (as per Wikipedia)
● Just a bunch of shell scripts
● Modern provisioning tools like Chef, Puppet, Ansible
● Cloud-ready IaC management: CloudFormation, HashiCorp Terraform
Meet Terraform
● DSL based using HCL (HashiCorp Configuration Language)
● Module oriented
● Manages dependencies between resources (e.g. DNS depends on IP)
● Nice interpolation syntax
● Natively manages multiple environments through configuration (e.g. instance
types differ from env to env)
● Workflow = Plan > Review > Apply
● Configurable state backends
● From V0.10 supports pluggable “providers” (e.g. AWS, GCP)
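The dependency management bullet (DNS depends on IP) boils down to creating resources in an order where every dependency comes first. The sketch below shows that idea with a depth-first topological sort. It is an illustration only, not how Terraform is implemented, and `ResourceGraph` plus the resource names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only, not Terraform's implementation: resources are created
// after everything they depend on.
class ResourceGraph {
    // resource -> resources it depends on (insertion-ordered for stable output)
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    // Declares a resource and the resources it depends on.
    void resource(String name, String... dependencies) {
        dependsOn.put(name, Arrays.asList(dependencies));
        for (String dependency : dependencies) {
            dependsOn.putIfAbsent(dependency, new ArrayList<String>());
        }
    }

    // Returns all resources, each one after its dependencies.
    List<String> creationOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : dependsOn.keySet()) {
            visit(name, done, inProgress, order);
        }
        return order;
    }

    private void visit(String name, Set<String> done, Set<String> inProgress,
                       List<String> order) {
        if (done.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("dependency cycle at " + name);
        }
        for (String dependency : dependsOn.get(name)) {
            visit(dependency, done, inProgress, order);
        }
        inProgress.remove(name);
        done.add(name);
        order.add(name);
    }
}
```

Declaring `dns_record` as depending on `elastic_ip` is enough for the IP to be created first, regardless of the order in which the two were written down.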
Meet Terraform
Src: Terraform Homepage
Our way of Terraforming
● 4 AWS VPCs
○ Services VPC (For maintenance, and unified connection to other VPCs)
○ SmartUp VPCs (e.g. Dev, Stg, Prod)
● Each service owns its own module along with its dependencies
e.g. Leaderboard:
○ Redis
○ Queues
○ DynamoDB Tables & Streams
○ etc
● Peering module to connect Services <-> SmartUp
● Using encrypted S3 for safe state storage
Managing service configuration
● Services pick up their configuration exclusively at runtime
● No mvn package -Pdev|stg|prod
● Build one artifact (docker) and use it everywhere
● Consul as service discovery and configuration storage
● Terraform injects properties into Consul upon execution
● No configuration done in YML
● Using Spring Cloud Config to pick these values up from Consul upon startup
and checking periodically for changes
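The "load at startup, then check periodically" behaviour can be sketched without any framework. In the hypothetical `RuntimeConfig` below, a plain `Supplier` stands in for the read from the configuration store; no real Consul or Spring Cloud API is used, and the property names in the usage note are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: `source` stands in for a read from the configuration
// store (Consul in the talk); no real Consul or Spring API is used here.
class RuntimeConfig {
    private final Supplier<Map<String, String>> source;
    private volatile Map<String, String> current;

    RuntimeConfig(Supplier<Map<String, String>> source) {
        this.source = source;
        refresh(); // values are picked up at startup, not at build time
    }

    // Re-reads the source and swaps in an immutable snapshot.
    final void refresh() {
        Map<String, String> snapshot = new HashMap<>(source.get());
        current = Collections.unmodifiableMap(snapshot);
    }

    String get(String key, String defaultValue) {
        return current.getOrDefault(key, defaultValue);
    }

    // Checks periodically for changes; the caller shuts the scheduler down.
    ScheduledExecutorService pollEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
        return scheduler;
    }
}
```

The same artifact can then run in dev, staging and production: only the values behind `source` differ, for example a `redis.host` entry that points at a different host per environment.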
Each team is responsible for
their delivery process
From design to production
Let’s put it to production
● Loads of deploys in a geek’s life, better make it simple
● A good pipeline will
○ Provide fast feedback before PR integration (build, test & check infra dependencies)
○ Deploy changes to dev ASAP (fail-fast)
○ Streamline production releases so they prevent human error
○ Keep a clear separation between steps
● Preferably define the whole pipeline using code
● Decided to use CircleCI - YAML based Workflows
Kudos to the team
Thank you for your
kind attention!
Mate Lang
CTO @ smartup.io
mate@smartup.io
twitter: @langmate
medium: @matelang
