Filip Rogaczewski • frogaczewski@atlassian.com • Spartez/Atlassian
ETI graduate
Team leader in Spartez
Previously worked in Lufthansa, NASA, Intel
Running, biking, paragliding
Travelling
Photography
BUSINESS APPLICATIONS 
INTEGRATION IN THE CLOUD: 
HOW TO INTEGRATE 50 000+ 
SERVERS TOGETHER
Agenda
WHY: CASE STUDIES, OPPORTUNITY
HOW: UI INTEGRATION, REST API, MESSAGING, MULTI-TENANCY, DEPLOYMENT
WHY 
Case study: Facebook
(Screenshot callouts: Recommended, Friends feed, Chat, Activity Stream, Applications)
Many distinct 
services 
integrated into a 
single application
WHY 
Service Oriented Architecture 
SOAP (simple object access protocol) 
XML RPC (remote procedure call) 
CORBA 
RMI (remote method invocation)
SOA: 
Loosely coupled & 
independently 
working services
WHY 
Service Oriented Architecture 
Scales the application 
• Loosely coupled services 
• Fewer resource restrictions for services
• Communication with well defined API 
• Allows better technological choice for services 
• Distinct deployment models 
(Diagram: services, each in its own container, integrated over HTTP)
WHY 
Service Oriented Architecture 
Different hardware stack for services in Facebook
Type | Role | CPU | Memory | Disk
Type I | Web | (2) Xeon E5-2670 | 16 GB | (1) 500 GB SATA
Type III | DB | (2) Xeon E5-2660 | 144 GB | 3.2 TB PCI Flash
Type IV | Hadoop | (2) Xeon E5-2660 | 64 GB | (15) 4 TB SAS
Type V | Haystack | (2) Xeon E5-2660 | 96 GB | (30) 4 TB SAS
Type VI | Cache | (2) Xeon E5-2660 | 144 GB | (1) 2 TB SATA
Type VII | Cold storage | (2) Xeon E5-2660 | 144 GB | (240) 4 TB SATA
The problems Facebook faces today
will be our problems
in a few years
WHY 
Service Oriented Architecture 
More effective organisation 
• Each team running a single service. 
• Each team is cross-functional (designers, product managers, 
testers, developers, ops-engineers). 
• Decisions about the roadmap happen locally.
• Each team is geographically co-located: one service in the USA, a second
service in Australia, a third in Poland.
• Easier to scale work: multiple teams work at the same
time.
What is the 
alternative?
WHY 
In Process Integration
(Diagram: add-ons running inside the application's CONTAINER)
In Process
• Resources are shared
• Access to all data
• Doesn’t scale
Tied to the stack
• Language
• Frameworks
No clear API boundaries
Who else does 
integration?
WHY 
Spotify 
Each item is a distinct service: music stream, friends feed, browse music.
WHY 
Atlassian: JIRA 
Bitbucket 
Attachments 
Confluence Hipchat 
JIRA Agile
Internal 
application 
composition. 
Why else?
WHY 
Integrations of multiple applications 
You can sell all your products instead of one.
WHY 
Extending with marketplace 
Customers always want more features.
If you can’t give them what they want, let someone else do it: a marketplace.
Take 25% of what external vendors sell through your marketplace.
30 000 000 
$/year
WHY 
Enterprise customers 
Customers who want to integrate your product with their existing 
applications 
HR
Communication
Environment
CRM
Asset management
Supply chain
GRC
Finance
WHY 
Acquisitions 
You buy the next fantastic company.
You want to integrate its features quickly.
It can take a couple of months if you have an integration layer ready.
It might never be done if you don’t.
Agenda
WHY: CASE STUDIES, OPPORTUNITY
HOW: UI INTEGRATION, REST API, MESSAGING, MULTI-TENANCY, DEPLOYMENT
UI integration
HOW 
How to embed external HTML here?
HOW 
Iframe 
Never embed HTML from external sites directly in the page.
When using iframes, the browser provides security:
• Don’t set sandboxing to allow-forms, allow-scripts, allow-same-origin,
allow-top-navigation. This is a security model
very difficult to manage.
Sign the URL so the server rendering the content can authenticate the
request.
Optionally pass context parameters.
Use CORS or postMessage for communication.
Performance issues.
Security
HOW 
Security: How to verify this request? 
https://whoslooking-stg.herokuapp.com/poller?issue_key=ACJIRA-157
&tz=Australia%2FSydney
&loc=en-US
&user_id=frogaczewski
&user_key=frogaczewski
&xdm_e=https%3A%2F%2Fecosystem.atlassian.net
&xdm_c=channel-whoslooking-connect-stg__whos-looking
&cp=
&lic=none
&jwt=
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJmcm
9nYWN6ZXdza2kiLCJxc2giOiJiZjA1NmU5MjEzYjBkODIyNDA
wNzg4YmQ4MThhNDk4YmM0NGQ0OTMyYTM2MWU1Mjk1Zj
cwMTczOGRiMGRjOTA2IiwiaXNzIjoiamlyYTo1OTk3NWQ2Ny
00Y2EwLTRlOWUtOTk2MC1kMWFhYWU3NmJiMzkiLCJleHA
iOjE0MTMxMzI2NTksImlhdCI6MTQxMzEzMjQ3OX0.Da8VXjL
_9z5xyzErtaJohHKH-xx-0Rp-9MF_xtIvcaY
HOW 
Security: URL signing requirements 
1. Signature, to validate who created the request.
2. Issuer: identify the application instance which issued the 
request. Is this jiraForEti or is this jiraForGdanskUniversity? 
3. Expiration time of the token. Time in UTC after which you 
should no longer accept the token. 
4. Query hash. Prevents URL tampering. 
5. Id of the user for authorisation. 
6. Algorithm used to sign the URL.
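As an illustration, most of these requirements can be read straight out of the token in the sample URL, because a JWT consists of base64url-encoded parts separated by dots. The sketch below assumes Node.js and that the line breaks in the token on the slide are only wrapping. It decodes without verifying anything, so it is for inspection only.

```javascript
// Token fragments as wrapped on the slide; joined on the assumption
// that the line breaks are layout only.
const jwt = [
  "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJmcm",
  "9nYWN6ZXdza2kiLCJxc2giOiJiZjA1NmU5MjEzYjBkODIyNDA",
  "wNzg4YmQ4MThhNDk4YmM0NGQ0OTMyYTM2MWU1Mjk1Zj",
  "cwMTczOGRiMGRjOTA2IiwiaXNzIjoiamlyYTo1OTk3NWQ2Ny",
  "00Y2EwLTRlOWUtOTk2MC1kMWFhYWU3NmJiMzkiLCJleHA",
  "iOjE0MTMxMzI2NTksImlhdCI6MTQxMzEzMjQ3OX0.Da8VXjL",
  "_9z5xyzErtaJohHKH-xx-0Rp-9MF_xtIvcaY",
].join("");

// Decode one base64url-encoded JSON part of the token.
function decodePart(part) {
  return JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
}

const [encodedHeader, encodedClaims, signature] = jwt.split(".");
const header = decodePart(encodedHeader); // carries the signing algorithm
const claims = decodePart(encodedClaims); // carries issuer, expiry, query hash, user

console.log(header.alg);              // requirement 6: algorithm
console.log(claims.iss);              // requirement 2: issuer
console.log(claims.exp - claims.iat); // requirement 3: lifetime in seconds
console.log(claims.qsh);              // requirement 4: query hash
console.log(claims.sub);              // requirement 5: user id
```

The third part, `signature`, covers requirement 1 and can only be checked with the secret shared between host and service.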
HOW 
Security: Signature validation 
1. Token has the following form: encodedHeader.encodedClaims.signature
2. Upon installation, host and service exchange a shared secret.
3. Service receives a public key of the host. The service has to verify
the public key. Each service exposes a REST API for public key
retrieval.
4. During a request, the service extracts the issuer and signature
algorithm from the URL and retrieves the sharedSecret for the
issuer.
5. Service signs encodedHeader.encodedClaims with the algorithm
from the header and verifies that the signatures match. If yes, return
content. If no, return 403 (forbidden).
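The validation steps could be sketched roughly as follows, assuming Node.js and HMAC-SHA256 (the HS256 algorithm named in the sample token's header). The `sharedSecrets` store, the issuer name and the `signToken` helper are hypothetical and exist only to make the example self-contained. The query hash check is left out.

```javascript
const crypto = require("node:crypto");

// Hypothetical store of shared secrets, keyed by issuer, filled at installation.
const sharedSecrets = new Map([["jira:example-instance", "secret-exchanged-at-install"]]);

const b64url = (text) => Buffer.from(text).toString("base64url");
const hmac = (input, secret) =>
  crypto.createHmac("sha256", secret).update(input).digest("base64url");

// Demo helper: what the host would do when it builds the signed URL.
function signToken(claims, secret) {
  const header = b64url(JSON.stringify({ typ: "JWT", alg: "HS256" }));
  const body = b64url(JSON.stringify(claims));
  return `${header}.${body}.${hmac(`${header}.${body}`, secret)}`;
}

// Returns the HTTP status the service would answer with.
function verifyToken(token, nowSeconds) {
  const parts = token.split(".");
  if (parts.length !== 3) return 403;
  const [encodedHeader, encodedClaims, signature] = parts;
  let header, claims;
  try {
    header = JSON.parse(Buffer.from(encodedHeader, "base64url").toString("utf8"));
    claims = JSON.parse(Buffer.from(encodedClaims, "base64url").toString("utf8"));
  } catch {
    return 403; // not valid JSON
  }
  // This sketch accepts a single algorithm instead of trusting any header value.
  if (!header || !claims || header.alg !== "HS256") return 403;
  const secret = sharedSecrets.get(claims.iss);
  if (!secret) return 403; // unknown issuer
  // Re-sign encodedHeader.encodedClaims and compare in constant time.
  const expected = Buffer.from(hmac(`${encodedHeader}.${encodedClaims}`, secret));
  const received = Buffer.from(signature);
  if (expected.length !== received.length) return 403;
  if (!crypto.timingSafeEqual(expected, received)) return 403;
  // Reject tokens past their expiration time (seconds since epoch, UTC).
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) return 403;
  return 200;
}

const demoToken = signToken(
  { iss: "jira:example-instance", sub: "frogaczewski", exp: 2000 },
  "secret-exchanged-at-install"
);
console.log(verifyToken(demoToken, 1000)); // valid and not expired
console.log(verifyToken(demoToken, 3000)); // expired
```

Pinning the accepted algorithm is a deliberate choice in this sketch: it avoids letting the sender pick how its own token gets checked.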
IFRAME AND 
PARENT 
COMMUNICATION
HOW 
Sandboxing 
An iframe instance whose parent and child reside on different 
domains or hostnames constitutes a sandboxed environment. The 
contained page has no access to its parent. These restrictions are 
imposed by the browser's same origin policy. 
A few limitations apply to iframes:
• Stylesheet properties from the parent do not cascade to the
child page.
• A child page has no access to its parent's DOM and JavaScript
properties.
• Likewise, the parent has no access to its child's DOM or
JavaScript properties.
HOW 
Cross origin resource sharing (CORS) 
1. Keep a whitelist of the URLs of services allowed to access
server resources.
2. When executing a cross-origin request, the browser sends the header:
Origin: http://service.atlassian.net
3. If the service is whitelisted, the server should return:
Access-Control-Allow-Origin: http://service.atlassian.net
4. There are further headers for:
choosing a subset of allowed headers
(Access-Control-Allow-Headers)
choosing a subset of allowed HTTP methods
(Access-Control-Allow-Methods)
DO NOT USE JSONP
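A minimal sketch of the server-side decision, assuming Node.js. The allowlisted origin, the permitted methods and headers, and the function name are illustrative only, not the values any real service uses.

```javascript
// Hypothetical allowlist of origins that may read server resources.
const allowedOrigins = new Set(["https://addon.example.com"]);

// Compute the CORS response headers for the Origin header of a request.
// An empty object means "send no CORS headers", so the browser blocks the read.
function corsHeaders(requestOrigin) {
  if (!allowedOrigins.has(requestOrigin)) return {};
  return {
    // Echo the specific origin rather than "*".
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Methods": "GET, POST",
    "Access-Control-Allow-Headers": "Content-Type",
    // The response differs per origin, so caches must key on it.
    "Vary": "Origin",
  };
}

console.log(corsHeaders("https://addon.example.com"));
console.log(corsHeaders("https://unknown.example.net"));
```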
HOW 
window.postMessage 
1. Create a clear JS API between parent and iframe.
2. Parent creates an event listener for messages:
window.addEventListener("message", executeXHR, false);
3. Client (the iframe) executes:
window.parent.postMessage(
JSON.stringify({type: "request", url: "/rest/api/2/dashboard"}),
hostOrigin);
(hostOrigin stands for the parent's origin, the second argument postMessage expects.)
Functions cannot be sent in a message, so a success callback stays in the
child and runs when the reply message arrives.
4. Parent executes the request on behalf of the child and uses
postMessage to send back the results.
5. Difficult to implement. The host should provide a library with an
abstraction over the JS functions it can handle.
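One piece of such a host library could look like the sketch below: the check the parent runs on every incoming message before acting on it. The add-on origin, the path allowlist and the message shape (`type`, `url`, `id`) are assumptions made for illustration. The function is pure so it can be exercised outside a browser.

```javascript
// Hypothetical configuration: the origin of the embedded page and the
// REST paths the parent is willing to call on its behalf.
const ADDON_ORIGIN = "https://addon.example.com";
const ALLOWED_PATHS = ["/rest/api/2/dashboard"];

// Decide what to do with a "message" event received by the parent.
// Returns { ok: true, url, id } or { ok: false, reason }.
function parseChildRequest(event) {
  // Any window can post a message, so always check who sent it.
  if (event.origin !== ADDON_ORIGIN) return { ok: false, reason: "unexpected origin" };
  let message;
  try {
    message = JSON.parse(event.data);
  } catch {
    return { ok: false, reason: "malformed message" };
  }
  if (!message || message.type !== "request") return { ok: false, reason: "unknown type" };
  if (!ALLOWED_PATHS.includes(message.url)) return { ok: false, reason: "path not allowed" };
  return { ok: true, url: message.url, id: message.id };
}

// In a browser the parent would wire it up roughly like this:
//   window.addEventListener("message", (event) => {
//     const request = parseChildRequest(event);
//     if (request.ok) { /* run the XHR, then event.source.postMessage(reply, event.origin) */ }
//   });

console.log(parseChildRequest({
  origin: "https://addon.example.com",
  data: JSON.stringify({ type: "request", id: 1, url: "/rest/api/2/dashboard" }),
}));
```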
Performance
HOW 
Performance: Apdex 
New Relic: measuring user satisfaction
• In Atlassian
• Satisfied 1s
• Tolerating 3s
• Our Apdex goal is 0.9
• Apdex between 0.85 and 0.93
is considered to be a good
score.
• For business applications
users are more tolerant than
for customer applications
• Financial services are out of
scope.
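For reference, the Apdex score is (satisfied + tolerating / 2) / total samples. A small sketch using the 1 s and 3 s thresholds from this slide; the function name and the sample response times are made up.

```javascript
// Apdex = (satisfied + tolerating / 2) / total.
// Thresholds default to the values on the slide: satisfied up to 1 s,
// tolerating up to 3 s; anything slower counts as frustrated.
function apdex(responseTimesMs, satisfiedMs = 1000, toleratingMs = 3000) {
  if (responseTimesMs.length === 0) return null; // no samples, no score
  let satisfied = 0;
  let tolerating = 0;
  for (const time of responseTimesMs) {
    if (time <= satisfiedMs) satisfied += 1;
    else if (time <= toleratingMs) tolerating += 1;
  }
  return (satisfied + tolerating / 2) / responseTimesMs.length;
}

// Two satisfied, one tolerating, one frustrated request.
console.log(apdex([400, 900, 2500, 5000])); // 0.625
```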
HOW 
Performance: Latency 
1. Latency
Within California? 30 ms
Within Europe? 30 ms
Across Atlantic? 90 ms
US to Australia? 210 ms
EMEA to Asia Pacific? 250 ms
2. Response times of the application differ between
geographical regions. A customer in the US will usually see much
better performance than one in Europe.
3. Use a CDN for caching static resources (Akamai, CloudFront,
EdgeCast).
4. There are enterprise-class solutions for reducing latency (Verizon
Enterprise Solutions).
HOW 
Performance: iframe request 
Page containing an iframe
HOW 
Performance: iframe request 
Page containing non-iframe embedded content
REST API
HOW 
How do I change this data?
WHY 
REST API 
REST is Representational State Transfer.
API is Application Programming Interface.
For an API to make sense, it needs to be stable. Each service needs
an API policy.
Unless it creates a security risk, a REST API can’t change without
prior notice: a deprecation period during which services can move
to a valid replacement, or an announced end of life for a feature.
Unfortunately, errors are also part of the API. For instance, even a
badly chosen return code can’t change.
The API should be versioned. Don’t change the current API, release a new one.
“Be liberal with what you accept, be consistent with what you
return”
Be precise with accepted and returned content-type.
WHY 
GET method 
rest/api/issue/ should return all issues?
NO. Collections should always be paginated. Returning everything is
never realistic in large systems.
rest/api/issue/ACJIRA-1 should return the details of a particular issue.
NOT all of them. Let the user define, as a query parameter, which fields
should be returned. Returning everything wastes precious CPU cycles and
network bandwidth.
rest/api/issue/ACJIRA-1 should return an ETag.
ETag header in the response to GET:
ETag: "xyz"
Second request with header:
If-None-Match: "xyz"
304 (not modified) when unchanged, 200 (OK) with a new ETag when changed. Or 404 (not found).
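The three rules above (paginate, let the caller pick fields, support conditional GET) could be sketched as plain functions, assuming Node.js. The parameter names `startAt`, `maxResults` and `fields`, and the choice to derive the ETag from a hash of the body, are assumptions for illustration.

```javascript
const crypto = require("node:crypto");

// Return one page of a collection plus the metadata needed to ask for the next.
function paginate(items, startAt = 0, maxResults = 50) {
  return {
    startAt,
    maxResults,
    total: items.length,
    values: items.slice(startAt, startAt + maxResults),
  };
}

// Keep only the fields named in a comma-separated `fields` query parameter.
function pickFields(resource, fieldsParam) {
  if (!fieldsParam) return resource;
  const wanted = fieldsParam.split(",").map((name) => name.trim());
  return Object.fromEntries(
    Object.entries(resource).filter(([key]) => wanted.includes(key))
  );
}

// Answer a GET for a single resource, honouring If-None-Match.
function conditionalGet(resource, ifNoneMatch) {
  if (resource === undefined) return { status: 404 };
  const body = JSON.stringify(resource);
  // ETag derived from the representation; any stable version marker would do.
  const hash = crypto.createHash("sha256").update(body).digest("hex").slice(0, 16);
  const etag = `"${hash}"`;
  if (ifNoneMatch === etag) return { status: 304, headers: { ETag: etag } };
  return { status: 200, headers: { ETag: etag }, body };
}

const issue = { key: "ACJIRA-1", summary: "Example", description: "Long text" };
const first = conditionalGet(pickFields(issue, "key,summary"));
const second = conditionalGet(pickFields(issue, "key,summary"), first.headers.ETag);
console.log(first.status, second.status); // 200 304
```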
WHY 
HATEOAS
rest/api/issue/ACJIRA-1/delete is not a valid GET usage.
Use HATEOAS (Hypermedia As The Engine Of Application State)
{
"href": "rest/api/issue/ACJIRA-1",
"rel": "self",
"method": "GET"
},
{
"href": "rest/api/issue",
"rel": "all-paginated",
"method": "GET"
},
{
"href": "rest/api/issue",
"rel": "create",
"method": "POST"
},
{
"href": "rest/api/issue/ACJIRA-1",
"rel": "update",
"method": "PUT"
},
{
"href": "rest/api/issue/ACJIRA-1",
"rel": "delete",
"method": "DELETE"
},
{
"href": "rest/api/issue/ACJIRA-1",
"rel": "partial-update",
"method": "PATCH"
}
Idempotency of each link:
self (GET): idempotent
all-paginated (GET): idempotent
create (POST): not idempotent
update (PUT): idempotent
delete (DELETE): idempotent
partial-update (PATCH): not idempotent
WHY 
REST API security 
Prefer the same mechanism as for UI authentication.
It is possible to use BasicAuth or OAuth, but only with SSL/TLS.
Always check permissions of the user.
Interesting problem to solve?
We have a project ACJIRA and a user Filip who can’t access the
project. What return code should he get?
It should be 404 (not found).
403 (forbidden) reveals that the project exists. Projects are often
named after the company for which the service is provided.
Companies may not want to publicly acknowledge a relationship with
another company.
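The rule can be sketched as a single check that treats "does not exist" and "exists but is not visible to this user" identically. The in-memory project store and the user names are hypothetical.

```javascript
// Hypothetical project store: project key -> users allowed to see it.
const projects = new Map([["ACJIRA", new Set(["anna"])]]);

// Status code for a request to view a project.
function statusForProject(projectKey, userId) {
  const members = projects.get(projectKey);
  // Same answer for a missing project and for a forbidden one,
  // so the response does not reveal which projects exist.
  if (!members || !members.has(userId)) return 404;
  return 200;
}

console.log(statusForProject("ACJIRA", "filip"));  // 404
console.log(statusForProject("NOSUCH", "filip"));  // 404
console.log(statusForProject("ACJIRA", "anna"));   // 200
```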
WHY 
AaaS (API as a Service) 
You don’t need to write all APIs yourself. You can integrate with
existing APIs.
There are API directories/marketplaces where you can buy APIs.
Be careful when passing user data to external services.
Messaging
HOW 
How do I know about data change? 
The CI server doesn’t execute a PUT request to /issue/ACJIRA-27 saying the
build completed. How would it know who is interested?
Instead, it publishes the information that the build was completed, and
jira-build-monitor-service registers a listener for this information.
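The idea in miniature, as an in-process sketch: the publisher knows only a topic name, never the subscribers. The `MessageHub` class, the topic name and the message fields are invented for illustration; between real services the hub would be a message broker or queue.

```javascript
// Minimal publish/subscribe hub. The publisher does not know who listens.
class MessageHub {
  constructor() {
    this.listeners = new Map(); // topic -> array of callbacks
  }

  subscribe(topic, listener) {
    if (!this.listeners.has(topic)) this.listeners.set(topic, []);
    this.listeners.get(topic).push(listener);
  }

  // Deliver the message to every listener; returns how many were notified.
  publish(topic, message) {
    const listeners = this.listeners.get(topic) ?? [];
    for (const listener of listeners) listener(message);
    return listeners.length;
  }
}

const hub = new MessageHub();
const updatedIssues = [];

// jira-build-monitor-service registers its interest once.
hub.subscribe("build.completed", (message) => updatedIssues.push(message.issueKey));

// The CI server only announces what happened.
hub.publish("build.completed", { issueKey: "ACJIRA-27", status: "SUCCESS" });
console.log(updatedIssues); // [ 'ACJIRA-27' ]
```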
HOW 
Messaging 
There are many approaches and concepts around messaging.
The key differentiator is the message delivery guarantee.
It is easy to have a 90% or 95% message delivery guarantee.
Assuring 100% message delivery is almost impossible. It may
require a complete service rewrite.
It is very important to understand the use case before deciding
what message delivery guarantee is expected.
Send messages asynchronously. Connections are precious
resources for your service.
Messages are API as well. They should have a clear contract and
deprecation policy. Make them granular.
Specify the content type. Be careful with content-length: overly long
messages may DoS the receiver.
Sign the request.
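On the receiving side, the last three points might translate into a gate like the one below, assuming Node.js, JSON messages and an HMAC-SHA256 signature over the body. The size limit, the status codes chosen and the function names are assumptions of this sketch.

```javascript
const crypto = require("node:crypto");

const MAX_BODY_BYTES = 64 * 1024; // hypothetical upper bound for one message

// Hex HMAC of the message body, computed with the secret shared with the sender.
function signBody(body, secret) {
  return crypto.createHmac("sha256", secret).update(body).digest("hex");
}

// Decide whether to accept a message; returns an HTTP status code.
function acceptMessage({ contentType, body, signature }, secret) {
  if (contentType !== "application/json") return 415; // unsupported content type
  if (Buffer.byteLength(body, "utf8") > MAX_BODY_BYTES) return 413; // too large
  const expected = Buffer.from(signBody(body, secret), "hex");
  const received = Buffer.from(String(signature), "hex");
  if (expected.length !== received.length) return 401;
  if (!crypto.timingSafeEqual(expected, received)) return 401;
  return 202; // accepted: queue it and process asynchronously
}

const body = JSON.stringify({ event: "build.completed", issueKey: "ACJIRA-27" });
const message = {
  contentType: "application/json",
  body,
  signature: signBody(body, "shared-secret"),
};
console.log(acceptMessage(message, "shared-secret")); // 202
```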
HOW 
What can go wrong? 
Server dies during a change.
Event sourcing: record each change in a database. If the server died,
there is no recorded change and so nothing to message. Each change has a sequence
number.
Database trigger: move the message to a queue. What if the database
server dies?
Resend with a possible-duplicate flag. Is the order preserved? Who
is controlling this? What if the controlling node of the publisher dies?
Server died after the change, before sending the message.
What if the message was not delivered?
Server died while processing the message?
Pull the message again with a REST request to the publisher. Parametrise
the request with the last successfully processed message.
Use a queue service implementation acting as a proxy, Amazon
SQS for instance.
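A compact sketch of two of these ideas together: every change gets a sequence number, and a consumer that missed messages catches up by asking for everything after the last sequence it processed. The class names are invented, and the in-memory array stands in for the database and the REST call.

```javascript
// Publisher side: an append-only log in which each change has a sequence number.
class ChangeLog {
  constructor() {
    this.entries = [];
  }

  append(change) {
    const sequence = this.entries.length + 1;
    this.entries.push({ sequence, change });
    return this.entries[this.entries.length - 1];
  }

  // What a "give me everything after N" REST endpoint would return.
  since(lastSequence) {
    return this.entries.filter((entry) => entry.sequence > lastSequence);
  }
}

// Consumer side: applies changes in order, exactly once.
class Consumer {
  constructor() {
    this.lastProcessed = 0;
    this.applied = [];
  }

  // Returns true if the entry was applied.
  receive(entry) {
    if (entry.sequence <= this.lastProcessed) return false; // duplicate, ignore
    if (entry.sequence !== this.lastProcessed + 1) return false; // gap, catch up first
    this.applied.push(entry.change);
    this.lastProcessed = entry.sequence;
    return true;
  }

  // Pull missed entries, parametrised with the last processed sequence.
  catchUp(log) {
    for (const entry of log.since(this.lastProcessed)) this.receive(entry);
  }
}

const log = new ChangeLog();
const consumer = new Consumer();
const e1 = log.append("issue created");
const e2 = log.append("issue assigned"); // this message gets lost in transit
const e3 = log.append("issue resolved");

consumer.receive(e1);
consumer.receive(e3); // rejected: sequence 2 is missing
consumer.catchUp(log); // pulls 2 and 3
console.log(consumer.applied);
```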
HOW 
Eventually consistent 
It costs a lot of money to provide a
message guarantee (implementing all the
steps from the previous slide).
Most business applications can live
without reliable messaging for a while.
When running 52 000 servers or more (it
will always be more), you need to
acknowledge that things are going to fail and
messages are not going to be delivered.
Apply a resilient architecture, which polls for
data changes (event sourcing again) if the
messages are not delivered.
MULTI-TENANCY
HOW 
How do I ensure I display proper data? 
I want to display information about related pages owned only by this 
customer. 
I want to display information only about source code changes made by 
organisation of my current customer.
HOW 
Multi-tenancy 
The ability of a single application to serve requests from multiple
customers at the same time.
When the application is written for on-premises clients, it
doesn’t make sense to support multiple organisations.
When the application is written for the cloud, it doesn’t make
sense to host each customer separately.
Customers with a single office use JIRA 8h a day. It can serve
other customers for the remaining 16h.
A single server can process 500 concurrent users. It can host 10
small companies.
The application should be written to run with 0 tenants and with
1000 tenants.
HOW 
Multi-tenancy is difficult 
We have data of Nike, NASA and Twitter. We can’t leak this data.
Tenant id is public.
Encrypted information about the tenant needs to be propagated
with each request.
When passing this information, it must be encrypted along
with a timestamp.
Tenant id must be unique and strong.
DON’T: put the hostname, organisation name or any other
data into the tenant id. This data will change.
We had an error:
https://ecosystem.atlassian.net/browse/AC-811
OpenID provider for all services.
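One way to read "encrypted along with a timestamp", sketched with Node.js and AES-256-GCM. The key handling, the token layout (IV, then auth tag, then ciphertext) and the 60-second maximum age are assumptions of this example, not a description of how Atlassian does it. Tenant ids are random UUIDs, so they carry no hostname or organisation name.

```javascript
const crypto = require("node:crypto");

// Hypothetical key shared by the services that exchange tenant context.
const serviceKey = crypto.randomBytes(32);

// A tenant id that is unique, hard to guess and free of meaningful data.
function newTenantId() {
  return crypto.randomUUID();
}

// Encrypt the tenant id together with the time it was issued.
function sealTenantContext(tenantId, nowMs, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const plaintext = JSON.stringify({ tenantId, issuedAt: nowMs });
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Layout: 12-byte IV, 16-byte auth tag, ciphertext.
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64url");
}

// Decrypt and reject contexts that were tampered with or are too old.
function openTenantContext(token, nowMs, key, maxAgeMs = 60_000) {
  const raw = Buffer.from(token, "base64url");
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  const plaintext = Buffer.concat([
    decipher.update(raw.subarray(28)),
    decipher.final(), // throws if the token was modified
  ]).toString("utf8");
  const { tenantId, issuedAt } = JSON.parse(plaintext);
  if (nowMs - issuedAt > maxAgeMs) throw new Error("tenant context expired");
  return tenantId;
}

const tenantId = newTenantId();
const sealed = sealTenantContext(tenantId, Date.now(), serviceKey);
console.log(openTenantContext(sealed, Date.now(), serviceKey) === tenantId); // true
```

The timestamp limits how long a captured token can be replayed, and the authenticated encryption mode makes any modification of the token detectable.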
DEPLOYMENT
HOW 
How do I deploy this? 
52 000 servers in multiple data centers.
Differences in
- OS version (good if the OS is the same)
- hardware
- database version
- schema version
You can’t update everything at the same time:
- no expected downtime
- data centers not optimised for 100% energy utilisation
- data centers not optimised for the heat.
Services updated independently:
- each team owns its own deployment schedule
- each team may maintain a couple of versions of its services
- experimental features may be enabled/disabled on some services
HOW 
Fast Five - Quality at speed 
Stage | Behaviour | Code | Data schema | Activation | Comment
1 | Old | Old | Old | Deployment | Code is running as is.
2 | Old | New and old together | Old | Deployment | New code deployment.
3 | Old | New and old together | New | Deployment or Configuration | Database migration.
4 | New and old together | New and old together | New | Deployment, Configuration or Context | Slowly enable the feature on all racks. Features might be enabled in various configurations.
5 | New | New | New | Deployment | Delete the obsolete code.
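Stage 4, slowly enabling the feature on all racks, is often done with a deterministic percentage rollout. The sketch below is one such scheme, assuming Node.js. The hashing approach, feature name and rack ids are illustrative, not taken from the talk.

```javascript
const crypto = require("node:crypto");

// Map a (feature, rack) pair to a stable bucket between 0 and 99.
function rolloutBucket(featureName, rackId) {
  const digest = crypto.createHash("sha256").update(`${featureName}:${rackId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

// A rack has the feature once the rollout percentage passes its bucket.
// Raising the percentage only ever adds racks, it never removes any.
function isEnabled(featureName, rackId, percentage) {
  return rolloutBucket(featureName, rackId) < percentage;
}

const racks = Array.from({ length: 20 }, (_, index) => `rack-${index}`);
for (const percentage of [0, 25, 50, 100]) {
  const enabled = racks.filter((rack) => isEnabled("new-editor", rack, percentage));
  console.log(`${percentage}% -> ${enabled.length} of ${racks.length} racks`);
}
```

Because the bucket depends only on the feature name and rack id, a rack keeps its answer across restarts, and old and new behaviour can run side by side while the percentage is raised.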
HOW 
DEV/DOG/PROD 
Deployments never go to the client first.
First versions are deployed to the development environment.
The development environment is tested with production versions of
the remaining services.
Good development versions are promoted to the dogfood
environment. This version is used there internally against
production versions of other services.
Good dogfooding versions are promoted to the production
environment. Features are slowly enabled in production.
Possible issues:
- A new service was not tested against all versions running in
production.
- A couple of new services are deployed at the same time. They
were never tested together. The release manager should resolve
this issue and schedule the feature release.
Thank you!

Business Applications Integration In The Cloud

  • 1. Previously worked in Lufthansa, NASA, Intel Running, biking, paragliding Travelling Photography Filip Rogaczewski • frogaczewski@atlassian.com • Spartez/Atlassian ETI graduate Team leader in Spartez
  • 2. BUSINESS APPLICATIONS INTEGRATION IN THE CLOUD: HOW TO INTEGRATE 50 000+ SERVERS TOGETHER
  • 3. WHY CASE STUDIES Agenda HOW UI INTEGRATION OPPORTUNITY REST API MESSAGING MULTI-TENANCY DEPLOYMENT
  • 4. WHY Case study: Facebook Recommended Friends feed Chat Activity Stream Applications Chat
  • 5. Many distinct services integrated into a single application
  • 6. WHY Service Oriented Architecture SOAP (simple object access protocol) XML RPC (remote procedure call) CORBA RMI (remote method invocation)
  • 7. SOA: Loosely coupled & independently working services
  • 8. WHY Service Oriented Architecture Scales the application • Loosely coupled services • Less resource restrictions for services • Communication with well defined API • Allows better technological choice for services • Distinct deployment models Service Service CONTAINER Integration HTTP
  • 9. WHY Service Oriented Architecture Different hardware stack for services in Facebook Type I Web Type III DB Type IV Hadoop Type V Haystack Type VI Cache Type VII Cold storage CPU (2) Xeon E5-2670 (2) Xeon E5-2660 (2) Xeon E5-2660 (2) Xeon E5-2660 (2) Xeon E5-2660 (2) Xeon E5-2660 Memory 16GB 144 GB 64 GB 96 GB 144 GB 144 GB Disk (1) 500 GB SATA 3.2TB PCI Flash (15) 4TB SAS (30) 4TB SAS (1) 2 TB SATA (240) 4TB SATA
  • 10. Problems faced by Facebook today, are our problems in few years
  • 11. WHY Service Oriented Architecture More effective organisation • Each team running a single service. • Each team is cross-functional (designers, product managers, testers, developers, ops-engineers). • Decision about roadmap happen locally. • Geographically collocated teams, one service in USA, second service in Australia, third in Poland. • Easier to scale work, multiple teams working at the same time.
  • 12. What is the alternative?
  • 13. WHY In Process Integration CONTAINER Add-On In Process • Resources are shared • Access to all data • Doesn’t scale Tied to the stack • Language • Frameworks Add-On No clear API boundaries
  • 14. Who else does integration?
  • 15. WHY Spotify Each item is distinct service Music stream Friends feed Browse music service
  • 16. WHY Atlassian: JIRA Bitbucket Attachments Confluence Hipchat JIRA Agile
  • 18. WHY Integrations of multiple applications You can sell all your products instead of one.
  • 19. WHY Extending with marketplace Customers always want more features. If you can’t give it to them, let someone else do this - marketplace. Cash 25% of what external vendors sold using your marketplace.
  • 20. 30 000 000 $/year
  • 21. WHY Enterprise customers Customers who want to integrate your product with their existing applications HR Communi cation Environm ent CRM Asset manageme nt Supply GRC chain Finance
  • 22. WHY Acquisitions You buy next fantastic company. You want to quickly integrate this feature. Can take couple of months if you have an integration layer ready. Might never be done, if you don’t. ???
  • 23. CASE STUDIES HOW Agenda WHY UI INTEGRATION OPPORTUNITY REST API MESSAGING MULTI-TENANCY DEPLOYMENT
  • 25. HOW How to embed external HTML here?
  • 26. HOW Iframe Never embed HTML from external sites. When using iframes, browser provides security: • Don’t set sandboxing to allow-forms, allow-scripts, allow-same- origin, allow-top-navigation. This is a security model very difficult to manage. Sign the URL so server rendering content can authenticate the request. Optionally pass context parameters. Use CORS or postMessage for communication. Performance issues.
  • 28. HOW Security: How to verify this request? https://whoslooking-stg.herokuapp.com/poller?issue_key=ACJIRA-157 &tz=Australia%2FSydney &loc=en-US &user_id=frogaczewski &user_key=frogaczewski &xdm_e=https%3A%2F%2Fecosystem.atlassian.net&xdm_c=channel-whoslooking-connect-stg__ whos-looking&cp=&lic=none &jwt= eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJmcm 9nYWN6ZXdza2kiLCJxc2giOiJiZjA1NmU5MjEzYjBkODIyNDA wNzg4YmQ4MThhNDk4YmM0NGQ0OTMyYTM2MWU1Mjk1Zj cwMTczOGRiMGRjOTA2IiwiaXNzIjoiamlyYTo1OTk3NWQ2Ny 00Y2EwLTRlOWUtOTk2MC1kMWFhYWU3NmJiMzkiLCJleHA iOjE0MTMxMzI2NTksImlhdCI6MTQxMzEzMjQ3OX0.Da8VXjL _9z5xyzErtaJohHKH-xx-0Rp-9MF_xtIvcaY
  • 29. HOW Security: URL signing requirements 1. Signature for validation who created the request. 2. Issuer: identify the application instance which issued the request. Is this jiraForEti or is this jiraForGdanskUniversity? 3. Expiration time of the token. Time in UTC after which you should no longer accept the token. 4. Query hash. Prevents URL tampering. 5. Id of the user for authorisation. 6. Algorithm used to sign the URL.
  • 30. HOW Security: Signature validation 1. Token has the following form: 2. Upon installation host and service exchange a shared secret. 3. Service receives a public key of the host. Service have to verify the public key. Each service expose REST API for public key retrieval. 4. During request service extracts the issuer and signature algorithm from the URL and retrieves the sharedSecret for the issuer. 5. Service signs encodedHeader.encodedClaims with algorithm from the header and verifies if the signatures match. If yes, return content. If no, return 403 (forbidden).
  • 31. IFRAME AND PARENT COMMUNICATION
  • 32. HOW Sandboxing An iframe instance whose parent and child reside on different domains or hostnames constitutes a sandboxed environment. The contained page has no access to its parent. These restrictions are imposed by the browser's same origin policy. There are a few limitations applicable to iframes: • Stylesheet properties from the parent do not cascade to the child page • Child pages have no access to its parent's DOM and JavaScript properties • Likewise, the parent has no access to its child's DOM or JavaScript properties.
  • 33. HOW Cross origin resource sharing (CORS) 1. Keep the list of whitelisted URL with services allowed to access server resources. 2. When executing cross-origin request, the browser header: Origin: http://service.atlassian.net 3. If the service is whitelisted, server should return: Access-Control-Allow-Origin: http://service.atlassian.net DO NOT USE JSONP 4. Multiple headers for: choosing a subset of allowed headers (Access-Control-Allow-Headers) choosing a subset of allowed HTTP methods (Access-Control-Allow-Methods)
  • 34. HOW window.postMessage 1. Create clear JS API between parent and iframe. 2. Parent creates an event listener for a message. window.addEventListener("message", executeXHR, false); 3. Client executes: window.parent.postMessage(“request", JSON.stringify({url: ‘/rest/api/2/dashboard’, success: function() { alert(“1”);}} ) 4. Parent executes the request on behalf of the child and postMessage the results. 5. Difficult to implement. Host should provide a library with abstraction over JS functions it can handle.
  • 36. HOW Performance: Apdex New relic: measuring user satisfaction • In Atlassian • Satisfied 1s • Tolerating 3s • Our Apdex goal is 0.9 • Apdex between 0.85 to 0.93 is considered to be a good score. • For business applications users are more tolerant then for customer applications • Financial services are out of scope.
  • 37. HOW Performance: Latency 1. Latency Within California? Within Europe? Across Atlantic? US to Australia? EMEA to Asia Pacific? 2. Response times of the application is different in various geographical regions. The customer in US will usually have much better performance then the one in Europe. 3. Use CDN for caching of static resource (akamai, cloudfront, edgecast) 4. There are enterprise class solutions reducing latency (Verizon Enterprise Solutions) 30 ms 30 ms 90 ms 210 ms 250 ms
  • 38. HOW Performance: iframe request Page containing an iframe
  • 39. HOW Performance: iframe request Page containing non-iframe embedded content
  • 41. HOW How do I change this data?
  • 42. WHY REST API Representational state transfer. API is Application Programming Interface. For API to make sense, it needs to be stable. Each service needs an API policy. Unless the REST API creates security risk, it can’t change without a previous notice (deprecation period) when services can start using a valid replacement or announce a end of life for a feature. Unfortunately, errors are also API. Bad return codes can’t change for instance. API should be versioned. Don’t change current API, release a new one. “Be liberal with what you accept, be consistent with what you return” Be precise with accepted and returned content-type.
  • 43. WHY GET method rest/api/issue/ should return all issues? NO. Collections should always be paginated. Returning everything is never realistic in large systems. rest/api/issue/ACJIRA-1 should return a details of a particular issue. NOT all of them. Let user define as query parameter fields which should be returned. You are loosing precious CPU cycles and network bandwidth for returning everything. rest/api/issue/ACJIRA-1 should return ETag ETag header in response for GET: “ETag: xyz” Second request with header: ”If-None-Match: xyz” 304 when not modified, OK when changed with new ETag. Or not found.
  • 44. WHY HATEOAS rest/api/issue/ACJIRA-1/delete is not a valid GET usage. Use HATEOAS (Hypermedia As The Engine Of Application State) { "href": "rest/api/issue/ACJIRA-1", "rel": "self", "method": "GET" }, { "href": "rest/api/issue", "rel": "all-paginated", "method": "GET" }, { "href": "rest/api/issue", "rel": "create", "method": "POST" }, { "href": "rest/api/issue/ACJIRA-1", "rel": "update", "method": "PUT" }, { "href": "rest/api/issue/ACJIRA-1", "rel": "delete", "method": "DELETE" }, { "href": "rest/api/issue/ACJIRA-1", "rel": "partial-update", "method": "PATCH" } GET, PUT and DELETE are idempotent; POST and PATCH are not idempotent.
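As a rough sketch, the link list from this slide could be generated per issue, and the idempotency of each method used to decide whether a failed call may be retried blindly. The helper names `issue_links` and `safe_to_retry` are hypothetical.

```python
# Methods from the slide that can be repeated without changing the outcome.
IDEMPOTENT_METHODS = {"GET", "PUT", "DELETE"}


def issue_links(issue_key, base="rest/api/issue"):
    """Build the hypermedia links a client can follow for one issue."""
    item = f"{base}/{issue_key}"
    return [
        {"href": item, "rel": "self", "method": "GET"},
        {"href": base, "rel": "all-paginated", "method": "GET"},
        {"href": base, "rel": "create", "method": "POST"},
        {"href": item, "rel": "update", "method": "PUT"},
        {"href": item, "rel": "delete", "method": "DELETE"},
        {"href": item, "rel": "partial-update", "method": "PATCH"},
    ]


def safe_to_retry(link):
    """A client may repeat a request blindly only if the method is idempotent."""
    return link["method"] in IDEMPOTENT_METHODS


links = issue_links("ACJIRA-1")
retryable = [link["rel"] for link in links if safe_to_retry(link)]
print(retryable)
```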
  • 45. WHY REST API security Prefer the same mechanism as for UI authentication. It is possible to use BasicAuth or OAuth, but only with SSL/TLS. Always check the permissions of the user. An interesting problem to solve: we have a project ACJIRA and a user Filip who can’t access the project. What return code should he get? It should be 404 (not found). 403 (forbidden) reveals that the project exists. Projects are often named after the company for which the service is provided. Companies may not want to publicly acknowledge a relationship with another company.
  • 46. WHY AaaS (API as a Service) You don’t need to write all APIs yourself. You can integrate with existing APIs. There are API directories/marketplaces where you can buy APIs. Be careful with passing user data to external services.
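A minimal sketch of the rule above: a project that does not exist and a project the user may not see produce the same 404, so the response never confirms existence. The function name and the in-memory permission map are hypothetical.

```python
def project_status(projects, permissions, user, project_key):
    """Return the HTTP status for a project lookup.

    A missing project and a forbidden project are deliberately
    indistinguishable: both yield 404, never 403.
    """
    allowed = permissions.get(user, set())
    if project_key not in projects or project_key not in allowed:
        return 404
    return 200


projects = {"ACJIRA", "DEMO"}
permissions = {"filip": {"DEMO"}, "anna": {"ACJIRA", "DEMO"}}

print(project_status(projects, permissions, "filip", "ACJIRA"))   # exists, no access
print(project_status(projects, permissions, "filip", "MISSING"))  # does not exist
print(project_status(projects, permissions, "anna", "ACJIRA"))    # exists, has access
```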
  • 48. HOW How do I know about data change? The CI server doesn’t execute a PUT request to /issue/ACJIRA-27 saying that the build completed. How would it know who is interested? Instead it publishes the information that the build was completed, and jira-build-monitor-service registers a listener for this information.
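The publish/subscribe idea can be sketched with an in-process event bus. This stands in for a real message broker and is an illustration only: `EventBus`, the topic name `build.completed` and the listener are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process stand-in for a message broker."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, topic, listener):
        self._listeners[topic].append(listener)

    def publish(self, topic, event):
        """Deliver the event to every listener; return how many were notified."""
        listeners = list(self._listeners.get(topic, []))
        for listener in listeners:
            listener(event)
        return len(listeners)


bus = EventBus()
updated_issues = []


def build_monitor(event):
    # The monitor service decides what to do; the CI server does not know it exists.
    updated_issues.append((event["issue"], event["status"]))


bus.subscribe("build.completed", build_monitor)

# The CI server only announces what happened.
notified = bus.publish("build.completed", {"issue": "ACJIRA-27", "status": "passed"})
print(notified, updated_issues)
```

The publisher stays unaware of its consumers, so new interested services can be added without changing the CI server.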
  • 49. HOW Messaging There are many approaches and concepts around messaging. The key differentiator is the message delivery guarantee. It is easy to have a 90% or 95% message delivery guarantee. Assuring 100% message delivery is almost impossible. It may require a complete service rewrite. It is very important to understand the use case before deciding what the expected message delivery is. Send messages asynchronously. Connections are precious resources for your service. Messages are API as well. They should have a clear contract and deprecation policy. Make them granular. Specify the content type. Be careful with content-length: a message that is too long may DoS the receiver. Sign the request.
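A minimal sketch of the last three points (content type, length limit, signature), assuming a shared secret between publisher and receiver and HMAC-SHA256 as the signing scheme. The header name `X-Signature` and the 64 KiB limit are assumptions, not part of the talk.

```python
import hashlib
import hmac
import json

MAX_BODY_BYTES = 64 * 1024  # assumed limit; tune per use case


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 signature of the message body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def receive(secret: bytes, headers: dict, body: bytes) -> dict:
    """Validate size, content type and signature before parsing the body."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("message too large")
    if headers.get("Content-Type") != "application/json":
        raise ValueError("unsupported content type")
    expected = sign(secret, body).encode("utf-8")
    provided = headers.get("X-Signature", "").encode("utf-8")
    if not hmac.compare_digest(expected, provided):
        raise ValueError("bad signature")
    return json.loads(body)


secret = b"shared-secret"
body = json.dumps({"event": "build-completed", "issue": "ACJIRA-27"}).encode("utf-8")
headers = {"Content-Type": "application/json", "X-Signature": sign(secret, body)}
message = receive(secret, headers, body)
print(message["issue"])
```

The cheap checks run first, so an oversized or mistyped message is rejected before any signature work or JSON parsing.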
  • 50. HOW What can go wrong? Server dies during a change. Event sourcing - record each change in a database. If the server died, there is no change to send a message about. Each change has a sequence number. Database trigger. Move the message to a queue. What if the database server dies? Resend with a possible-duplicate flag. Is the order preserved? Who is controlling this? What if the controlling node of the publisher dies? Server died after the change, before sending the message. What if the message was not delivered? Server died while processing the message? Pull the message again with a REST request to the publisher. Parametrise the request with the last successfully processed message. Use some Queue Service implementation acting as a proxy. Amazon SQS for instance.
  • 51. HOW Eventually consistent It costs a lot of money to provide a message delivery guarantee (implementing all the steps from the previous slide). Most business applications can live without reliable messaging for a while. When running 52 000 servers or more (it will always be more), you need to acknowledge that things are going to fail and messages are not going to be delivered. Apply a resilient architecture, which polls for data changes (event sourcing again) if the messages are not delivered.
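One way to picture the recovery path above is a publisher that keeps a numbered change log and a consumer that tracks the last sequence number it processed. The sketch below is in-memory and hypothetical (`Publisher`, `Consumer`, `events_after`); in a real system `events_after` would be the REST request parametrised with the last successfully processed message.

```python
class Publisher:
    """Keeps every change in an append-only log with a sequence number."""

    def __init__(self):
        self._log = []

    def record(self, payload):
        seq = len(self._log) + 1
        self._log.append((seq, payload))
        return seq

    def events_after(self, last_seq):
        # Stand-in for the pull endpoint queried by the consumer.
        return [(seq, p) for seq, p in self._log if seq > last_seq]


class Consumer:
    def __init__(self):
        self.last_seq = 0
        self.processed = []

    def handle(self, seq, payload):
        """Apply a pushed message; report duplicates and gaps instead of guessing."""
        if seq <= self.last_seq:
            return "duplicate"
        if seq > self.last_seq + 1:
            return "gap"
        self.processed.append(payload)
        self.last_seq = seq
        return "applied"

    def catch_up(self, publisher):
        for seq, payload in publisher.events_after(self.last_seq):
            self.handle(seq, payload)


publisher = Publisher()
for change in ["created", "assigned", "resolved"]:
    publisher.record(change)

consumer = Consumer()
first = consumer.handle(1, "created")    # delivered normally
# Message 2 is lost in transit.
third = consumer.handle(3, "resolved")   # arrives out of sequence
if third == "gap":
    consumer.catch_up(publisher)         # pull everything after last_seq
resent = consumer.handle(3, "resolved")  # redelivery with possible-duplicate flag
print(consumer.processed, resent)
```

Because the consumer only ever applies the next expected sequence number, redelivered messages are harmless and order is preserved on the receiving side.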
  • 53. HOW How do I ensure I display proper data? I want to display information about related pages owned only by this customer. I want to display information only about source code changes made by organisation of my current customer.
  • 54. HOW Multi-tenancy The ability of a single application to serve requests from multiple customers at the same time. When the application is written for on-premises clients, it doesn’t make sense to support multiple organisations. When the application is written for the cloud, it doesn’t make sense to host each customer separately. Customers with a single office use JIRA 8h a day. The server can serve other customers for the remaining 16h. A single server can process 500 concurrent users. It can host 10 small companies. The application should be written to run with 0 tenants and with 1000 tenants.
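A minimal sketch of tenant-scoped data access: every read and write takes a tenant id, so one customer's pages can never appear in another customer's view, and the store works with zero tenants as well as many. `PageStore` and the tenant ids are hypothetical.

```python
class PageStore:
    """In-memory store in which all data is partitioned by tenant id."""

    def __init__(self):
        self._pages = {}  # tenant_id -> list of page titles

    def add(self, tenant_id, title):
        self._pages.setdefault(tenant_id, []).append(title)

    def related_pages(self, tenant_id):
        # Return a copy so callers cannot mutate another request's view.
        return list(self._pages.get(tenant_id, []))

    def tenant_count(self):
        return len(self._pages)


store = PageStore()
empty_view = store.related_pages("t-unknown")  # works with zero tenants

store.add("t-aaa", "Release plan")
store.add("t-bbb", "Launch checklist")
store.add("t-aaa", "Retrospective")

print(store.related_pages("t-aaa"))
print(store.related_pages("t-bbb"))
```

There is no method that lists pages across tenants, which makes an accidental cross-tenant query harder to write.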
  • 55. HOW Multi-tenancy is difficult We have data of Nike, NASA and Twitter. We can’t leak this data. The tenant id is public. Encrypted information about the tenant needs to be propagated with each request. When passing this information, it must be encrypted along with a timestamp. The tenant id must be unique and strong. DON’T: put the hostname, organisation name or any other data into the tenant id. This data will change. We had an error: https://ecosystem.atlassian.net/browse/AC-811 OpenID provider for all services.
  • 57. HOW How do I deploy this? 52 000 servers in multiple data centers. Difference in - OS version (good if the OS is the same) - hardware - database version - schema version You can’t update everything at the same time: - no expected downtime - data centers not optimised for 100% energy utilisation - data centers not optimised for the heat. Services updated independently: - each team owns its own deployment schedule - each team may maintain a couple of versions of services - experimental features may be enabled/disabled on some services
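A rough sketch of an opaque tenant id and a timestamped tenant token. Note the substitution: the slide asks for encryption, while this example only signs the token with HMAC-SHA256, because the Python standard library offers no authenticated encryption; a real deployment would additionally encrypt the payload. All names (`new_tenant_id`, `issue_token`, `verify_token`) and the 5-minute lifetime are assumptions.

```python
import hashlib
import hmac
import secrets
import time


def new_tenant_id() -> str:
    """Random, opaque id: no hostname or organisation name inside."""
    return secrets.token_hex(16)


def _signature(secret: bytes, message: str) -> str:
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, tenant_id: str, now=None) -> str:
    """Token format (assumed): <tenant_id>.<unix timestamp>.<signature>."""
    timestamp = int(time.time() if now is None else now)
    message = f"{tenant_id}.{timestamp}"
    return f"{message}.{_signature(secret, message)}"


def verify_token(secret: bytes, token: str, max_age_s=300, now=None):
    """Return the tenant id, or None if the token is forged, malformed or stale."""
    try:
        tenant_id, timestamp_text, provided = token.split(".")
        timestamp = int(timestamp_text)
    except ValueError:
        return None
    expected = _signature(secret, f"{tenant_id}.{timestamp_text}")
    if not hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8")):
        return None
    current = time.time() if now is None else now
    if current - timestamp > max_age_s:
        return None
    return tenant_id


secret = b"service-to-service-secret"
tenant = new_tenant_id()
token = issue_token(secret, tenant, now=1_000_000)

fresh = verify_token(secret, token, now=1_000_060)
stale = verify_token(secret, token, now=1_000_900)
forged = verify_token(secret, token.replace(tenant, new_tenant_id()), now=1_000_060)
print(fresh == tenant, stale, forged)
```

The timestamp limits how long a captured token can be replayed, which is the reason the slide asks for it to travel with the tenant information.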
  • 58. HOW Fast Five - Quality at speed
    Stage | Behaviour | Code | Data schema | Activation | Comment
    1 | Old | Old | Old | Deployment | Code is running as is.
    2 | Old | New and old together | Old | Deployment | New code deployment.
    3 | Old | New and old together | New | Deployment or Configuration | Database migration.
    4 | New and old together | New and old together | New | Deployment, Configuration or Context | Slowly enable the feature on all racks. Features might be enabled in various configurations.
    5 | New | New | New | Deployment | Delete the obsolete code.
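Stage 4, where old and new behaviour coexist and the feature is slowly enabled on all racks, could be sketched as a deterministic percentage rollout. This is one possible mechanism, not necessarily the one used in the talk; `feature_enabled`, the feature name and the rack ids are hypothetical.

```python
import hashlib


def feature_enabled(feature: str, rack_id: str, rollout_percent: int) -> bool:
    """Deterministically place each rack in a bucket from 0 to 99.

    A rack keeps its bucket for a given feature, so raising the percentage
    only ever adds racks and never flips an enabled rack back off.
    """
    digest = hashlib.sha256(f"{feature}:{rack_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


racks = [f"rack-{n}" for n in range(200)]

at_0 = [r for r in racks if feature_enabled("new-editor", r, 0)]
at_25 = [r for r in racks if feature_enabled("new-editor", r, 25)]
at_50 = [r for r in racks if feature_enabled("new-editor", r, 50)]
at_100 = [r for r in racks if feature_enabled("new-editor", r, 100)]
print(len(at_0), len(at_25), len(at_50), len(at_100))
```

Setting the percentage back to 0 returns every rack to the old behaviour, which keeps the rollout reversible until stage 5 deletes the obsolete code.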
  • 59. HOW DEV/DOG/PROD Deployments never go to the client first. First versions are deployed to the development environment. The development environment is tested with production versions of the remaining services. Good development versions are promoted to the dogfood environment. This version is used there internally against production versions of other services. Good dogfooding versions are promoted to the production environment. Features are slowly enabled on production. Possible issues: - A new service was not tested against all versions running in production. - A couple of new services were deployed at the same time. They were never tested together. The release manager should resolve this issue and schedule the feature release.