Galaxy
or the escape from illusion
Michał Zabiełło
A new way to visualize system performance, developed by a Polish company, has been gaining
recognition. The solution is already used by several dozen Polish companies and resolutely cuts through
the well-known weaknesses of APM solutions.
One area in which IT can achieve rational savings is the group of tools for application
performance management (APM). Large corporations are investing in purchases of APM tools. The
providers of such solutions are implementing tens of dashboards, hundreds of graphs and flow
diagrams. They define thousands of various alerts and inundate the mailboxes of relevant recipients
with messages about the “health check” of business processes. All this is designed to convince the
recipients that the scattered IT infrastructure is under control. It all works until a serious malfunction
occurs. Then IT specialists trying to identify the cause of the problem have to analyze millions of
out-of-date, unnecessary or erroneous pieces of information coming from the implemented tools.
Bombarded by alerts
The tools to diagnose or monitor applications are of key importance. Good tools are expensive – they
require many laboratory checks, tests, and a precise manufacturing process. Good and expensive tools
are, in turn, complicated.
It is worth noting that such products have a specific methodology connected with performance
management: we install a tool, configure the scope of reported metrics and build a complicated “health
check” application to warn us about problems occurring in the monitored applications. In practice, the
system warns us about a problem that has occurred – but the cost of using, maintaining and developing
the application is often higher than planned.
Dashboards have become, paradoxically, the Achilles’ heel of those tools – every monitored application
has to have a set of hierarchical dashboards, and each bit of information presented on them requires a set
of defined SLA parameters which determine the result of the “health check”, signaled by the
colors green, yellow, or red. This signaling is not unequivocal – it is not clear whether it means a failure
of the system or just a slowdown, whether the problem concerns a single function or a whole set.
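The ambiguity of color signaling can be illustrated with a minimal sketch. It assumes a dashboard tile driven by two static thresholds; the class name, the function name and the threshold values are hypothetical and do not come from any particular APM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaThresholds:
    """Hypothetical static limits for one dashboard metric, in milliseconds."""
    warn_ms: float  # above this value the tile turns yellow
    fail_ms: float  # above this value the tile turns red


def health_color(response_ms: float, sla: SlaThresholds) -> str:
    """Map a single measured response time to a traffic-light color."""
    if response_ms > sla.fail_ms:
        return "red"
    if response_ms > sla.warn_ms:
        return "yellow"
    return "green"


# Illustrative thresholds for a login operation (assumed values).
login_sla = SlaThresholds(warn_ms=500, fail_ms=2000)

slow_but_working = health_color(2500, login_sla)     # a slowdown
timed_out_request = health_color(30000, login_sla)   # effectively a failure
print(slow_but_working, timed_out_request)           # both are reported as "red"
```

Both situations collapse into the same color, so the person looking at the dashboard still has to find out which of the two actually happened.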
The tools are bombarding the administrators with information. The command center has its hands full
with sifting and separating false alarms from those responsible for disruptions in data processing. The
implementation specialists responsible for tools are constantly working on updating and adapting the
dashboards to frequently changing applications or requirements concerning notifications about
application problems. That is how APM operates.
In search of an intuitive APM
In 2012 a group of programmers experienced in implementing and administering APM solutions
formed a company. Its goal was to create a solution which would overcome the weaknesses and
limitations of monitoring systems and increase the performance of applications. “Our point of departure
in creating the system was a fundamental question: Do data from monitored systems, alerts and trends
have to be represented in a way which requires huge outlays?” – says Grzegorz Pawluk, CTO and one of
the co-founders of Flopsar Technology.
Perhaps it is possible to show in a simple, intuitive manner what is most important for IT services:
• that a malfunction has just occurred;
• that the users may complain about the system working inefficiently;
• that the provider implemented a badly written application which cannot function in an
overloaded environment;
• that the application is using up too much of the power of the expensive equipment.
Those commonsensical assumptions are behind Flopsar (Flop Search and Rescue). The creators of
Flopsar Suite asked themselves one more question: “What is really important in the tangle of
information reported from the monitored system?” And they formulated the following answers:
1. Simple implementation and no need for an advanced configuration: Plug-and-play.
2. No need to train people who benefit from the tool.
3. SIMPLE, intuitive interface (preferably one window).
4. Maximum productivity - to discover a problem and to find its cause, the user should not need to
perform more than three operations.
5. No “early warning systems” based on labor-intensive development.
Flopsar Galaxy
Flopsar does not aggregate data. It does not show averages, medians or quartiles. With
unstable systems the sample is too large and the resulting statistics are therefore not credible.
The galaxy shows EVERY single operation performed within the monitored system. Each time a
transfer is performed or someone logs into an application, a dot appears, located on the time
scale of the event (X axis) and the response time scale (Y axis). The majority of “correct” times
(the ones with sufficient processing quality) is concentrated within the lower registers of the
galaxy. The dots form a multicolored plane there. If an application or its function has slowed
down or malfunctioned, the dots migrate into the upper registers of the galaxy and form various
concentration patterns. The fact that those concentrations appear in the galaxy is the reason
for further investigation. The concentrations are automatically detected by a system based on
artificial intelligence algorithms or may be marked manually in order to identify the reason
for their occurrence. After marking, the user receives a precise diagnosis of what is not
working correctly in the system and why.
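The idea of one dot per operation, with concentrations in the upper part of the plot, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Operation` type, the `find_concentrations` function and all thresholds are hypothetical, and simple window counting stands in for the AI-based detection the article mentions, whose details are not described.

```python
from collections import Counter
from typing import NamedTuple


class Operation(NamedTuple):
    """One monitored call; the field names are illustrative."""
    start_s: float      # when the operation happened (X axis)
    duration_ms: float  # how long it took (Y axis)


def find_concentrations(ops, slow_ms=1000.0, window_s=60.0, min_dots=3):
    """Return start times of time windows holding at least `min_dots` slow dots.

    A dot counts as slow when its duration exceeds `slow_ms`. This is plain
    window counting, not the detection algorithm used by Flopsar.
    """
    counts = Counter(
        int(op.start_s // window_s) for op in ops if op.duration_ms > slow_ms
    )
    return sorted(w * window_s for w, n in counts.items() if n >= min_dots)


# Made-up sample data: steady fast traffic, one slow cluster, one lone outlier.
ops = [Operation(t, 120.0) for t in range(0, 300, 5)]
ops += [Operation(t, 4000.0) for t in (125, 130, 140, 150)]
ops.append(Operation(250, 3000.0))

print(find_concentrations(ops))  # [120.0]
```

The lone outlier at 250 s is not reported, while the four slow dots between 120 s and 180 s form a concentration worth investigating.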
After several days of working with the Flopsar system administrators begin to feel that they
know what they see. Based on events observed in the past and interpreted concentrations they may
say “the queue system got disconnected again,” or “web service is not working again” or even ignore
the pattern as something natural.
The system works without configuration – there is no need to construct dashboards, to define static SLA
for selected methods, to provide expensive system maintenance. Once the monitoring system has been
switched on, the application server processes data, the monitor starts showing concentrations and the
administrator starts looking for unnatural and disturbed concentration patterns.
Flopsar in UFG:
production monitoring of critical applications
• Reduction of production problems related to application performance
• Code optimization – shorter response times
• Reduced use of hardware infrastructure
How quickly can one learn to draw conclusions from Flopsar visualization?
“We collect millions of data records on policies, drivers and road events. It is critical to
ensure the reliability and quality of operation of the IT systems which perform our statutory
tasks. We selected the Flopsar Suite because of its intuitiveness and functionality. The tool
was implemented within a few hours and its effective operation by the team of administrators
started immediately after the implementation. The factors in favor of choosing Flopsar also
included costs, the level of after-sales service, flexibility and the range of additional
solution services offered by the provider. The monitoring data indicate unequivocally where
the problem has occurred and, therefore, who is responsible for its servicing or repair.
Today, we use the information obtained from Flopsar software in many cases as an argument in
our negotiations with our IT service providers” – says Grzegorz Rymarski, IT Department
Director, The Insurance Guarantee Fund (UFG).
Innovation through going back to the roots
Is the “galactic” way of showing data innovative and unique? The scatter plot is used in statistics to visualize
data. Grzegorz Pawluk explains: “Flopsar reports every transaction performed in the monitored system
separately. It connects stack frames into stack traces and then reports the aggregated duration of the
transaction as one point (with full access to all the remaining data). In this type of service, the volume of
data which needs to be recorded in the monitoring base is gigantic. Therefore, it is the database
infrastructure (data persistence) and not the data-generating agent which is the ‘heart’ of the Flopsar
system.”
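A minimal sketch of the step described in the quote, turning timed frames into one point per transaction, might look as follows. The `Frame` fields, the `to_points` function and the sample method names are hypothetical, and the "aggregated duration" is assumed here to be the span from the earliest frame start to the latest frame end; the article does not specify how Flopsar computes it.

```python
from collections import defaultdict
from typing import NamedTuple


class Frame(NamedTuple):
    """One timed method call; all field names are hypothetical."""
    transaction_id: str
    method: str
    start_ms: int
    end_ms: int


def to_points(frames):
    """Group frames by transaction and emit one plot point per transaction."""
    by_txn = defaultdict(list)
    for frame in frames:
        by_txn[frame.transaction_id].append(frame)

    points = []
    for txn_id, txn_frames in by_txn.items():
        start = min(f.start_ms for f in txn_frames)
        end = max(f.end_ms for f in txn_frames)
        points.append({
            "id": txn_id,
            "x": start,        # position on the time axis
            "y": end - start,  # assumed meaning of "aggregated duration"
            # The frames stay attached so the point can be drilled into later.
            "frames": sorted(txn_frames, key=lambda f: f.start_ms),
        })
    return sorted(points, key=lambda p: p["x"])


# Made-up sample frames for two transactions.
frames = [
    Frame("t1", "LoginServlet.doPost", 0, 180),
    Frame("t1", "UserDao.find", 20, 150),
    Frame("t2", "TransferService.send", 100, 2600),
]
for point in to_points(frames):
    print(point["id"], point["x"], point["y"], len(point["frames"]))
```

Keeping every frame alongside every point is what makes the stored volume so large, which is consistent with the quote's remark that persistence, not the agent, is the heart of the system.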
Innovation – or perhaps rather the return to healthy roots – can be seen in the approach to the project.
The Flopsar project started with designing the infrastructure: messages, protocols, engines,
data structure, mechanisms for load-balancing and bypassing the malfunction. The entire
infrastructure was programmed in the C language – the most efficient programming language. The
code, which has 5,000,000 lines, was written from scratch and entirely without using any
external (e.g. open source) libraries. The engineers and Flopsar support are responsible for
100% of the solution. Tests and production implementations show that Flopsar can process
around 40,000 metrics per second or a cumulative load of 200 MB/sec for a single
database instance in 24/7/365 mode.
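To get a rough sense of scale, the quoted rates can be converted into daily totals. The sketch below assumes, which the article does not claim, that the peak rates are sustained for a full day; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_tb(mb_per_second: float) -> float:
    """Decimal terabytes per day at a sustained rate (1 TB = 1,000,000 MB)."""
    return mb_per_second * SECONDS_PER_DAY / 1_000_000


def daily_metrics(metrics_per_second: int) -> int:
    """Number of metrics per day at a sustained rate."""
    return metrics_per_second * SECONDS_PER_DAY


print(daily_volume_tb(200))    # 17.28 (TB per day)
print(daily_metrics(40_000))   # 3456000000 (metrics per day)
```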
In 2013 Flopsar Technology, as the only APM software provider, implemented its solution on
approximately 100 production application servers in the Polish market, and in cooperation with
strategic business partners it carried out several dozen projects to optimize critical systems.
During the same period, the competitors recorded a few individual license sales in
Poland. At this time, the company, together with a number of partners, is running a few Proof of
Concept projects. “We estimate that by the end of 2014 the number of implementations will
exceed 300 monitored application servers in mission-critical systems. This will make
Flopsar Technology an unrivalled market leader in the field of monitoring and managing the
performance of critical applications based on Java servers” – says Grzegorz Pawluk. In the boxes
you can see examples of using Flopsar at UFG and Generali – together with their top IT
managers’ comments.
CIO Magazine asked Michał Zaremba, IT Infrastructure Project Manager, IT Department Support and
Infrastructure Section, Generali Group, to comment on detailed changes related to the Generali Group
APM solution implementation.
The Generali Group:
Sales management system production monitoring
• Complete detection of all production issues (failures, delays, defects)
• Full control over IT system production version acceptance – early issue detection, application
code optimization suggestions, architecture and performance issue consulting
• Code refactoring – processing optimization (performance increase)
• Capacity requirement estimation for increased data processing periods
Flopsar Suite – Who should manage quality and efficiency?
Until recently Flopsar Suite was utilized by the Generali Group only for early detection of performance
issues in production systems. It was handled by the team responsible for IT system and service
monitoring. During performance testing developers were using it to discover inefficient methods and
queries. Further experience with the Flopsar Suite helped develop a different, more effective
application performance monitoring model.
If you take a closer look at the tool, it is difficult to decide whether this is an advanced application
server performance monitoring system or a reporting system designed for analyzing IT system
operation performance. In the first case Flopsar may be perceived as just another monitoring system
utilized in maintenance activities, and in the second case as an additional system for supporting
application development and service transition from the development to the maintenance stage.
However, one must realize that in order to provide our customers with top value and performance, a
very deep synergy of these areas is required. This also opens up extensive process optimization
capabilities by eliminating unnecessary IT resource consumers which provide no value to service
recipients.
Department structure transformation and transition to a DevOps concept enabled Flopsar Suite to
finally end up in a spot where its full capabilities may be utilized – in the hands of a team responsible for
IT applications and services – both their development and operational activities. The important fact is
that system utilization in both areas is very similar, and therefore requires no changes in teamwork
style or mode, or any additional training.
Theoretical conclusions and diagnosis are supposedly delivered by Flopsar very quickly. How quickly,
and have you been successful in transforming them into IT process and product optimization?
The use of Flopsar enables us to greatly improve the speed of handling incidents in a production
environment. The time between an anomaly appearing in a production system and corrective actions
being launched by the team is nearly zero. In the past, if an end user had a subjective feeling that the
system was not performing well, such information had to pass through multiple IT organization levels. Now
this information is visible to an expert precisely when the user begins to feel the system becoming less
responsive. All in all, the user reports problems to the service desk like before, but the service desk
already knows about faulty system operations and about an intervention being underway. This greatly
cuts down on the time required to resolve incidents, due to being able to find the problem-causing
method, service, or query in a quick and intuitive fashion.
Application development and test processes have also been optimized. Thanks to monitoring
applications in development and test environments, we are able to discover operations with execution
time beyond acceptable limits.
By analyzing the number of particular calls in a given period of time we are able to define business
activity patterns and, as a result, properly manage IT service capacity, performance, and demands. This
also enables us to properly schedule change management processes, including planned maintenance
outages.
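Counting calls per period, as described above, could be sketched like this. The function names and the sample timestamps are made up for illustration; how the timestamps would be read from the monitoring database is not covered by the article and is left out.

```python
from collections import Counter
from datetime import datetime


def calls_per_hour(timestamps):
    """Count calls for each hour of the day (0-23) across all given timestamps."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_hour(hourly_counts):
    """Pick the hour with the fewest calls, e.g. a candidate maintenance window."""
    return min(range(24), key=lambda hour: hourly_counts[hour])


# Made-up sample: busy office hours, quiet nights, almost nothing at 03:00.
sample = []
for hour in range(24):
    n = 50 if 8 <= hour < 18 else 5
    if hour == 3:
        n = 1
    sample += [datetime(2014, 3, 10, hour, minute) for minute in range(n)]

hourly = calls_per_hour(sample)
print(hourly[9], hourly[3], quietest_hour(hourly))  # 50 1 3
```

The same hourly profile answers both needs mentioned in the interview: it shows when business activity peaks and when a planned outage would affect the fewest calls.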
Based on those patterns and query statistics, is it possible to optimize other organizational processes
and activities? Can the solution become a source of other innovations?
If the business process is performed in an IT system which is covered by Flopsar analysis, all system
operations are registered and may be analyzed. Specific data visualization enables us to establish
business process activities which are performed inefficiently.
Usually a business process performed in an IT system is treated by a business user as an operation with a
definite start and end. In reality, this process includes multiple operations which reach beyond the
application, towards the integration architecture, the database, and other systems. Advanced BPM
systems feature a Business Activity Monitoring (BAM) component, which may be utilized to optimize
business processes. However, if applications are developed in-house, a business process monitoring tool
supported by the particular applications should also be provided. If the owner decides not to
implement such functionality in the developed application, database-based deduction, which may be
provided by the Flopsar system, may be helpful.
Has capacity demand forecast accuracy improved? Has this led to optimizing infrastructure usage?
In terms of infrastructure optimization for application performance Generali relies on three base
techniques: monitoring technical parameters of infrastructure components (using SNMP, WMI, etc.),
optimizing load balancing, and application performance monitoring using the Flopsar Suite.
The first and second techniques are known and used by many organizations, but only an analysis of
correlations between all of the above provides a complete image for capacity forecasting. This may be
done by translating technical parameters of infrastructure components to the execution time of an
operation in a monitored application.
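One simple way to relate an infrastructure parameter to operation execution time is a correlation coefficient. The sketch below uses the standard Pearson formula on made-up samples; the interview does not say which correlation method Generali uses, so this is an assumption, and the variable names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up samples taken at the same moments.
cpu_percent = [20, 35, 50, 65, 80, 95]         # e.g. polled from the servers
operation_ms = [110, 120, 150, 210, 400, 900]  # e.g. execution times of one operation

print(round(pearson(cpu_percent, operation_ms), 2))
```

A coefficient close to 1 would suggest that the operation slows down as the component becomes loaded. A low coefficient would point towards a different bottleneck, or towards a relationship that is not linear.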
The character of recent Generali marketing activities required a temporary multi-fold capacity increase
in Merkury 2.0 – the primary sales system utilized by Generali. At first, we considered linear scaling of
server infrastructure components. When testing the solution with Flopsar, it turned out that there are
multiple factors which may greatly influence performance and may be modified in order to increase
system capacity. We noticed that standard load balancing techniques may have an adverse effect on the
time required to perform operations by a single user. Load balancing conditioned on
infrastructure and system parameters enabled us to provide a solution which featured the same
efficiency for every user. Curiously, the tests have shown that the Flopsar Suite impact on environment load
falls below 1–2%. Finally, after completing several optimizations, we reached a state where the
system load increase could be handled without modifying the server infrastructure at all. After
completing this marketing activity we were able to reduce that infrastructure.
How did the transition to the new method of observing sales efficiency go, especially in case of
interpreting event distribution visualizations? Did the users easily reach a new deduction process?
Flopsar Suite is an intuitive package. The system is currently used by the IT department, but we are
seriously considering sharing its data with business users, who might then use it to optimize business
processes.
However, you have to consider the fact that business users often require numerical data, not graphical
presentations, in order to perform data analysis. If Flopsar were to be used for sales efficiency analysis, it
would be good if it had an option to provide results in a numerical format. For example: departments
responsible for sales care not only about how the system performance influences product sales, but also
what the product search operation distribution is during particular hours, within given months or within
the year.
The fact that Generali reached such an advanced level of tool use proves that the system is easy to
handle. We also noticed that the tool may be used in an even more optimized fashion if additional
expertise is gained pertaining to its operation: analysis, result interpretation, as well as building report
extensions. It is worth mentioning that all the data collected in the Flopsar database are available to our
developers through a dedicated API.
Are process and factor complexity considered limitations for the application performance
visualization method proposed by Flopsar? If so, how can this be circumvented?
Most probably everyone who was ever responsible for IT system performance optimization faced
uncertainty whether the system operates the same way between measurements as during
measurements. This is typical for systems where performance is measured at established time periods.
Flopsar analyzes every operation within the system. If we do not filter particular calls in a so-called
galaxy, every point represents one system call. If the processes performed are of high complexity, we
are forced to operate on a large number of geometrically correlated points. In such a case data analysis
requires verifying particular calls amongst a larger number of those measured and presented. This might
become a limitation due to the speed of data analysis by an expert. It may also adversely impact the
application server load due to Flopsar collecting data. This can be circumvented if we utilize techniques
to exclude particular calls which are outside our interest. This can be achieved at the system
administration level, which enables monitoring to be developed individually for every application.
Another method to reduce the data which do not require analysis is an option to filter by minimum
and maximum operation time in the analyzed system. Finally, in case of systems working on several
application servers, we are able to change the point colors depending on the server. I believe that it
would be useful if there was an option to define item colors in a custom fashion, e.g. based on the type
of system operation or on the execution time.
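The three data-reduction techniques mentioned in this answer (excluding calls, filtering by minimum and maximum time, coloring by server) can be sketched as plain list operations. The `Call` type, both function names, the palette and the sample calls are hypothetical and do not represent the Flopsar API.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One dot in the plot; the field names are illustrative."""
    name: str
    server: str
    duration_ms: float


def filter_calls(calls, min_ms=0.0, max_ms=float("inf"), excluded=frozenset()):
    """Keep calls within [min_ms, max_ms] whose name is not in `excluded`."""
    return [c for c in calls
            if min_ms <= c.duration_ms <= max_ms and c.name not in excluded]


def color_by_server(calls, palette=("blue", "orange", "green", "purple")):
    """Assign each server a color in order of first appearance."""
    colors = {}
    for c in calls:
        colors.setdefault(c.server, palette[len(colors) % len(palette)])
    return colors


# Made-up sample calls from two application servers.
calls = [
    Call("heartbeat", "app1", 2),
    Call("searchProduct", "app1", 340),
    Call("searchProduct", "app2", 5200),
    Call("issuePolicy", "app2", 90000),
]
visible = filter_calls(calls, min_ms=100, max_ms=60000, excluded={"heartbeat"})
print([c.duration_ms for c in visible])  # [340, 5200]
print(color_by_server(visible))          # {'app1': 'blue', 'app2': 'orange'}
```

Coloring by operation type or by execution time, as suggested at the end of the answer, would only require a different key in place of `c.server`.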
CIO Interview about Flopsar APM - Application Performance Management


CIO Interview about Flopsar APM - Application Performance Management

  • 1. Galaxy or the escape fromillusion Michał Zabiełło A newwayto visualize systemperformance developedbyaPolishcompanyhasbeengaining recognition.The solutionisalreadyusedbyseveraldozenPolishcompaniesandresolutelycutsthrough the well-knownweaknessesof APMsolutions. One of the elementswhichmayimplementrational savingsinITisthe groupof toolsforapplication performance management(APM).Large corporationsare investinginpurchasesof APMtools.The providersof suchsolutionsare implementingtensof dashboards,hundredsof graphsandflow diagrams.Theydefine thousandsof variousalertsandinundate the mailboxesof relevantrecipients withmessagesaboutthe “healthcheck”of businessprocesses. Thisisdesignedto convince thatthe scatteredIT infrastructure isundercontrol.Itall worksuntil aseriousmalfunctionoccurs.ITspecialists try to identifythe cause of the problem, analyze millionsof out-of-date,unnecessaryorerroneous piecesof informationcoming fromthe implementedtools. Bombarded by alerts The toolsto diagnose ormonitorapplicationsare of keyimportance.Goodtoolsare expensive –they require manylaboratorychecks,tests,anda precise manufacturingprocess.Goodandexpensivetools are, in turn,complicated. It isworth notingthatsuch productshave a specificmethodologyconnectedwithperformance management:we install atool,configurethe scope of reportedmetricsandbuildacomplicated“health check” applicationtowarnusabout problemsoccurringinthe monitoredapplications.Inpractice,the systemwarnsus abouta problemthathas occurred – but the cost of using,maintaininganddeveloping the applicationisoftenhigherthanplanned. 
Dashboardshave become,paradoxically,the Achilles’footof those tools –everymonitoredapplication has to have a setof hierarchical dashboards,andeachbitof informationpresentedonitrequiresaset of definedSLA perimeterswhichallowtochange the resultof the “healthcheck” – whichis signaledby colorsgreen,yellow,orred.Thissignalingisnotunequivocal –it isnot clearwhetheritmeansa failure of the systemorjust a slowdown,whetherthe problemconcernsasingle functionora whole set. The toolsare bombardingthe administratorswithinformation.The commandcenterhasitshandsfull withsiftingandseparatingfalse alarmsfromthose responsiblefordisruptionsindataprocessing.The implementationspecialistsresponsible fortoolsare constantlyworkingonupdatingand adaptingthe dashboardsto frequentlychangingapplicationsorrequirementsconcerningnotificationsabout applicationproblems.
  • 2. The command centerhasits handsfull with separatingfalse alarms.The implementation specialistsresponsible fortoolsare constantly workingonupdatingandadaptingthe dashboardsto frequentlychangingapplications or requirementsconcerningnotificationsabout applicationproblems.Thatishow APM operates. In search of an intuitive APM In 2012 a group of programmersexperiencedinimplementingandadministrationof APMsolutions formeda company.Itsgoal wasto create a solutionwhichwouldovercomethe weaknessesand limitationsof monitoringsystemsandincrease the performance of applications.“Ourpointof departure increatingthe systemwasa fundamental question:Dodatafrom monitoredsystems,alertsandtrends have to be representedinawaywhichrequireshuge outlays?” –says GrzegorzPawluk,CTOand one of the co-foundersof FlopsarTechnology. Perhapsitis possible toshow ina simple, intuitive mannerwhatis the most importantforIT services:  that a malfunctionhasjustoccurred;  that the usersmay complainaboutthe systemworkinginefficiently;  that the providerimplementedabadlywritten applicationwhichcannotfunctioninan overloadedenvironment;  that the applicationisusinguptoomuch of the powerof the expensive equipment. Those commonsensical assumptionsare behindFlopsar(FlopSearchandRescue).The creatorsof FlopsarSuite askedthemselvesone more question“Whatisreallyimportantinthe tangle of informationreportedfromthe monitoredsystem?”Andtheyformulatedthe followinganswers: 1. Simple implementationandnoneedforanadvancedconfiguration:Plug-and-play. 2. No need totrainpeople whobenefitfromthe tool. 3. SIMPLE, intuitiveinterface (preferablyone window). 4. Maximumproductivity - todiscoveraproblemandto finditscause,the usershouldnotneedto performmore thanthree operations. 5. No “earlywarningsystems”basedonlabor-intensive development. Flopsar Galaxy
  • 3. Innovation can be seen in the approach to the project. The Flopsar project started with designing the infrastructure: messages, protocols, engines, data structure, mechanisms for load-balancing and bypassing the malfunction. The entire infrastructure was programmed in C language. Flopsardoesnotaggregate data. It doesnoesnot showaverages,mediansorquartiles.With unstable systemsthe sampleistoolarge and therefore notcredible.The galaxyshowsEVERY single operationperformedwithinthe monitored system.Each time atransferwas performedor someone loggedinto anapplication,adotwould appear,locatedwithinthe timescale of the event (axisX) andthe response timescale (axisY).The majorityof “correct” times(the oneswith sufficientprocessingquality) isconcentrated withinthe lowerregistersof the galaxy.The dots forma multicoloredplane there.If anapplication or its functionhassloweddownor malfunctioned,the dotsmigrate intothe upper registersof the galaxyandformvarious concentrationpatterns.The factthat those concentrationsappearinthe galaxyisthe reason for furtherinvestigation.The concentrationsare automaticallydetectedbyasystembasedon artificial intelligence algorithmsormaybe markedmanuallyinordertoidentifythe reason for theiroccurrence.Aftermarking,the user receivesaprecise diagnosisof whatand whyis not workingcorrectlyinthe system. Afterseveral daysof workingwiththe Flopsar systemadministratorsbegintofeel thatthey knowwhattheysee.Basedoneventsobservedin the past and interpretedconcentrationstheymay say “the queue systemgotdisconnectedagain,” Flopsar in UFG: productionmonitoring of critical applications  Reduction of production problems related to application performance  Code optimization – shorter response times  Reduced use of hardware infrastructure How quickly does conclusion-making learn based on Flopsar visualization? “We collect millions of data on policies, drivers and road events. 
It is critical to ensure the reliability and quality of operation of the IT systems which perform our statutory tasks. We selected the Flopsar Suite because of its intuitiveness and functionality. The tool was implemented within a few hours and its effective operation by the team of administrators started immediately after the implementation. The factors in favor of choosing Flopsar included also costs, the level of after-sales service, flexibility and the range of additional solution services offered by the provider. The data used from monitoring indicate unequivocally where the problem has occurred and, therefore, who is responsible for its servicing or repair. Today, we use the information obtained from Flopsar software in many cases as an argument in our negotiations with our IT service providers” – says Grzegorz Rymarski, IT Department Director, The Insurance Guarantee Fund (UFG).
  • 4. or “webservice is not working again”, or even ignore the pattern as something natural.
The system works without configuration: there is no need to construct dashboards, to define static SLAs for selected methods, or to provide expensive system maintenance. Once the monitoring system has been switched on, the application server processes data, the monitor starts showing concentrations, and the administrator starts looking for unnatural and disturbed concentration patterns.
Innovation through going back to the roots
Is the “galactic” way of showing data innovative and unique? The scatter plot is used in statistics to visualize data. Grzegorz Pawluk explains: “Flopsar reports every transaction performed in the monitored system separately. It connects stack frames into stack traces and then reports the aggregated duration of the transaction as one point (with full access to all the remaining data). In this type of service, the volume of data which needs to be recorded in the monitoring database is gigantic. Therefore, it is the database infrastructure (data persistence), and not the data-generating agent, which is the ‘heart’ of the Flopsar system.”
Innovation, or perhaps rather the return to healthy roots, can be seen in the approach to the project. The Flopsar project started with designing the infrastructure: messages, protocols, engines, data structures, and mechanisms for load balancing and for bypassing malfunctions. The entire infrastructure was programmed in C, chosen for its efficiency. The code, some 5,000,000 lines, was written from scratch and entirely without external (e.g. open-source) libraries. The engineers and Flopsar support are responsible for 100% of the solution. Tests and production implementations show that Flopsar can process around 40,000 metrics per second, or a cumulative load of 200 MB/sec, for a single database instance in 24/7/365 mode.
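The per-transaction reporting that Pawluk describes can be illustrated with a minimal Python sketch. It is not Flopsar's implementation (which the article says is written in C); the `StackFrame` and `GalaxyPoint` types and the `to_points` function are hypothetical names, and the sketch assumes each frame carries only its own self time, so that the transaction duration is the sum over its frames. Each transaction becomes one dot (start time on the X axis, response time on the Y axis) while the full trace stays attached to it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class StackFrame:
    """One recorded method call (hypothetical record layout)."""
    transaction_id: str
    method: str
    start_ms: int      # when the frame began
    duration_ms: int   # assumed to be self time spent in this frame only


@dataclass(frozen=True)
class GalaxyPoint:
    """One dot in the scatter plot: a whole transaction."""
    transaction_id: str
    start_ms: int                    # X axis: time of the event
    response_ms: int                 # Y axis: aggregated response time
    frames: Tuple[StackFrame, ...]   # remaining data stays reachable from the dot


def to_points(frames: Iterable[StackFrame]) -> List[GalaxyPoint]:
    """Connect frames into traces and report one point per transaction."""
    traces = {}
    for frame in frames:
        traces.setdefault(frame.transaction_id, []).append(frame)

    points = []
    for transaction_id, trace in traces.items():
        trace.sort(key=lambda f: f.start_ms)
        points.append(
            GalaxyPoint(
                transaction_id=transaction_id,
                start_ms=trace[0].start_ms,
                response_ms=sum(f.duration_ms for f in trace),
                frames=tuple(trace),
            )
        )
    # No averaging or sampling: every transaction is kept as its own point.
    return sorted(points, key=lambda p: p.start_ms)
```

Because nothing is aggregated across transactions, the amount of stored data grows with every call, which is consistent with the remark that data persistence, rather than the agent, is the demanding part of such a design.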
In 2013 Flopsar Technology was the only APM software provider to implement its solution on approximately 100 production application servers in the Polish market, and in cooperation with strategic business partners it carried out several dozen projects to optimize critical systems. During the same period, its competitors recorded only a few individual license sales in Poland. At present, the company, together with a number of partners, is running a few proof-of-concept projects. “We estimate that by the end of 2014 the number of implementations will exceed 300 monitored application servers in mission-critical systems. This will make Flopsar Technology an unrivalled market leader in the field of monitoring and managing the performance of critical applications based on Java servers” – says Grzegorz Pawluk. In the boxes you can see examples of using Flopsar at UFG and Generali, together with comments from their top IT managers.
  • 5. CIO Magazine asked Michał Zaremba, IT Infrastructure Project Manager, IT Department Support and Infrastructure Section, Generali Group, to comment on detailed changes related to the Generali Group APM solution implementation.
The Generali Group: Sales management system production monitoring
- Complete detection of all production issues (failures, delays, defects)
- Full control over IT system production version acceptance: early issue detection, application code optimization suggestions, architecture and performance issue consulting
- Code refactoring: processing optimization (performance increase)
- Capacity requirement estimation for increased data processing periods
Flopsar Suite – Who should manage quality and efficiency?
Until recently Flopsar Suite was utilized by the Generali Group only for early detection of performance issues in production systems. It was handled by the team responsible for IT system and service monitoring. During performance testing developers used it to discover inefficient methods and queries. Further experience with the Flopsar Suite helped develop a different, more effective application performance monitoring model.
If you take a closer look at the tool, it is difficult to decide whether this is an advanced application server performance monitoring system or a reporting system designed for analyzing IT system operation performance. In the first case Flopsar may be perceived as just another monitoring system utilized in maintenance activities, and in the second case as an additional system for supporting application development and service transition from the development to the maintenance stage.
However, one must realize that in order to provide our customers with top value and performance, a very deep synergy of these areas is required. This also opens up extensive process optimization capabilities by eliminating unnecessary IT resource consumers which provide no value to service recipients.
Department structure transformation and the transition to a DevOps concept enabled Flopsar Suite to finally end up in a spot where its full capabilities may be utilized: in the hands of a team responsible for IT applications and services, both their development and operational activities. The important fact is that system utilization in both areas is very similar, and therefore requires no changes in team work style or mode, or any additional training.
Theoretical conclusions and diagnosis are supposedly delivered by Flopsar very quickly. How quickly, and have you been successful in transforming them into IT process and product optimization?
The use of Flopsar enables us to greatly improve the speed of handling incidents in a production environment. The time between an anomaly appearing in a production system and corrective actions being launched by the team is nearly zero. In the past, if an end user had a subjective feeling that the system was not performing well, such information had to pass through multiple IT organization levels. Now this information is visible to an expert precisely when the user begins to feel the system becoming less
  • 6. responsive. All in all, the user reports problems to the service desk as before, but the service desk already knows about the faulty system operations and that an intervention is under way. This greatly cuts down the time required to resolve incidents, because the problem-causing method, service, or query can be found quickly and intuitively.
Application development and test processes have also been optimized. Thanks to monitoring applications in development and test environments, we are able to discover operations with execution times beyond acceptable limits.
By analyzing the number of particular calls in a given period of time we are able to define business activity patterns and, as a result, properly manage IT service capacity, performance, and demands. This also enables us to properly schedule change management processes, including planned maintenance outages.
Based on those patterns and query statistics, is it possible to optimize other organizational processes and activities? Can the solution become a source of other innovations?
If a business process is performed in an IT system which is covered by Flopsar analysis, all system operations are registered and may be analyzed. The specific data visualization enables us to identify business process activities which are performed inefficiently.
Usually a business process performed in an IT system is treated by a business user as an operation with a definite start and end. In reality, this process includes multiple operations which reach beyond the application, towards the integration architecture, the database, and other systems. Advanced BPM systems feature a Business Activity Monitoring (BAM) component, which may be utilized to optimize business processes. However, if applications are developed in-house, a business process monitoring tool supported by the particular applications should also be provided. If the owner decides not to implement such functionality in the developed application, deduction based on the database, which the Flopsar system can provide, may be helpful.
Has capacity demand forecast accuracy improved? Has this led to optimizing infrastructure usage?
In terms of infrastructure optimization for application performance, Generali relies on three basic techniques: monitoring technical parameters of infrastructure components (using SNMP, WMI, etc.), optimizing load balancing, and application performance monitoring using the Flopsar Suite. The first and second techniques are known and used by many organizations, but only an analysis of correlations between all of the above provides a complete picture for capacity forecasting. This may be done by translating technical parameters of infrastructure components into the execution time of an operation in a monitored application.
The character of recent Generali marketing activities required a temporary multi-fold capacity increase in Merkury 2.0, the primary sales system utilized by Generali. At first, we considered linear scaling of server infrastructure components. When testing the solution with Flopsar, it turned out that there are multiple factors which may greatly influence performance and may be modified in order to increase system capacity. We noticed that standard load balancing techniques may have an adverse effect on the time required to perform operations by a single user. Load balancing conditioned on infrastructure and system parameters enabled us to provide a solution which featured the same
  • 7. efficiency for every user. Curiously, the tests have shown that the Flopsar Suite’s impact on environment load falls below 1–2%. Finally, after completing several optimizations, we reached a state where the system load increase could be handled without modifying the server infrastructure at all. After completing this marketing activity we were able to reduce that infrastructure.
How did the transition to the new method of observing sales efficiency go, especially in the case of interpreting event distribution visualizations? Did the users easily reach a new deduction process?
Flopsar Suite is an intuitive package. The system is currently used by the IT department, but we are seriously considering sharing its data with business users, who might then use it to optimize business processes.
However, you have to consider the fact that business users often require numerical data, not graphical presentations, in order to perform data analysis. If Flopsar were to be used for sales efficiency analysis, it would be good if it had an option to provide results in a numerical format. For example, departments responsible for sales care not only about how system performance influences product sales, but also about how product search operations are distributed over particular hours, within given months, or within the year.
The fact that Generali has reached such an advanced level of tool use proves that the system is easy to handle. We also noticed that the tool may be used in an even more optimized fashion if additional expertise is gained in its operation: analysis, result interpretation, and building report extensions. It is worth mentioning that all the data collected in the Flopsar database are available to our developers through a dedicated API.
Are process and factor complexity considered limitations for the application performance visualization method proposed by Flopsar? If so, how can this be circumvented?
Most probably everyone who has ever been responsible for IT system performance optimization has faced uncertainty as to whether the system operates the same way between measurements as during measurements.
This is typical for systems where performance is measured at established time periods. Flopsar analyzes every operation within the system. If we do not filter particular calls in the so-called galaxy, every point represents one system call. If the processes performed are of high complexity, we are forced to operate on a large number of geometrically correlated points. In such a case data analysis requires verifying particular calls among a larger number of those measured and presented. This might become a limitation because of the speed at which an expert can analyze the data. It may also adversely impact the application server load, since Flopsar is collecting the data. This can be circumvented if we use techniques to exclude particular calls which are outside our interest. This can be achieved at the system administration level, which enables monitoring to be developed individually for every application.
Another method of reducing the data which do not require analysis is the option to filter by minimum and maximum operation time in the analyzed system. Finally, in the case of systems working on several application servers, we are able to change the point colors depending on the server. I believe it would be useful if there were an option to define item colors in a custom fashion, e.g. based on the type of system operation or on the execution time.
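The three data-reduction techniques mentioned in this answer (excluding calls of no interest, filtering by minimum and maximum operation time, and coloring points per server) can be sketched in Python as follows. This is an illustration only, not the Flopsar API: the `Call` record, the `filter_calls` and `color_by` functions, and the `PALETTE` values are all hypothetical. The `key` argument of `color_by` also covers the custom coloring the interviewee wishes for, for example by operation type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Optional


@dataclass(frozen=True)
class Call:
    """One point of the galaxy (hypothetical record layout)."""
    operation: str
    server: str
    start_s: float       # X axis: time of the event
    response_ms: float   # Y axis: response time


def filter_calls(
    calls: Iterable[Call],
    min_ms: Optional[float] = None,
    max_ms: Optional[float] = None,
    excluded_operations: FrozenSet[str] = frozenset(),
) -> List[Call]:
    """Keep only the calls worth analyzing."""
    kept = []
    for call in calls:
        if call.operation in excluded_operations:
            continue  # calls outside our interest
        if min_ms is not None and call.response_ms < min_ms:
            continue  # faster than the lower bound
        if max_ms is not None and call.response_ms > max_ms:
            continue  # slower than the upper bound
        kept.append(call)
    return kept


# Arbitrary color names chosen for the illustration.
PALETTE = ("blue", "orange", "green", "red", "purple", "brown")


def color_by(calls: Iterable[Call], key: Callable[[Call], Hashable]) -> Dict[Hashable, str]:
    """Assign one color per distinct key value, in order of first appearance."""
    colors: Dict[Hashable, str] = {}
    for call in calls:
        value = key(call)
        if value not in colors:
            colors[value] = PALETTE[len(colors) % len(PALETTE)]
    return colors
```

For example, `color_by(calls, key=lambda c: c.server)` reproduces the per-server coloring described above, while `key=lambda c: c.operation` would give one color per operation type.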