International Journal on Cloud Computing: Services and Architecture (IJCCSA)
Vol. 11, No. 1/2/3/4/5/6, December 2021
DOI: 10.5121/ijccsa.2021.11603
A NOVEL OPTIMIZATION OF CLOUD INSTANCES
WITH INVENTORY THEORY APPLIED ON REAL
TIME IOT DATA OF STOCHASTIC NATURE
Mr. Sayan Guha
Data Architect, AI and Analytics Practice, Cognizant Technology Solutions, India
ABSTRACT
Horizontal scaling is a Cloud architectural strategy in which the number of nodes or computers is
increased to meet the demand of a continuously increasing workload. The cost of compute instances
rises with the workload, and this research aims to optimize the number of reserved Cloud instances
using principles of Inventory theory applied to IoT datasets of a variable, stochastic nature. A
structured solution architecture is first laid down for the business problem to identify the checkpoints
where compute instances are needed; the approximate range of reserved compute instances obtained from
it is then narrowed down by analysing the probability distribution curves of the IoT datasets. Inventory
theory applied to the distribution curves of the data provides the optimized number of compute instances
within the range prescribed by the solution architecture. The solution would help Cloud solution
architects & project sponsors plan the compute power required on the AWS® Cloud platform in any
business situation where ingesting & processing data of a stochastic nature is a business need.
KEYWORDS
AWS, Amazon Web Services, Inventory Theory, Reserved Instance, On-Demand Instance, Uniform
Distribution, Gaussian Distribution, Poisson Distribution.
1. INTRODUCTION
The background of this research lies in the work of Sidney Browne et al [1], whose paper is
concerned with an (r, q) inventory model in which demand accumulates continuously and the
demand rate at any instant is determined by an underlying stochastic process.
Also, Andrea Nodari [2], in his master's degree thesis, set out to answer several research
questions, one of them being the modelling of cost optimization in Cloud Computing with
Inventory theory. I propose to apply this cost optimization theory to real-time datasets & for this
purpose have chosen data from IoT devices, which is stochastic & varying in nature. My objective
in the paper is to understand the solution architecture required to ingest, process & store IoT
datasets on the AWS® platform & then optimize it by interpreting the distribution curves of the
IoT data captured on the Cloud.
The technical challenge of the business problem lies in interpreting the probability distribution
curves of the IoT data, to which Inventory theory is applied in order to optimize the compute
instances (e.g., uniform and Gaussian distribution curves are interpreted in my work using the
data captured from IoT devices).
I have considered Amazon Web Services as the chosen Cloud provider in order to establish the
applicability of Inventory theory concepts in optimizing the number of Amazon Elastic Compute
Cloud (EC2) computing resources.
In this paper, my aim is to perform a validation exercise using a dataset emanating from IoT
device(s), capturing the many iterations of the data in a database to validate the stochastic
nature of the data source & understand the distribution of the data. I believe this research
will provide a foundation on which a dynamic, time-continuous stochastic data ecosystem can
be assessed and provided with an optimized plan for instances.
The sections of the paper are as follows. Section 2 presents the background research already
established in this field. Section 3 gives a brief background on different probability
distributions, which helps in understanding the use case in the experiment sections. Section 4
provides the high-level solution approach to the business problem. Section 5 presents the
solution design using AWS® services, which helps me approximately compute the expected range of
Reserved Instances required for the business problem in Section 6.
Section 7 discusses the experiment scenario, with the approach denoting the decision points for
interpreting the data results. Section 8 discusses the experiment results through visualization of
the dataset and concludes the probability distribution of the response obtained from the data.
Section 9 applies the Inventory theory optimization to the results obtained for the 2 different IoT
sources used in the experiment. Section 10 presents my inference from the experiment, Section 11
discusses the areas of improvement of my work, and Section 12 summarizes it.
2. RELATED WORK
With reference to the application of Inventory Theory in Cloud Economics, Andrea Nodari [2]
derived in his thesis the relationship between Reserved Instances and On-Demand instances to
optimize usage on the Amazon Web Services Cloud platform. This paper likewise considers
Amazon® Web Services as the preferred cloud platform.
To set the background from which Inventory Theory can be applied to the experiment, I have
summarized the basic differences between the Reserved & On-Demand instances available on
Amazon® Web Services.
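Table 1: Difference between Reserved & On-Demand EC2 Instances. Source: Derived from Website,
https://aws.amazon.com/
Perspective | EC2 instance type | Requirement | Feasibility
Demand perspective (demand is predetermined or agile) | RESERVED | UPFRONT | Predetermined
Demand perspective (demand is predetermined or agile) | RESERVED | HOURLY | Predetermined
Demand perspective (demand is predetermined or agile) | RESERVED | AD-HOC | X
Demand perspective (demand is predetermined or agile) | ON-DEMAND | UPFRONT | X
Demand perspective (demand is predetermined or agile) | ON-DEMAND | HOURLY | Ad-hoc
Demand perspective (demand is predetermined or agile) | ON-DEMAND | AD-HOC | Ad-hoc
Cost perspective (cost & budget) | RESERVED | UPFRONT | Cheaper
Cost perspective (cost & budget) | RESERVED | HOURLY | Cheaper when predetermined
Cost perspective (cost & budget) | RESERVED | AD-HOC | X
Cost perspective (cost & budget) | ON-DEMAND | UPFRONT | X
Cost perspective (cost & budget) | ON-DEMAND | HOURLY | Cheaper when Ad-hoc
Cost perspective (cost & budget) | ON-DEMAND | AD-HOC | Cheaper when Ad-hoc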
In the Inventory theory model, the following quantities are considered in order to relate reserved
instances and on-demand instances:
y = Number of purchased reserved instances
D = Random variable representing the hourly demand of instances
d_i = i-th observation of the demand D, for i = 1, ..., N
C_ri = hourly cost of a reserved instance, C_od = hourly cost of an on-demand instance, with C_ri < C_od
$$\text{Total Cost} = \sum_{i=1}^{N} \left( C_{ri}\, y + C_{od} \max\{0,\, d_i - y\} \right)$$
$$= \sum_{i=1}^{N} C_{od}\, d_i \quad \text{(when the company does not purchase any reserved instance, } y = 0\text{)}$$
$$= \sum_{i=1}^{N} C_{ri}\, y \quad \text{(conversely, when the purchase of reserved instances covers all demand, } y \ge d_i \text{ for every } i\text{)}$$
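As a minimal sketch of the total-cost expression above, the following Python function evaluates it for a list of hourly demand observations. The demand values are hypothetical and chosen only for illustration; the hourly rates are the ones assumed later in Section 9.

```python
from typing import Sequence


def total_cost(y: int, demand: Sequence[int], c_ri: float, c_od: float) -> float:
    """Sum over all hours of c_ri * y + c_od * max(0, d_i - y)."""
    return sum(c_ri * y + c_od * max(0, d - y) for d in demand)


# Hypothetical hourly demand observations d_1..d_N (not taken from the paper).
demand = [3, 5, 8, 6, 4]
# Hourly rates assumed in Section 9: reserved = $0.05, on-demand = $0.07.
c_ri, c_od = 0.05, 0.07

for y in (0, 5, 8):
    print(f"y={y}: total cost = {total_cost(y, demand, c_ri, c_od):.2f}")
```

With y = 0 only the on-demand term remains, and with y at or above the largest observation only the reserved term remains, which matches the two special cases listed above.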
Explanation
I express the cost in terms of the random variable D. To minimize the total cost, the goal of the
next step is to find the optimal value of y, the number of reserved instances to purchase. The
cost with demand D and y reserved instances is
$$C(D, y) = C_{ri}\, y + C_{od} \max\{0,\, D - y\}$$
Since C(D, y) is a random variable, the expected cost can be calculated as
$$C(y) = E[C(D, y)] = \sum_{d=0}^{\infty} \left( C_{ri}\, y + C_{od} \max\{0,\, d - y\} \right) P_D(d) \qquad \text{(i)}$$
$$= C_{ri}\, y + \sum_{d=y}^{\infty} C_{od}\, (d - y)\, P_D(d) \qquad \text{(ii)}$$
I have approximated the discrete random variable D with a continuous random variable such that
$\psi_D(\xi)$ = probability density function of D
$\Phi_D(a)$ = cumulative distribution function of D, which is expressed as
$$\Phi_D(a) = \int_{0}^{a} \psi_D(\xi)\, d\xi \qquad \text{(iii)}$$
The total cost from expression (i) is now expressed as
$$C(y) = E[C(D, y)] = \int_{0}^{\infty} C(\xi, y)\, \psi_D(\xi)\, d\xi$$
$$= \int_{0}^{\infty} \left( C_{ri}\, y + C_{od} \max\{0,\, \xi - y\} \right) \psi_D(\xi)\, d\xi$$
$$= C_{ri}\, y + \int_{y}^{\infty} C_{od}\, (\xi - y)\, \psi_D(\xi)\, d\xi \qquad \text{(iv)}$$
To minimize the above expression, I have taken the derivative of expression (iv) with respect to y
and set it to zero, which gives
$$\frac{dC(y)}{dy} = C_{ri} - C_{od} + C_{od} \int_{0}^{y} \psi_D(\xi)\, d\xi = 0$$
$$C_{ri} - C_{od} \left( 1 - \int_{0}^{y} \psi_D(\xi)\, d\xi \right) = 0$$
$$C_{ri} - C_{od} \left[ 1 - \Phi_D(y) \right] = 0 \qquad \text{(v)} \quad \text{[applying the definition of } \Phi_D \text{ in (iii)]}$$
$$\Phi_D(y) = \frac{C_{od} - C_{ri}}{C_{od}} \qquad \text{(vi)}$$
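The condition in the last expression can be illustrated on sampled demand. The sketch below assumes a small hypothetical demand sample, picks the smallest y whose empirical cumulative distribution reaches (Cod - Cri)/Cod, and compares it with a brute-force minimum of the average hourly cost. The function names are illustrative and are not part of the original work.

```python
from typing import Sequence


def critical_ratio(c_ri: float, c_od: float) -> float:
    """Right-hand side of the optimality condition: (C_od - C_ri) / C_od."""
    return (c_od - c_ri) / c_od


def optimal_reserved(demand: Sequence[int], c_ri: float, c_od: float) -> int:
    """Smallest observed demand level whose empirical CDF reaches the critical ratio."""
    ratio = critical_ratio(c_ri, c_od)
    ordered = sorted(demand)
    n = len(ordered)
    for rank, d in enumerate(ordered, start=1):
        # rank / n equals the empirical CDF at d once the last copy of d is reached.
        if rank / n >= ratio:
            return d
    return ordered[-1]


def mean_cost(y: int, demand: Sequence[int], c_ri: float, c_od: float) -> float:
    """Average hourly cost for y reserved instances over the demand sample."""
    return sum(c_ri * y + c_od * max(0, d - y) for d in demand) / len(demand)


# Hypothetical demand sample (not taken from the paper); rates as assumed in Section 9.
sample = [2, 4, 4, 6, 7, 9, 10, 12, 15, 20]
c_ri, c_od = 0.05, 0.07

y_quantile = optimal_reserved(sample, c_ri, c_od)
y_brute = min(range(0, max(sample) + 1), key=lambda y: mean_cost(y, sample, c_ri, c_od))
print(y_quantile, y_brute)
```

Both approaches select the same y for this sample, which is what the derivative condition predicts for a discrete demand distribution.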
In subsequent sections of this paper, I apply the above derivation and the resulting relationship
between the cost of on-demand instances and the cost of reserved instances to the IoT dataset used
for the experiment.
3. PROBABILITY DISTRIBUTIONS: A BACKGROUND
The theoretical background of 3 different probability distributions is summarized in the table
below. Since the IoT data considered for the experiment is not deterministic but stochastic in
nature, an understanding of these probability distributions helps to interpret the dataset when
the data is visualized in subsequent sections.
Table 2: Definition & example(s) of different probability distributions e.g., Uniform, Gaussian & Poisson
distribution
Type of distribution | Definition | Mathematical derivation | Visualization
Uniform | A continuous probability distribution in which the likelihood of occurrence of the concerned events is equal | $F(x) = 0$ when $x < a$; $F(x) = \frac{1}{b-a}$ when $a \le x \le b$; $F(x) = 0$ when $x > b$ | F(x) defined between 2 points a & b along the abscissa
Gaussian | A continuous probability distribution for a real-valued random variable | $F(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$, where $\sigma$ = standard deviation, $\mu$ = mean or expectation of the distribution, $\sigma^{2}$ = variance of the distribution |
Poisson | A discrete probability distribution which determines the likelihood of a number of events happening in a given interval of time | $F(x) = \frac{\lambda^{x}}{x!}\, e^{-\lambda}$ |
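For reference, the three functions in Table 2 can be written as a short Python sketch using only the standard library. The function names are illustrative assumptions, and the Gaussian is written with sigma as the standard deviation.

```python
import math


def uniform_pdf(x: float, a: float, b: float) -> float:
    """Uniform density: 1 / (b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


print(uniform_pdf(8, 7, 10))                            # density inside [7, 10]
print(gaussian_pdf(42, 42, 60))                         # density at the mean
print(sum(poisson_pmf(k, 3.0) for k in range(50)))      # probabilities sum to about 1
```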
4. SOLUTION APPROACH
I propose the solution approach below to the problem statement of optimizing Cloud instances
for IoT data of a stochastic nature.
In the subsequent sections the experiment is discussed to arrive at the results for the
dataset considered.
Figure 1. Solution approach flowchart with numbers denoting steps in the experiment (streams in step 3
are executed in parallel)
5. SOLUTION DESIGN
The dataset for the experiment that follows has been taken from an open-source data repository,
identified in Section 5.1.
5.1. Source of Data
Dataset contains Beach Water Quality – Automated Sensors.
(https://data.world/cityofchicago/beach-water-quality-automated-sensors)
5.2. Architecture Requirement
• Capture the IoT data from the 3 IoT sensors starting from the edge location.
• Store the ‘hot’ data & make it available for analysts / operational users.
• Feed the data through a stream analytics system and make it available for any web
application to access.
• Ensure high availability & security using a Virtual Private Cloud environment.
5.3. Solution Architecture Diagram
I have visualized the solution architecture below to capture the data from the above dataset &
cater to the requirements; Table 3 subsequently refers to the points in the architecture
where compute resources would be required.
Preferred Cloud Platform – Amazon ® Web Services
Figure 2: Preferred Solution architecture on Amazon® Web Service platform (architectural logos
reference www.amazon.com)
6. RANGE COMPUTATION OF INSTANCES
The purpose of the solution architecture in this context is to determine the range of the number
of reserved instances required to meet the business use case. This range, approximated in
Table 3 below, is validated in the subsequent experiment sections by applying Inventory theory.
6.1. Considerations of Computation
1. 2 separate compute instances will be provisioned, either by an AWS managed service or by the
user, for the 2 separate IoT devices that capture Beach Water Temperature & Turbidity, at every
compute checkpoint.
2. 2-3 Availability Zones are considered for disaster recovery or high availability, which
replicates the compute instances across different geographical zones & yields a range of
compute instances.
3. The compute services used in the architecture are serverless and hence need not be
provisioned or maintained separately; the underlying AWS architecture provisions &
maintains the EC2 (Elastic Compute Cloud) instances. Instances will nevertheless be
provisioned for computation, & that is the basis for validating the experiment in
subsequent sections.
The results below are obtained by looking at the solution from a bird's eye view of the data
architecture.
Table 3: Range Computation of Provisioned Compute Instances based on the preferred Solution
Architecture (Architectural Viewpoint of Range Computation of Instances)
Source of Data | Compute Checkpoint No | Compute Checkpoint Service Name | Functionality | No of Instances (A) | No of Multizone Availability (B) | Approximate number of Compute resources (A * B)
IoT | C1 | Compute instances managed in AWS Greengrass (Managed service) | Capture & stream IoT data for device 1 & 2 | 2 | NA | 2
IoT | C2 | Compute instances provisioned by AWS directing stream data | Direct stream IoT data to S3 object storage | 2 | 2-3 | 4-6
IoT | C3 | AWS Lambda (Managed service) | Provisioned compute: function/code to ingest stream data | 2 | 2-3 | 4-6
IoT | C4 | Instances provisioned for RDS service | Stores data for further analysis & querying purpose | 2 | 2-3 | 4-6
Approximate Range of Compute Instances: 14-20
Explanation of above Range approximation
• Checkpoints C1, C2, C3 & C4 correspond to my solution architecture in Figure 2. They
correspond, respectively, to the compute instances managed by AWS Greengrass, the compute
capacity used to direct the data to AWS S3, the provisioned compute (function/code to ingest
stream data) in the AWS Lambda managed service & the processing of the final dataset into RDS.
• 2 separate IoT devices capture Beach Water Temperature & Turbidity at every compute
checkpoint; hence the compute instances (whether managed by AWS or manually provisioned)
are assumed to be 2 at every checkpoint. With 4 compute checkpoints, I approximate a
probable usage of 8 instances which need to be reserved upfront for one availability zone.
• At the Edge Computing zone, AWS Greengrass doesn't need the data to be available
across availability zones, but at all other compute points the data needs to be highly
available, for which I propose 2-3 multizone availability based on the consumption requirement.
• Therefore, I have approximated the range of compute instances required in the solution
as per Table 3: the 6 compute instances at C2, C3 & C4 with 2-3 multizone availability
make 12-18 compute instances, and the 2 instances at the edge with AWS Greengrass bring
the range to 14-20 compute instances to be reserved upfront from the solution architect's
point of view.
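The range arithmetic described above can be reproduced with a few lines of Python. This is only a sketch of the Table 3 computation; the dictionary layout and the function name are illustrative assumptions, while the instance counts and zone ranges are the ones stated in the table.

```python
from typing import Dict, Optional, Tuple

# Checkpoint -> (instances A, multizone availability range B or None when not applicable).
CHECKPOINTS: Dict[str, Tuple[int, Optional[Tuple[int, int]]]] = {
    "C1": (2, None),     # AWS Greengrass at the edge, no multizone replication
    "C2": (2, (2, 3)),   # compute directing stream data to S3
    "C3": (2, (2, 3)),   # AWS Lambda ingest function
    "C4": (2, (2, 3)),   # instances provisioned for RDS
}


def instance_range(checkpoints: Dict[str, Tuple[int, Optional[Tuple[int, int]]]]) -> Tuple[int, int]:
    """Sum A * B over all checkpoints for the lowest and highest zone counts."""
    low = high = 0
    for instances, zones in checkpoints.values():
        zone_min, zone_max = zones if zones is not None else (1, 1)
        low += instances * zone_min
        high += instances * zone_max
    return low, high


print(instance_range(CHECKPOINTS))
```

The lower bound uses 2 availability zones and the upper bound uses 3, giving the 14-20 range quoted in Table 3.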
In the subsequent sections of the paper, the IoT data is interpreted from a probability
distribution perspective. My goal is to narrow the range computed above & specify the actual
number of instances required, using Inventory theory applied to the probability distribution of
the datasets' response.
The value I propose to add is this: the solution architecture approximation assumes a range of
compute instances at a very high level, based on assumptions, whereas interpreting the actual
data response with Inventory theory applied to its probability distribution streamlines &
pinpoints the number of instances.
7. EXPERIMENT SCENARIO: PROBABILITY DISTRIBUTION OF DATA
SOURCE
Aiping Wang et al [3], in their survey on stochastic distribution systems, describe the control
task as obtaining control signals so that the output probability distribution functions of
stochastic systems follow their target probability distribution functions. This research, already
established by predecessors in the field, led me to formulate the probability distribution
function for the IoT data under consideration as the initial step.
I have observed that the solution architecture defined for ingesting IoT data from the 2 devices
specific to the use case provides a high-level estimate of the compute resources required. In
this context, I would like to clarify that the 2 IoT devices capture 2 specific attributes of the
data generated at the site, namely 1) Water temperature & 2) Turbidity.
While performing the experiment I analysed the work of D. Altman et al [4], which notes that
various methods of analysis make assumptions about normality, including correlation, regression,
tests, and analysis of variance. It also concludes that it is not in fact necessary for the
distribution of the observed data to be normal; rather, the sample values should be compatible
with having a normal distribution in a defined range of values.
Peter Jez [5], in his technical paper, compared multiple data samples and carried out a series of
mathematical tests to conclude that, mathematically, a normal distribution would be range-bounded
at its boundaries, whereas a normal distribution in Gaussian density form has the mean or
expectation of the value in an exponential term divided by the variance of the distribution.
Based on the above studies in the field of probability distribution, the approach below has
been devised to arrive at the interpretation of the results, as per the flow chart.
Figure 3: Experiment Approach flowchart with numbers denoting decision points of interpretation of data
results in the experiment
8. EXPERIMENT RESULTS: DATA VISUALIZATION
The experiment results, visualizing the data output captured for Water Temperature & Turbidity
of Beach water in Ohio, show:
• IoT Sensor 1 Output (Water Temperature): As observed, the temperature is range-bound
between 12 degrees Fahrenheit and 27 degrees Fahrenheit, with a mean (µ) in the range of
19-20 degrees. Readings were captured in real time for 7500 timestamp instances from IoT
sensor 1, ingested into the RDS database instances and presented in Figure 4 below.
• IoT Sensor 2 Output (Turbidity): As observed, the turbidity readings slowly rise to a
peak around the value of 42 Formazin Turbidity Units (FTU) & gradually decline as they
normalize; the distribution fairly forms a bell curve about the mean (µ) = 42, with
standard deviations of (-σ) ~ 25 FTU & (+σ) ~ 25 FTU equally spaced from the mean.
Trend lines have been drawn along both observations.
Figure 4: Data results in the experiment-Depiction of Water Temperature & Turbidity captured across
timestamp ranges (Left – Water temperature Vs Timestamp, Right- Turbidity Vs Timestamp)
Inference
• The probability distribution of IoT device 1 (Water Temperature) is inferred as Uniform Distribution
• The probability distribution of IoT device 2 (Turbidity) is inferred as Gaussian Distribution
I have assumed equal weightage for the two IoT data sources with respect to the Cloud compute
resources they require, while applying Inventory Theory to optimize the forecasted range of
compute resources.
The end goal is to optimize the total number of compute resources that could be required in
capturing real-time IoT data of a stochastic nature, with the help of the Inventory theory
discussed in Section 2 of this paper; the baseline range is the one calculated in Section 6,
Table 3.
9. EXPERIMENT RESULT: OPTIMIZATION BY APPLYING INVENTORY
THEORY
The baseline range is the one calculated in Section 6, Table 3, considering approximately 2-3
availability zones, & the solution has been architected as per the solution architecture in
Figure 2. The services invoked in the solution are in many cases managed services of Amazon®
Web Services, but each of these services invokes instances in the background as best suited to
the IoT data processing under consideration.
The solution architecture design can only provide an approximation; the architecture under
consideration follows the standard Well-Architected Framework of Amazon® Web Services,
particularly suited to the requirement. Luis A. San-José et al [6] have used Inventory theory to
maximize profit in an inventory system with time-varying demand; I have applied the same concept
of Inventory theory to time-varying data to maximize resource utilization.
I have investigated the problem statement from the data perspective by applying statistical
analysis & probability distributions, interpreting the nature of the data & the output produced
by the IoT sensors as depicted and visualized in Figure 4.
9.1. Optimization of Instances for IoT Sensor 1
Assuming equal weightage in the distribution of the compute instances baselined in Section 6, I
have considered 50% of the instances (7-10 instances) to be consumed in processing the IoT
Sensor 1 data, which measures Water Temperature Vs Timestamp.
The output has been inferred to follow a Uniform Probability Distribution.
Hence, the probability density function & the cumulative distribution function in the interval
[m, n] can be defined as
Density function:
$$\psi_D(\xi) = \begin{cases} 0 & \text{if } \xi < m \\ \dfrac{1}{n-m} & \text{if } m \le \xi \le n \\ 0 & \text{if } \xi > n \end{cases}$$
Cumulative Distribution:
$$\Phi_D(y) = \begin{cases} 0 & \text{if } y < m \\ \dfrac{y-m}{n-m} & \text{if } m \le y \le n \\ 1 & \text{if } y > n \end{cases}$$
• Assuming the optimal number of reserved instances will be in the range 7 ≤ y ≤ 10, i.e.
m = 7 and n = 10
• Assuming the partial upfront plan of AWS® with hourly rates for on-demand & reserved
instances of m3.medium instances in a US region (us-east1-a or us-west1-b):
Cost/hr (On demand) = $0.07 & Cost/hr (Reserved) = $0.05
$$\Phi_D(y) = \frac{C_{od} - C_{ri}}{C_{od}}$$
$$\frac{y - 7}{10 - 7} = \frac{0.07 - 0.05}{0.07} \approx 0.2857 \quad \Rightarrow \quad y \approx 7 + 3 \times 0.2857 \approx 7.86$$
y ~ 8 Instances ………. (A)
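The calculation for IoT Sensor 1 can be checked with a short sketch. It solves (y - m)/(n - m) = (Cod - Cri)/Cod for y under the uniform assumption on [7, 10], with the hourly rates stated above; the function name is an illustrative assumption.

```python
def uniform_optimal_reserved(m: float, n: float, c_ri: float, c_od: float) -> float:
    """Solve (y - m) / (n - m) = (c_od - c_ri) / c_od for y on the interval [m, n]."""
    ratio = (c_od - c_ri) / c_od
    return m + ratio * (n - m)


y_uniform = uniform_optimal_reserved(m=7, n=10, c_ri=0.05, c_od=0.07)
print(f"y = {y_uniform:.3f}, rounded to {round(y_uniform)} instances")
```

The right-hand ratio is about 0.2857, so y lies a little below 8 and rounds to the 8 instances labelled (A).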
9.2. Optimization of Instances for IoT Sensor 2
Assuming equal weightage in the distribution of the compute instances baselined in Section 6, I
have considered 50% of the instances (7-10 instances) to be consumed in processing the IoT
Sensor 2 data, which measures Turbidity Vs Timestamp.
The output has been inferred to follow a Gaussian Probability Distribution.
Following a similar approach to the above, Jamie Zappone [6] outlined that the probability
density function & the cumulative distribution function in the interval [m, n] can be
defined as
Density function:
$$\psi_D(\xi) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(\xi-\mu)^{2}}{2\sigma^{2}}}$$
Cumulative Distribution:
$$\Phi_D(y) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{(t-\mu)^{2}}{2\sigma^{2}}}\, dt$$
• Assuming the optimal number of reserved instances will be in the range 7 ≤ y ≤ 10
• Assuming the partial upfront plan of AWS® with hourly rates for on-demand & reserved
instances of m3.medium instances in a US region (us-east1-a or us-west1-b):
Cost/hr (On demand) = $0.07 & Cost/hr (Reserved) = $0.05
• The mean (µ) is observed to be 42 and the standard deviation (σ) = 60, within the range of
normalization of Min value = 2000 and Max value = 4000 in Figure 4. This gives a
cumulative distribution function $\Phi_D(y)$ for the Gaussian distribution of ~ 0.285 when
the upper limit of integration y = 8 [calculated using a Cumulative Distribution
Calculator for Gaussian distribution]
Referring to the cost optimization obtained from the Inventory Theory model:
$$\Phi_D(y) = \frac{C_{od} - C_{ri}}{C_{od}}$$
$$\Phi_D(y) = \frac{0.07 - 0.05}{0.07} \approx 0.285 \quad \text{is true when}$$
y ~ 8 Instances ………. (B)
The total number of reserved instances obtained by applying the Inventory theory model for
optimization = (A) + (B) = 16.
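The Gaussian step can be reproduced without an online calculator. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean of 42 and standard deviation of 60 stated above; it evaluates the cumulative distribution at y = 8 and also inverts it at the cost ratio. The variable names are illustrative assumptions.

```python
from statistics import NormalDist

# Hourly rates and distribution parameters as stated in this section.
c_ri, c_od = 0.05, 0.07
turbidity = NormalDist(mu=42.0, sigma=60.0)

# Right-hand side of the optimality condition, (C_od - C_ri) / C_od.
ratio = (c_od - c_ri) / c_od

# Forward check: cumulative probability at y = 8 (close to 0.285).
cdf_at_8 = turbidity.cdf(8)

# Inverse check: the y at which the cumulative probability equals the ratio.
y_gaussian = turbidity.inv_cdf(ratio)

# Uniform result (A) from Section 9.1 on [7, 10], then the combined total.
y_uniform = 7 + ratio * (10 - 7)
total_reserved = round(y_uniform) + round(y_gaussian)

print(f"ratio={ratio:.4f}, cdf(8)={cdf_at_8:.4f}, y={y_gaussian:.2f}, total={total_reserved}")
```

Inverting the cumulative distribution directly avoids guessing the upper limit of integration: the optimal y is read off as the quantile at (Cod - Cri)/Cod.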
10. EXPERIMENT INFERENCE
• In this work I have estimated the total reserved compute capacity expected to process the
IoT data emanating from 2 IoT sensors at the site location, with a stochastic &
unpredictable nature of data propagation, to be in the range of 14-20 instances depending
on the number of availability zones under consideration.
• I performed the experiment on the Beach water data and interpreted the probability
distributions of the turbidity and water temperature. As observed, the water temperature
results could be interpreted as a uniform probability distribution & the turbidity as a
Gaussian distribution, over thousands of instances captured for every event. The Inventory
theory model for optimization of reserved Cloud compute resources was then applied, and the
number of instances optimized by the model was found to be (8+8) = 16 instances, which
falls within the range of instances predicted & assumed during the solution architecture
approach.
11. LIMITATIONS & AREAS OF IMPROVEMENT
During the background study I analysed Kenneth J. Arrow et al [7], where optimal inventory
policies are discussed with constant & variable rates of change. I used this understanding to
model the solution for variable data inflow as an idealized rather than an exact representation
of the real problem. Hence my results do not guarantee the best possible solution; rather, they
offer an approach that uses real-time data to achieve an optimized prediction of reserved
compute instances, which would lead to cost optimization with an upfront plan of reserved
instances.
Abayomi et al [8] discussed, in their research paper on the Dynamics of Inventory Cost
Optimization, the application of the Inventory Theory model to both deterministic and stochastic
demands & thereby arrived at the optimised stock procurement approach in the context of Supply
chain. I have, however, not performed any experiment applying the Inventory Model to determine
the number of reserved compute instances in a scenario where the demand is constant and
deterministic in nature. Rather, the concept of Inventory Model cost optimization has been
applied to stochastic data emanating from IoT devices & captured in an AWS® cloud architecture,
to predict a baseline for the number of reserved compute instances, which is further optimized
using the distribution of the output of the data sources.
Hence, I acknowledge that applying the same solution approach to constant, deterministic data
sources is a subject of further research.
12. CONCLUSION
I expect my work to be a connecting link between the research already established on reserved
capacity cost optimization in Cloud Computing. With real data obtained from IoT devices, which
propagate data in an uncertain, stochastic manner, I have simulated a solution architecture to
capture such data and predict the range of compute capacity required for its flow, propagation
& storage, & for this purpose opted for AWS® Cloud platform services. In my endeavour to
further interpret the dataset through probability distribution curves, I have applied cost
optimization from the Inventory theory model to identify the optimized number of reserved
instances required for the experiment, which was observed to fall within the range of instances
predicted by the solution approach.
I trust this work will motivate future research in settings where data is stochastic & real
time, where a data-driven approach to cost estimation of compute power is necessary, and where
the budget needs to be predetermined with less dependence on the Cloud service provider to
supply the provisioning plans.
REFERENCES
[1] Sidney Browne and Paul Zipkin (1991) “Inventory Models With Continuous, Stochastic Demands”, https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/6353/browne_inventory_models.pdf
[2] Andrea Nodari (2015) “Cost Optimization in Cloud Computing”, https://aaltodoc.aalto.fi/bitstream/handle/123456789/17711/master_Nodari_Andrea_1970.pdf?sequence=1
[3] Aiping Wang and Hong Wang (2021) “Survey on stochastic distribution systems: A full probability density function control theory with potential applications”, https://www.ornl.gov/publication/survey-stochastic-distribution-systems-full-probability-density-function-control-theory
[4] D. Altman & J. Bland (1995) “Statistics notes: The normal distribution”, https://pubmed.ncbi.nlm.nih.gov/7866172/
[5] Peter Jez (2012) “Uniform Distribution with respect to Gaussian measure”, http://www.cosy.sbg.ac.at/research/tr/2012-02_Jez.pdf
[6] Luis A. San-José, Joaquín Sicilia, Manuel González-de-la-Rosa & Jaime Febles-Acosta (2021) “Profit maximization in an inventory system with time-varying demand, partial backordering and discrete inventory cycle”, https://link.springer.com/article/10.1007/s10479-021-04161-6
[7] Kenneth J. Arrow, Theodore Harris and Jacob Marschak (1958) “Optimal Inventory Policy”, https://www.or.mist.i.u-tokyo.ac.jp/takeda/FreshmanCourse/ArrowHarrisMarschak.pdf
[8] T Abayomi, Onanuga, Adekunle Adeyemi (2014) “Dynamics of Inventory Cost Optimization-A Review of Theory and Evidence”, https://www.iiste.org/Journals/index.php/RJFA/article/view/17593/0
AUTHOR
Sayan Guha completed his Bachelors in Electronics & Communication Engineering
in 2006. He has been serving the Information Technology industry, supporting Data
Modelling & Architecture across the business domains of Retail, Banking, Insurance
& Telecom. He is currently working with Cognizant Technology Solutions and his areas
of interest are in the fields of Cloud Solution architecture, Big Data Integration, API
based data integration & Advanced analytics.

A Novel Optimization of Cloud Instances with Inventory Theory Applied on Real Time IoT Data of Stochastic Nature

  • 1. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 DOI: 10.5121/ijccsa.2021.11603 19 A NOVEL OPTIMIZATION OF CLOUD INSTANCES WITH INVENTORY THEORY APPLIED ON REAL TIME IOT DATA OF STOCHASTIC NATURE Mr. Sayan Guha Data Architect, AI and Analytics Practice, Cognizant Technology Solutions, India ABSTRACT A Horizontal scaling is a Cloud architectural strategy by which the number of nodes or computers increased to meet the demand of continuously increasing workload. The cost of compute instances increases with increased workload & the research is aimed to bring an optimization of the reserved Cloud instances using principles of Inventory theory applied to IoT datasets with variable stochastic nature. With a structured solution architecture laid down for the business problem to understand the checkpoints of compute instances – the range of approximate reserved compute instances have been optimized & pinpointed by analysing the probability distribution curves of the IoT datasets. The Inventory theory applied to the distribution curves of the data provides the optimized number of compute instances required taking the range prescribed from the solution architecture. The solution would help Cloud solution architects & Project sponsors in planning the compute power required in AWS® Cloud platform in any business situation where ingestion & processing data of stochastic nature is a business need. KEYWORDS AWS, Amazon Web Services, Inventory Theory, Reserved Instance, On-Demand Instance, Uniform Distribution, Gaussian Distribution, Poisson Distribution. 1. INTRODUCTION The background of doing the research lies in the works of Sidney Brown et al [1] in his paper concerned with (r, q) inventory model interpreted where demand accumulates continuously the demand rate at any instant is determined by an underlying stochastic process. 
Also, Andrea Nodari [2] in his master’s degree thesis has aimed to answer the few research questions, one of them being the modelling the cost optimization in Cloud Computing with Inventory theory. I propose to apply this cost optimization theory into real-time datasets & for the same have chosen data from IoT devices which are stochastic & varying in nature. My objective in the paper has been to understand the background of the solution architecture required to ingest, process & store IoT datasets into AWS® platform & then optimize the same by interpreting the distribution curves of IoT data captured on Cloud. The technical challenge to the business problem lies in interpreting IoT data probability distribution curves on which Inventory theory has been applied & interpreted of the distribution to optimize the compute instance (e.g., uniform distribution, Gaussian distribution curves being interpreted in my work with the data captured from IoT devices)
  • 2. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 20 I have considered Amazon Web Services as the chosen Cloud provider and to establish the concepts of Inventory theory applicability in optimizing the number of Amazon Elastic Compute Cloud (EC2) computing resources. In this paper, my aim was to perform a validation exercise using a dataset emanated from an IoT device(s) and capture the many iterations of the data in a database to validate the stochastic nature of the source of data & understanding the distribution of the data & I believe this research will provide a foundation on which a dynamically time continuous stochastic data ecosystem can be assessed and provided with the optimized plan for instances. The sections of the paper are as follows. In Section 2, I have provided background research work already established in this field. In Section 3, provides a brief background of different probability distributions, knowing which would help to understand my use case in the experiment section. In Section 4, my aim is to provide the high-level solution approach to the business problem. In Section 5, I have provided, the solution design using AWS® services which helps me approximately compute the expected range of Reserved Instances required for the business problem in Section 6. In Section 7, the experiment scenario is discussed with the approach denoting decision points of data results. Section 8, I have discussed the experiments results applied the Inventory theory on data visualization of the dataset for experiment after concluding the response of the data in terms of probability distribution of the response obtained from data. Section 9 the optimization applied to the results obtained for the 2 different IoT sources used for the experiment. Section 10 talks about my inference from the experiment and in Section 11, I have discussed the areas of improvement of my work. 
The summarization of my work in Section 12. 2. RELATED WORK With reference to the application of Inventory Theory applied in Cloud Economics by eminent research work in the field contributed by Andrea Nodari [2] on his thesis paper derived the relationship between Reserved Instances and On-Demand instances to optimize the usage in Cloud platform of Amazon Web Services. This paper has considered Amazon® Web Services as the preferred cloud platform for the work. To set a background of this work from where the application of Inventory Theory can be applied to our experiment, I have added the basic difference of the Reserved & On-Demand instances available with Amazon® in the cloud platform of Amazon® Web Services.
  • 3. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 21 Table 1: Difference between Reserved & On Demand EC2 Instances Source: Derived from Website, https://aws.amazon.com/ Demand perspective EC2 instance type Requirement Feasibility Demand is predetermined or agile RESERVED UPFRONT Predetermined HOURLY Predetermined AD-HOC X ON- DEMAND UPFRONT X HOURLY Ad-hoc AD-HOC Ad-hoc Cost Perspective EC2 instance type Requirement Feasibility Cost & budget RESERVED UPFRONT Cheaper HOURLY Cheaper when predetermined AD-HOC X ON- DEMAND UPFRONT X HOURLY Cheaper when Ad-hoc AD-HOC Cheaper when Ad-hoc In the Inventory theory, for the sake of having an understanding between the reserved instances and on demand instances it has been considered. y = Number of purchased reserved instances D = Random variable representing the hourly demand of instances di = ith observation of the demand D Cri (Count of Reserved Instances) <Cod (Count of On-demand Instances) Total Cost = ⅀(Cri y + Cod max {0, di-y}) = ⅀Cod *di (when the company does not purchase any reserved instance) = ⅀Cri* y (conversely when the purchase of reserve instance is more) i=1 N i=1 N i=1 N
  • 4. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 22 Explanation I assumed cost express in terms of random variable D. To minimize the total cost, the goal of the next step is to find the optimal value for y, the number of reserved instances to purchase. Hence the cost with demand D and y C (D, y) = Cri y + Cod max {0, D − y} C (D, y) is a random variable it is possible to calculate the cost as C(y) = E [C (D, y)] = ⅀(Cri y + Cod max {0, d − y}) PD(d)….……(i) = Cri y ⅀ Cod (d-y) PD(d)…………………………(ii) I have approximated the discrete random variable D with a continuous random variable such that ΨD (ξ)=Probability density function of D ΦD (a) = Cumulative distribution function D and the total cost is expressed as =∫ 𝜓𝐷(𝜉) ⅆ𝜉 𝑎 0 ……………………………………………..….…. (iii) The total cost from expression…(i) now expressed as C(y)= E [C (D, y)] =∫ 𝑪 (𝝃, 𝒚) 𝜳𝑫(𝝃) 𝒅𝝃 ∞ 𝟎 =∫ Cri y + Cod max{0,d − y}) 𝜳𝑫(𝝃) 𝒅𝝃 ∞ 𝟎 =Cri y ∫ Cod (𝝃 − 𝐲) 𝜳𝑫(𝝃) 𝒅𝝃 ∞ 𝟎 ………………….... (iv) To minimize the above equation quiet expectedly I have taken a derivative of expression…(iv) and set it to zero which has been further derived to dC (y) / dy =Cri − Cod +Cod ∫ 𝜳𝑫(𝝃) 𝒅𝝃 𝒚 𝟎 = 0 = Cri –Cod (1-∫ 𝜳𝑫(𝝃) 𝒅𝝃 𝒚 𝟎 ) =0 = Cri –Cod [1 − ΦD(y)] = 0……. (v) [as applied from above ΦΦ(D) ΦD(y) = (Cod – Cri) / Cod…………………………………..…(iv) In subsequent sections of this paper, I have applied the above derivation and relationship between cost of on demand instances and Cost of reserve instances & used the same into IoT dataset used for the experiment d=0 ∞ d=y ∞
  • 5. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 23 3. PROBABILITY DISTRIBUTIONS: A BACKGROUND The theoretical background of 3 different probability distributions in the below table. Since the nature of the IoT data considered for the experiment henceforth is not pre-deterministic but stochastic in nature, therefore a background of understanding different probability distributions will help to interpret the dataset under consideration when the data is visualized in subsequent sections. Table 2: Definition & example(s) of different probability distributions e.g., Uniform, Gaussian & Poisson distribution Type of distribution Definition Mathematical derivation Visualization Uniform A continuous probability distribution with likelihood of occurrence of concerned events is equal F(x) = 0 when x < a = 1/ (b-a) when a ≤ x≤ b = 0 when x>b F(x) defined between 2 points a & b along the abscissa Gaussian A continuous probability distribution for a real valued random formula 𝑭(𝒙) = 𝟏 𝒙𝟐√𝟐𝛑 ⅇ − 𝟏 𝟐 ( 𝒂−𝝁 𝒙𝟐 ) x= standard deviation µ = mean or expectation of the distribution 𝒙𝟐 = variance of distribution Poisson A continuous probability function which determines the likelihood of number of events happening in a tenure of time 𝑭(𝒙) = ( 𝛌𝒙 𝒙! ) ⅇ−𝝀
  • 6. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 24 4. SOLUTION APPROACH I propose below solution approach to the problem statement of optimization of Cloud instances considering an IoT data of stochastic distribution nature. In the subsequent section the experiment would be discussed to arrive to the results for the dataset considered for experiment. Figure 1. Solution approach flowchart with numbers denoting steps in the experiment (streams on step -3 are executed in parallel) 5. SOLUTION DESIGN The dataset for the experiment to follow has been taken from open-source data repository available with 5.1. Source of Data Dataset contains Beach Water Quality – Automated Sensors. (https://data.world/cityofchicago/beach-water-quality-automated-sensors) 5.2. Architecture Requirement  Capture the IoT data from the 3 IoT sensor starting from the edge location.  Store the ‘hot’ data & make it available for analysts /operational users.  Feed the data through stream analytics system and make it available for any web application to access the data.  Ensure high availability& security using Virtual private Cloud environment. 5.3. Solution Architecture Diagram I have visualized a solution architecture as below to capture the data from the above dataset & catering to requirement & in subsequent Table 3 have referred the points in the architecture where compute resources would be required.
  • 7. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 25 Preferred Cloud Platform – Amazon ® Web Services Figure 2: Preferred Solution architecture on Amazon® Web Service platform (architectural logos reference www.amazon.com) 6. RANGE COMPUTATION OF INSTANCES The purpose of the solution architecture in this context was to determine the range of number of reserved instances that would be required to meet the business use case and which will be validated while applying the inventory theory in subsequent section of experiment based on the approximated number of reserved compute instances from Table-3 below. 6.1. Considerations of Computation 1. 2 separate Compute instances will be provisioned either by AWS managed service or by user of 2 separate IoT device to capture Beach Water Temperature & Turbidity at every checkpoint of compute. 2. 2-3 Availability zone is considered for disaster recovery or high availability which would replicate the compute instance across different geographical zones & provide a range of compute instances 3. The compute services used in the architecture are serverless, hence need not be provisioned or maintained separately, AWS underlying architecture would provision & maintain the EC2 (Elastic Compute cloud instances- however, instances will be provisioned for computation & that would be the basis for validating the experiment in subsequent sections
  • 8. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 26 Below results are obtained from looking into the solution from a bird’s eye view of the data architecture. Table3: Range Computation of Provisioned Compute Instances based on the preferred Solution Architecture Architectural Viewpoint of Range Computation of Instances Source of Data Compute Checkpoint No Compute Checkpoint Service Name Functionality No of Instances (A) No- Multizone Availability (B) Approximate number of Compute resources (A *B) IoT C1 Compute instances managed in AWS Greengrass (Managed service) Capture & stream IoT data for device 1 & 2 2 NA 2 IoT C2 Compute instances provisioned by AWS directing stream data Direct Stream IoT Data to S3 object storage 2 2-3 4-6 IoT C3 AWS Lambda (Managed service) Provisioned Compute – Function/ Code to ingest stream data 2 2-3 4-6 IoT C4 Instances provisioned for RDS service Stores data for further analysis & querying purpose 2 2-3 4-6 Approximate Range of Compute Instances 14-20 Explanation of above Range approximation  Checkpoint C1, C2, C3 & C4 corresponds to my solution architecture in Figure 4. Corresponding to the compute instances managed by AWS Greengrass, compute capacity used to direct the data to AWS S3, provisioned compute- Function/ Code to ingest stream data in AWS Lambda managed service & processing the final data set to RDS respectively. 
 2 separate IoT device to capture Beach Water Temperature & Turbidity at every checkpoint of compute & hence compute instance (whether managed by AWS or manually provisioned) have been assumed to be 2 for all checkpoints & hence with 4 compute checkpoints -I approximate the probable usage of 8 instances which need to be reserved upfront for one availability zone  At the Edge Computing zone, AWS Greengrass doesn’t need the data to be available across availability zones but for all other compute points data need to high available which I propose to be 2-3 multizone availability based on requirement of consumption.  Therefore, I have done a range approximation of Compute instances required in the solution as per the Table 3 considering only multizone availability of 6 compute instances (C2, C3, & C4) with 2-3 multizone availability which makes it 12-18 compute
  • 9. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 27 instances+ 2 instances considered at edge with AWS Greengrass makes the range as 14- 20 compute instances to be reserved upfront from solution architect point of view. When I moved to the subsequent section of the paper to interpret the IoT data from a probability distribution perspective, my goal was to optimize the range as already computed above & specify the actual number of instances required in this process of interpretation of data using Inventory theory applied to response probability distribution of the datasets. The value adds, I proposed will support the contextual solution architecture approximation of compute resources assume a range of compute instances at the very high level of solution based on assumptions to streamline & pinpoint the number of instances by interpretation of actual data response applying Inventory theory applied to their probability distributions. 7. EXPERIMENT SCENARIO: PROBABILITY DISTRIBUTION OF DATA SOURCE Aiping Wang et al [3] in their publication of Survey on stochastic distribution systems found the the control task is to obtain control signals so that the output Probability Distribution functions of stochastic systems are made to follow their target Probability Distribution functions The research in this field already established by predecessors me to formulate the probability distribution function for the IoT data under consideration as the initial step. I have observed with solution architecture defined for the ingestion of IoT data from 2 devices specific to the use case could provide a high-level estimation of compute resources required for the requirement under consideration. In this context, would like to clarify that the 2 IoT devices would capture 2 specific attributes of the data generated at the site namely 1) Water temperature & 2) Turbidity. 
In this process of performing the experiment I have analysed the work of D. Altman et al [4] which took samples of various probability data distributions samples & interpreted it through various methods of analysis make assumptions about normality, including correlation, regression, tests, and analysis of variance. It has also been concluded is not in fact necessary for the distribution of the observed data to be normal, but rather the sample values should be compatible having a normal distribution in defined range of values. Peter Jez [5] in his technical paper compared multiple data samples and carried on a series of mathematical test to conclude, mathematically Normal Distribution would be range bounded at its boundaries but normal distribution in Gaussian density distribution will have a mean or expectation of the value exponentially raised and divisible by the variance of distribution. Based on all the above studies in the field of probability distribution -the below approach has been devised to arrive to the interpretation of the results as per the flow chart.
  • 10. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 28 Figure 3: Experiment Approach flowchart with numbers denoting decision points of interpretation of data results in the experiment 8. EXPERIMENT RESULTS: DATA VISUALIZATION The experiment results visualizing the data output captured for Water Temperature & Turbidity of Beach water in Ohio shows  IoT Sensor1 Output (Water Temperature): As observed, the temperature is range- bound 12 degrees Fahrenheit to 27 degrees Fahrenheit with a mean (µ)= 42 around within the range of 19-20 degrees. Instances were captured real time for7500 timestamp instances from IoT sensor 1, ingested in the RDS database instances and presented as below Figure -4.  IoT Sensor1 Output (Turbidity): As observed, the turbidity readings were slowly taking peak as observed around the value of 42 Formazin Turbidity Unit (FTU) & gradually declining with being normalized – the distribution fairly forms a bell-curve along the value of mean (µ)= 42 with standard deviations (-x) ~ 25 FTU & standard deviation (+x) ~25 FTU equally spaced from the mean. Trend lines have been drawn trend lines along both the observations.
  • 11. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 29 Figure 4: Data results in the experiment-Depiction of Water Temperature & Turbidity captured across timestamp ranges (Left – Water temperature Vs Timestamp, Right- Turbidity Vs Timestamp) Inference  The probability distribution of IoT device1 is inferred as Normal Distribution  The probability distribution of IoT device1 is inferred as Gaussian Distribution I have taken an assumption to give equal weightage to the IoT data source considering the requirement of Cloud Compute resource while applying Inventory Theory for optimization of forecasted range of computer resources. The end goal is optimizing the total number of compute resources possibly could be required in this process of capturing IoT real time data of stochastic nature with the help of Inventory theory discussed in Section-2 of this paper & the baseline range would be considered as calculated from Section-6, Figure -3 9. EXPERIMENT RESULT: OPTIMIZATION BY APPLYING INVENTORY THEORY The baseline range outlined would be considered as calculated from Section-3, Figure -3 considering approximately 2-3 availability zones & the solutions has been architected using the solution architecture in Figure-2. The services invoked in the solution are in many cases managed services of Amazon ® Web Services but each of the services on the background invokes instances for the best suitability of the IoT data processing under consideration. The best of the solution architecture design could be done with an approximation with the solution architecture under consideration which have considered standard well architected framework of Amazon ® Web Services particularly suited for the requirement Luis A. 
San-José et al [6] have used the Inventory theory to maximize profit in an inventory system of Time varying demand, I have applied the same concept of applying Inventory theory into time varying data to maximize resource utilization. I have investigated the problem statement from the data perspective by which applied statistical analysis & pure probability distribution by interpreting the nature of the data & output produced from the IoT sensors as depicted and visualized in Figure-4.
  • 12. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 30 9.1. Optimization of Instances for IoT Sensor 1 Assuming equal weightage on distribution of compute instances baselined in Section-3 I have considered 50% of the instances (7-10 instances) to be consumed to process the IoT Sensor 1 data which measures Water Temperature Vs Timestamp. The output has been inferred to be producing a Normal Probability Distribution Hence, the probability distribution function & the cumulative distribution function in the interval of [m, n] can be defined as Density function: Ψ(ξ) = 0 if ξ < m Ψ(ξ) = 1/ (n−m) if m ≤ ξ ≤ n Ψ(ξ) = 0 if ξ >n Cumulative Distribution: ΦD(y) = 0 if y < m ΦD(y) = (y−m)/ (n−m) if a ≤ y ≤ b ΦD(y) = 1 if y >n  Assuming the optimal number of reserved instances will be in the range of 7 ≤ y ≤ 10  Assuming the consideration of partial upfront plan of AWS® with hourly rate for on- demand & reserve instances of m3. medium instances in us region (us-east1-a or us- west1-b) – Cost /hr (On demand) =$ 0.07 & Cost/hr (Reserved)=$ 0.05 ΦD(y) = (Cod – Cri) / Cod (y-7)/ (10-7) = (0.07-0.05)/ 0.07 = 7.85 Y ~ 8 Instances………. (A) 8.2. Optimization of Instances for IoT Sensor 2 Assuming equal weightage on distribution of compute instances baselined in Section-3 I have considered 50% of the instances (7-10 instances) to be consumed to process the IoT Sensor 1 data which measures Turbidity Vs Timestamp. The output has been inferred to be producing a Gaussian Probability Distribution Following similar approach like above from Jamie Zappone[6] outlined that the probability distribution function & the cumulative distribution function in the interval of [m, n] can be defined as Density function: Ψ(ξ)= 𝟏 𝝈√𝟐𝜫 ⅇ−( 𝒙−𝝁′)𝟐 𝟐𝝈𝟐 Cumulative Distribution:
  • 13. International Journal on Cloud Computing: Services and Architecture (IJCCSA) Vol. 11, No. 1/2/3/4/5/6, December 2021 31 ΦD(y) = 𝟏 √𝟐𝜫 𝝈 ∫ ⅇ −(𝒕−𝝁)𝟐 𝟐𝝈𝟐 𝒅𝒕 𝒚 ∞  Assuming the optimal number of reserved instances will be in the range of 7≤ y ≤ 10  Assuming the consideration of partial upfront plan of AWS® with hourly rate for on- demand & reserve instances of m3. medium instances in US region (us-east1-a or us- west1-b) – Cost /hr (On demand) =$ 0.07 & Cost/hr (Reserved) =$ 0.05  The Mean (µ) is observed to be 42, standard deviation ( 𝜎) =60 within the range of normalization of Min value =2000 and Max value =4000 in Figure 4 which calculates the Cumulative Distribution function (ΦD(y)) for Gaussian distribution ~ 0.285 when the upper limit of integration (y) =8 [ Calculated using Cumulative Dstribution Calculator for Gaussian distribution] Referring to the Cost Optimization obtained from Inventory Theory model as below ΦD(y) = (Cod – Cri) / Cod ΦD(y) = (0.07-0.05) / 0.07 = 0.285 is true when y ~ 8 Instances………. (B) The total number of reserved instances applying Inventory theory model for optimization =(A) +(B) =16. 10. EXPERIMENT INFERENCE  In this work I have predicted an estimated number of total reserved compute capacity expected to process the IoT data from the site location emanating from 2 IoT sensors with a stochastic & unpredictable nature of data propagation to be in the range of 14-20 instances depending on the number multi-availability zone under consideration.  I have intended to perform the experiment on the Beach water data and interpreted the probability distribution of turbidity and water temperature of the beach water data. As observed the water temperature results could be interpreted as uniform probability distribution & the turbidity as Gaussian distribution for thousands of instances of captured for every event. 
The Inventory theory model for optimization of reserved Cloud Compute resources was hence applied and found that number of instances optimized by Inventory model application is streamlined to (8+8) =16 instances which falls correctly into the range of instances predicted & assumed during solution architectural approach. 11. LIMITATIONS & AREAS OF IMPROVEMENT While in the process of background study I analysed Kenneth J. Arrow et al [7] where optimal inventory policies have been discussed with constant & variable rates of changes, I took my understanding to model the solution of variable data inflow in an idealized rather than an exact representation of the real problem. Hence my results do not guarantee the solution to be the best possible solution, rather an approach with real time data to achieve an optimized prediction of reserved compute instances & would lead to cost optimization with upfront plan of reserved instances.
Abayomi et al [8], in their research paper on the Dynamics of Inventory Cost Optimization, discussed the Inventory Theory model applied to both deterministic and stochastic demands & thereby arrived at an optimised stock procurement approach in the context of Supply chain. I have, however, not performed any experiment applying the Inventory Model to determine the number of reserved compute instances in a scenario where the data is constant and deterministic in nature. Rather, the concept of Inventory Model Cost optimization has been applied to stochastic data emanating from IoT & captured in AWS® cloud architecture to predict a baseline number of reserved compute instances, which was further optimized using the distribution output of the data sources. Hence, I acknowledge that applying the same solution approach to constant, deterministic data sources is a subject of further research.

12. CONCLUSION
I expect my work to be a connecting link between the research works established to optimize reserved capacity cost in Cloud Computing. With real data obtained from IoT devices, which have an uncertain, stochastic nature of propagation, I have simulated a solution architecture to capture such data and predict a range of compute capacity required for flow, propagation & storage, & opted for AWS® Cloud platform services for the same. In my endeavour to further interpret the dataset as probability distribution curves, I applied Cost optimization from the Inventory theory model to identify the optimized number of reserved instances required for the experiment, which was observed to fall in the range of instances predicted by the solution approach.
I trust this work will motivate future research where data is stochastic & real time, where a data-driven approach to cost estimation of compute power is necessary, and where the budget for the same needs to be predetermined with less dependence on the Cloud service provider to supply the provisioning plans.

REFERENCES
[1] Sidney Browne and Paul Zipkin (1991) “Inventory Models With Continuous, Stochastic Demands”, https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/6353/browne_inventory_models.pdf
[2] Andrea Nodari (2015) “Cost Optimization in Cloud Computing”, https://aaltodoc.aalto.fi/bitstream/handle/123456789/17711/master_Nodari_Andrea_1970.pdf?sequence=1
[3] Aiping Wang and Hong Wang (2021) “Survey on stochastic distribution systems: A full probability density function control theory with potential applications”, https://www.ornl.gov/publication/survey-stochastic-distribution-systems-full-probability-density-function-control-theory
[4] D. Altman & J. Bland (1995) “Statistics notes: The normal distribution”, https://pubmed.ncbi.nlm.nih.gov/7866172/
[5] Peter Jez (2012) “Uniform Distribution with respect to Gaussian measure”, http://www.cosy.sbg.ac.at/research/tr/2012-02_Jez.pdf
[6] Luis A. San-José, Joaquín Sicilia, Manuel González-de-la-Rosa & Jaime Febles-Acosta (2021) “Profit maximization in an inventory system with time-varying demand, partial backordering and discrete inventory cycle”, https://link.springer.com/article/10.1007/s10479-021-04161-6
[7] Kenneth J. Arrow, Theodore Harris and Jacob Marschak (1958) “Optimal Inventory Policy”, https://www.or.mist.i.u-tokyo.ac.jp/takeda/FreshmanCourse/ArrowHarrisMarschak.pdf
[8] T Abayomi, Onanuga, Adekunle Adeyemi (2014) “Dynamics of Inventory Cost Optimization-A Review of Theory and Evidence”, https://www.iiste.org/Journals/index.php/RJFA/article/view/17593/0
AUTHOR
Sayan Guha completed his Bachelor's degree in Electronics & Communication Engineering in 2006. He has been serving the Information Technology industry, supporting Data Modelling & Architecture across the business domains of Retail, Banking, Insurance & Telecom. He currently works with Cognizant Technology Solutions, and his areas of interest are Cloud Solution architecture, Big Data Integration, API-based data integration & Advanced analytics.