Data Analysis
• The data, after collection, has to be processed and
analysed in accordance with the outline laid down for the
purpose at the time of developing the research plan
• This is essential for a scientific study and for ensuring that
we have all relevant data for making contemplated
comparisons and analysis
• Processing implies editing, coding, classification and
tabulation of collected data so that they are amenable to
analysis
• The term analysis refers to the computation of certain
measures along with searching for patterns of relationship
that exist among data-groups
Processing Operations
• Editing
• Coding
• Classification
• Tabulation
Editing
• Editing is the process of examining the collected raw data to detect
errors and omissions and to correct these when possible
• It involves a careful scrutiny of the completed
questionnaires and/or schedules
• It ensures that the data are accurate, consistent with other
facts gathered, uniformly entered, as complete as
possible and well arranged to facilitate coding
and tabulation
• With regard to the points or stages at which editing should be
done, one can distinguish between field editing and central editing
• Field editing consists of the review of the reporting forms
by the investigator to complete (translate or rewrite)
what the investigator has written in abbreviated and/or illegible
form at the time of recording the respondents’ responses.
This type of editing is necessary because
individual writing styles can often be difficult for others to
decipher
• Central editing should take place when all forms or
schedules have been completed and returned to the office
• It calls for thorough editing by a single editor, or by a team of editors in
the case of a large inquiry
• Editor(s) may correct obvious errors
• In case of inappropriate or missing replies, the editor can
sometimes determine the proper answer by reviewing the
other information in the schedule; where necessary, the
respondent can be contacted for clarification
Editors must keep in view several points while performing
their work:
• They should be familiar with instructions given to the
interviewers and coders as well as with the editing
instructions supplied
• While crossing out an original entry for one reason or
another, they should just draw a single line on it so that the
same may remain legible
• They must make entries (if any) on the form in some
distinctive colour and in a standardised form
• They should initial all answers which they change or
supply
• Editor’s initials and the date of editing should be placed on
each completed form or schedule
Coding
• It refers to the process of assigning numerals or other
symbols to answers so that responses can be put into a
limited number of categories or classes appropriate to the
research problem
• The coding categories must also be exhaustive (there must be a
class for every data item) and mutually exclusive, which
means that a specific answer can be placed in one and only one class
• Another rule to be observed is that of unidimensionality by
which is meant that every class is defined in terms of only
one concept
• Through it the several replies may be reduced to a small
number of classes which contain the critical information
required for analysis
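The coding rules above can be illustrated with a minimal Python sketch. The code book, the question and the function name `code_responses` are hypothetical; the point is that a dictionary gives each answer exactly one numeral (mutual exclusivity), and an answer with no class is reported rather than silently dropped (a check on exhaustiveness).

```python
# Hypothetical code book for the question "How often do you exercise?".
# Every class is defined by one concept (frequency), so the scheme is
# unidimensional.
CODE_BOOK = {"never": 1, "rarely": 2, "weekly": 3, "daily": 4}


def code_responses(responses, code_book):
    """Assign a numeral to each response using the code book.

    A dict maps each answer to exactly one code, so the classes are mutually
    exclusive; an answer with no class raises ValueError, which flags a
    code book that is not exhaustive.
    """
    coded = []
    for answer in responses:
        key = answer.strip().lower()  # normalise case and stray spaces
        if key not in code_book:
            raise ValueError(f"no class for {answer!r}: code book is not exhaustive")
        coded.append(code_book[key])
    return coded


print(code_responses(["Daily", "never ", "weekly", "daily"], CODE_BOOK))  # [4, 1, 3, 4]
```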
Classification
• Most research studies result in a large volume of raw data
which must be reduced into homogeneous groups if we are
to get meaningful relationships
• This fact necessitates classification of data which happens
to be the process of arranging data in groups or classes on
the basis of common characteristics
• Data having a common characteristic are placed in one
class and in this way the entire data get divided into a
number of groups or classes
• Classification can be one of the following two types,
depending upon the nature of the phenomenon involved:
  ◦ According to attributes
  ◦ According to class intervals
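Classification according to class intervals can be sketched in Python as follows. The marks, the interval width and the function name are hypothetical. The sketch assumes half-open intervals (a label such as 0-20 means 0 up to but not including 20), so every value falls in exactly one class, and values outside the overall range raise an error instead of disappearing.

```python
from collections import Counter


def classify_by_intervals(values, lower, width, n_classes):
    """Count values in equal-width, half-open class intervals [start, start + width)."""
    upper = lower + width * n_classes
    counts = Counter()
    for v in values:
        if not lower <= v < upper:
            raise ValueError(f"{v} lies outside [{lower}, {upper})")
        counts[int((v - lower) // width)] += 1  # index of the class containing v
    return {
        f"{lower + k * width}-{lower + (k + 1) * width}": counts[k]
        for k in range(n_classes)
    }


marks = [12, 35, 47, 51, 58, 63, 66, 71, 88, 94]  # hypothetical exam marks
print(classify_by_intervals(marks, lower=0, width=20, n_classes=5))
```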
Tabulation
• When a mass of data has been assembled, it becomes
necessary for the researcher to arrange the same in some
kind of concise and logical order
• This procedure is referred to as tabulation and thus,
tabulation is the process of summarizing raw data and
displaying the same in compact form (i.e., in the form of
statistical tables) for further analysis
• In a broader sense, tabulation is an orderly arrangement of
data in columns and rows
• Tabulation is essential for the following reasons:
  ◦ It conserves space and reduces explanatory and
  descriptive statements to a minimum
  ◦ It facilitates the process of comparison
  ◦ It facilitates the summation of items and the detection of
  errors and omissions
  ◦ It provides a basis for various statistical computations
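As an illustration of summarizing raw data in compact form, the following Python sketch turns hypothetical coded survey records into a two-way frequency table (a cross-tabulation). The records and the function name `cross_tabulate` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical coded survey records: (gender, response).
records = [
    ("F", "yes"), ("M", "no"), ("F", "yes"), ("M", "yes"),
    ("F", "no"), ("M", "no"), ("F", "yes"),
]


def cross_tabulate(pairs):
    """Summarise (row, column) pairs into a two-way frequency table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, {(r, c): counts[(r, c)] for r in rows for c in cols}


rows, cols, table = cross_tabulate(records)
print("      " + "".join(f"{c:>5}" for c in cols))
for r in rows:
    print(f"{r:<6}" + "".join(f"{table[(r, c)]:>5}" for c in cols))
```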
Generally Accepted Principles of Tabulation
• Every table should have a clear, concise and adequate title
so as to make the table intelligible without reference to the
text and this title should always be placed just above the
body of the table
• Every table should be given a distinct number to facilitate
easy reference
• The column headings (captions) and the row headings
(stubs) of the table should be clear and brief
• The units of measurement under each heading or sub-
heading must always be indicated
• Explanatory footnotes, if any, concerning the table should
be placed directly beneath the table, along with the
reference symbols used in the table
• Source or sources from where the data in the table have
been obtained must be indicated just below the table
• Usually the columns are separated from one another by
lines which make the table more readable and attractive
• Lines are always drawn at the top and bottom of the table
and below the captions
• There should be thick lines to separate the data under one
class from the data under another class and the lines
separating the sub-divisions of the classes should be
comparatively thin lines
• The columns may be numbered to facilitate reference
• Those columns whose data are to be compared should be
kept side by side
• Similarly, percentages and/or averages must also be kept
close to the data
• It is generally considered better to approximate figures
before tabulation as the same would reduce unnecessary
details in the table itself
• In order to emphasise the relative significance of certain
categories, different kinds of type, spacing and
indentations may be used
• It is important that all column figures be properly aligned.
Decimal points and (+) or (–) signs should be in perfect
alignment
• Abbreviations should be avoided to the extent possible
and ditto marks should not be used in the table
• Miscellaneous and exceptional items, if any, should
usually be placed in the last row of the table
• The table should be made as logical, clear, accurate and simple
as possible. If the data happen to be very large, they
should not be crowded into a single table, for that would
make the table unwieldy and inconvenient
• Row totals should normally be placed in the extreme
right column, and column totals should be placed at the
bottom
• The arrangement of the categories in a table may be
chronological, geographical, alphabetical or according to
magnitude to facilitate comparison
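Several of these principles (a numbered title above the body, units stated in the title, row totals in the extreme right column, column totals at the bottom, aligned decimal points, categories arranged alphabetically) can be shown with a small Python sketch. The figures and the function name `print_table` are hypothetical.

```python
def print_table(title, row_names, col_names, data, width=9, decimals=2):
    """Print a titled table with row totals in the rightmost column and
    column totals in the bottom row. Printing every number with the same
    number of decimals, right-aligned in a fixed width, keeps decimal
    points in one vertical line."""
    print(title)  # the title goes just above the body of the table
    print(f"{'':<8}" + "".join(f"{c:>{width}}" for c in col_names) + f"{'Total':>{width}}")
    col_totals = [0.0] * len(col_names)
    for name in row_names:
        row = data[name]
        for j, v in enumerate(row):
            col_totals[j] += v
        cells = "".join(f"{v:>{width}.{decimals}f}" for v in row)
        print(f"{name:<8}{cells}{sum(row):>{width}.{decimals}f}")
    cells = "".join(f"{t:>{width}.{decimals}f}" for t in col_totals)
    print(f"{'Total':<8}{cells}{sum(col_totals):>{width}.{decimals}f}")
    return col_totals


# Hypothetical output figures; the unit is stated in the title.
output = {"East": [7.25, 8.5], "North": [12.5, 14.25], "South": [9.0, 11.75]}
totals = print_table(
    "Table 1: Output by region and quarter (tonnes)",
    ["East", "North", "South"],  # alphabetical arrangement of categories
    ["Q1", "Q2"],
    output,
)
```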
Elements/ Types of Analysis
• By analysis we mean the computation of certain indices or
measures along with searching for patterns of relationship
that exist among the data groups
• It involves estimating the values of unknown parameters
and testing of hypotheses for drawing inferences
• Analysis may, therefore, be categorized as descriptive
analysis and inferential analysis (Inferential analysis is
often known as statistical analysis)
• Descriptive analysis is largely the study of distributions of
one variable; such analysis may be in respect of
one variable (described as unidimensional analysis),
two variables (described as bivariate analysis)
or more than two variables (described as
multivariate analysis)
• We may as well talk of correlation analysis and causal
analysis
• Correlation analysis studies the joint variation of two or
more variables for determining the amount of correlation
between two or more variables
• Causal analysis is concerned with the study of how one or
more variables affect changes in another variable
• It is thus a study of functional relationships existing
between two or more variables
• This analysis can be termed as regression analysis
• Causal analysis is considered relatively more important in
experimental research
• In modern times, with the availability of computer facilities,
there has been a rapid development of multivariate
analysis, which may be defined as “all statistical methods
which simultaneously analyse more than two variables”
Multivariate analysis
• Multiple regression analysis: This analysis is adopted
when the researcher has one dependent variable which is
presumed to be a function of two or more independent
variables
• The objective of this analysis is to make a prediction about
the dependent variable based on its covariance with all the
concerned independent variables
• Multiple discriminant analysis: This analysis is appropriate
when the researcher has a single dependent variable that
cannot be measured, but can be classified into two or more
groups on the basis of some attribute
• The object of this analysis is to predict an entity’s
likelihood of belonging to a particular group based on
several predictor variables
• Multivariate analysis of variance (or multi-ANOVA):
An extension of two-way ANOVA, wherein the ratio of among-group
variance to within-group variance is worked out on a
set of variables
• Canonical analysis: This analysis can be used in case of
both measurable and non-measurable variables for the
purpose of simultaneously predicting a set of dependent
variables from their joint covariance with a set of
independent variables
• Inferential analysis is concerned with the various tests of
significance for testing hypotheses in order to determine
with what validity data can be said to indicate some
conclusion or conclusions
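Multiple regression analysis, as described above, fits one dependent variable as a function of two or more independent variables. The following is a minimal pure-Python sketch, assuming ordinary least squares solved through the normal equations; the data are hypothetical and generated exactly from y = 1 + 2·x1 + 3·x2, so the fit should recover those coefficients. For real work, a library routine based on QR or SVD (for example NumPy's `numpy.linalg.lstsq`) is numerically more robust than forming the normal equations.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def multiple_regression(X, y):
    """Coefficients [b0, b1, ..., bk] of y = b0 + b1*x1 + ... + bk*xk from the
    normal equations (D^T D) b = D^T y, where D is X with a leading column of ones."""
    D = [[1.0] + [float(v) for v in row] for row in X]
    k = len(D[0])
    DtD = [[sum(row[i] * row[j] for row in D) for j in range(k)] for i in range(k)]
    Dty = [sum(row[i] * yi for row, yi in zip(D, y)) for i in range(k)]
    return solve_linear_system(DtD, Dty)


X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 2]]        # hypothetical (x1, x2) pairs
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coef = multiple_regression(X, y)
prediction = coef[0] + coef[1] * 5 + coef[2] * 0     # predict y for x1 = 5, x2 = 0
print([round(c, 6) for c in coef], round(prediction, 6))
```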
Statistics in Research
• The role of statistics in research is to function as a tool in
designing research, analysing its data and drawing
conclusions therefrom
• Clearly the science of statistics cannot be ignored by any
research worker, even though he may not have occasion to
use statistical methods in all their details and ramifications
• The important statistical measures are:
  ◦ Measures of central tendency or statistical averages
  ◦ Measures of dispersion
  ◦ Measures of asymmetry (skewness)
  ◦ Measures of relationship
  ◦ Other measures
• Measures of central tendency (or statistical averages) tell
us the point about which items have a tendency to cluster
• Mean, median and mode are the most popular averages
Median
• Arrange your numbers in numerical order
• Count how many numbers you have
• If you have an odd number, divide by 2 and round up to get
the position of the median number
• If you have an even number, divide by 2. Go to the number
in that position and average it with the number in the next
higher position to get the median
Mode
To find the mode, or modal value, it is best to put the
numbers in order. Then count how many of each number. A
number that appears most often is the mode.
Find the mean, median and mode for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
Mean = 135/9 = 15
Median = 14 (the 5th of the 9 ordered values)
Mode = 13
1, 2, 4, 7
Mean = 14/4 = 3.5
Median = (2 + 4)/2 = 3
Mode: none (every value occurs exactly once)
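The median and mode procedures above can be checked against the two worked lists with a short Python sketch. The function names are illustrative; `modes` returns an empty list when no value repeats, matching the second example.

```python
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def median(values):
    """Sort, then take the middle value (odd count) or the average of the
    two middle values (even count), as in the steps above."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]  # position n/2 rounded up, as a 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2


def modes(values):
    """Every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


a = [13, 18, 13, 14, 13, 16, 14, 21, 13]
b = [1, 2, 4, 7]
print(mean(a), median(a), modes(a))  # 15.0 14 [13]
print(mean(b), median(b), modes(b))  # 3.5 3.0 []
```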
G.M. & H.M.
Measures of Dispersion
• An average can represent a series only as well as a single
figure can
• It fails to give any idea about the scatter of the values in
the series around the average
• In order to measure this scatter, statistical devices called
measures of dispersion are calculated. Important measures
of dispersion are:
  ◦ Range
  ◦ Mean deviation
  ◦ Standard deviation
https://geographyfieldwork.com/DataPresentationScatterGraphs.htm#
Range
• It is the simplest possible measure of dispersion and is
defined as the difference between the values of the
extreme items of a series
• Range = Highest value of an item in a series- Lowest value
of an item in a series
• It gives an idea of the variability very quickly, but the
drawback is that range is affected very greatly by
fluctuations of sampling
• Its value is never stable, being based on only two values of
the variable
• As such, range is mostly used as a rough measure of
variability and is not considered as an appropriate measure
in serious research studies
Mean deviation
• It is the average of the absolute differences of the values of items from
some average of the series (usually the mean or the median)
• In calculating mean deviation we ignore the minus signs of the
deviations while taking their total
Standard deviation
• It is the most widely used measure of dispersion of a series
and is commonly denoted by the symbol σ (sigma)
• Standard deviation is defined as the square-root of the
average of squares of deviations, when such deviations for
the values of individual items in a series are obtained from
the arithmetic average
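The three measures of dispersion can be computed with the following Python sketch, using the first list from the central-tendency example. Following the definitions above, the mean deviation is taken about the arithmetic mean by default, and the standard deviation divides by n (the number of items); the function names are illustrative.

```python
import math


def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


def mean_deviation(values, centre=None):
    """Average absolute deviation from `centre` (defaults to the arithmetic mean)."""
    if centre is None:
        centre = sum(values) / len(values)
    return sum(abs(v - centre) for v in values) / len(values)


def standard_deviation(values):
    """Square root of the average squared deviation from the arithmetic mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
# Expect 8, 20/9 (about 2.222) and 8/3 (about 2.667)
print(value_range(data), mean_deviation(data), standard_deviation(data))
```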
Data Analysis technique, data collection, data analysis
Measures of Asymmetry
When the distribution of items in a series happens to be
perfectly symmetrical, we have the following type of
curve for the distribution:
• Such a curve is called a normal curve, and the corresponding
distribution a normal distribution
• Such a curve is perfectly bell-shaped, in which case
the mean (X̄), median (M) and mode (Z) coincide and skewness is
altogether absent
• If the curve is distorted (whether on the right side or on the
left side), we have asymmetrical distribution which
indicates that there is skewness
• If the curve is distorted on the right side, we have positive
skewness but when the curve is distorted towards left, we
have negative skewness
Skewness is, thus, a measure of asymmetry and shows the
manner in which the items are clustered around the average
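One common way to put a number on skewness is Karl Pearson's median-based coefficient, 3(X̄ − M)/σ; this particular formula is a choice made for this sketch, not something given on the slide. It is positive when the curve is distorted to the right, negative when distorted to the left, and zero for a symmetrical series.

```python
import math


def pearson_median_skewness(values):
    """Pearson's median-based coefficient of skewness: 3 * (mean - median) / sigma."""
    n = len(values)
    s = sorted(values)
    mean = sum(s) / n
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    sigma = math.sqrt(sum((v - mean) ** 2 for v in s) / n)
    return 3 * (mean - median) / sigma


print(pearson_median_skewness([13, 18, 13, 14, 13, 16, 14, 21, 13]))  # positive: tail to the right
print(pearson_median_skewness([1, 2, 3, 4, 5]))                       # 0.0
```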
Measures of Relationship
• The statistical measures discussed so far relate to a
univariate population, i.e., the measurement of only one
variable
• If for every measurement of a variable, X, there is a
corresponding value of a second variable, Y, the resulting
pairs of values are called a bivariate population
• Similarly, there can be multivariate data
• There are several methods of determining the relationship
between variables, but no method can tell us for certain
that a correlation is indicative of causal relationship
Two types of questions arise in bivariate or multivariate
populations:
• Does there exist association or correlation between the two
(or more) variables? If yes, of what degree?
• Is there any cause-and-effect relationship between the two
variables? If yes, of what degree and in which direction?
The first question is answered by the technique of correlation
and the second by the technique of regression
There are several methods of applying the two techniques,
but the important ones are as under:
  ◦ In case of a bivariate population, correlation can be studied
  through
    • Cross tabulation
    • Charles Spearman’s coefficient of correlation
    • Karl Pearson’s coefficient of correlation;
  whereas cause-and-effect relationship can be studied through simple
  regression equations
  ◦ In case of a multivariate population, correlation can be
  studied through
    • Coefficient of multiple correlation
    • Coefficient of partial correlation;
  whereas cause-and-effect relationship can be studied through multiple
  regression equations
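Karl Pearson's and Charles Spearman's coefficients for a bivariate population can be sketched in Python as follows. The sketch computes Spearman's coefficient as Pearson's r applied to ranks (tied values share the average rank), which equals the familiar 1 − 6Σd²/(n(n² − 1)) formula when there are no ties. The x, y values are borrowed from the straight-line example later in these slides.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    """Charles Spearman's rank correlation, computed as Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))  # 0.932 0.893
```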
Simple Regression Analysis
• Regression is the determination of a statistical relationship
between two or more variables
• In simple regression, we have only two variables: one
variable (defined as independent) is the cause of the
behaviour of the other (defined as the dependent variable)
• Regression can only interpret what exists physically, i.e.,
there must be a physical way in which the independent
variable X can affect the dependent variable Y
The basic relationship between X and Y is given by
$\hat{Y} = a + bX$, where $\hat{Y}$ denotes the estimated value of Y for a given value of X
The generally used method to find the ‘best’ fit that a
straight line of this kind can give is the least-squares method
Least-Squares Method
Least-squares curve fitting method (figure: fitted straight line with coefficients a and b)
S. S. Shashtri, Introductory Methods of Numerical Analysis, PHI Learning, New Delhi, 2012
Note: in the slides that follow, read $a_0 = a$ and $a_1 = b$.
A sigmoid function is a mathematical function having a characteristic "S"-shaped curve,
or sigmoid curve. A common example of a sigmoid function is the logistic function shown in
the first figure and defined by the formula $S(x) = \frac{1}{1 + e^{-x}}$
Definition
• Curve fitting is the process of constructing a
curve, or mathematical function, that has the
best fit to a series of data points, possibly
subject to constraints.
• It is a statistical technique used to derive
coefficient values for equations that express
the value of one (dependent) variable as a
function of another (independent) variable
https://www2.slideshare.net/shopnohinami/curve-fitting-53775511?from_action=save
What is curve fitting
Curve fitting is the process of constructing a curve, or
mathematical function, that lies as close as possible to
the series of data points. Through curve fitting we can mathematically
construct the functional relationship between the observed
facts and parameter values. It is highly effective in the
mathematical modelling of some natural processes.
https://www2.slideshare.net/shopnohinami/curve-fitting-53775511?from_action=save
Interpolation & Curve fitting
• In many application areas, one is faced with the
task of describing data, often measured, with an
analytic function. There are two approaches to
this problem:
• 1. In interpolation, the data are assumed to be
correct, and what is desired is some way to
describe what happens between the data
points.
• 2. The other approach is called curve fitting
or regression: one looks for some smooth
curve that "best fits" the data but does not necessarily
pass through any of the data points.
Curve fitting
There are two general approaches for curve fitting:
• Least squares regression
The data exhibit a significant degree of scatter. The strategy is to
derive a single curve that represents the general trend of the
data
• Interpolation
The data are very precise. The strategy is to pass a curve or a
series of curves through each of the points
General approach for curve fitting
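The difference between the two approaches can be seen in a small Python sketch on hypothetical scattered data: the Lagrange interpolating polynomial (one standard interpolation technique, chosen here for illustration) reproduces every data point exactly, whereas the least-squares straight line follows the general trend and misses individual points.

```python
def lagrange_interpolate(xs, ys, x):
    """Value at x of the unique polynomial passing through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def least_squares_line(xs, ys):
    """Intercept a0 and slope a1 of the least-squares straight line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return sy / n - a1 * sx / n, a1


xs = [1, 2, 3, 4]
ys = [1.0, 3.0, 2.0, 5.0]  # hypothetical scattered data
a0, a1 = least_squares_line(xs, ys)
for x, y in zip(xs, ys):
    # data value, interpolated value (equal to y), regression value (generally not)
    print(x, y, lagrange_interpolate(xs, ys, x), a0 + a1 * x)
```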
Engineering Applications of Curve-Fitting Techniques
In engineering, two types of applications are encountered:
• Trend analysis. Predicting values of the dependent variable; this may
include extrapolation beyond the data points or interpolation
between data points
• Hypothesis testing. Comparing an existing mathematical model
with measured data
Data scatter (figure panels: Positive Correlation, No Correlation)
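Mathematical Background
• Variance. Representation of spread by the
square of the standard deviation:
$$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n - 1}$$
• Coefficient of variation. Quantifies the spread of the data
relative to the mean:
$$\text{c.v.} = \frac{S_y}{\bar{y}} \times 100\%$$
where the mean is $\bar{y} = \sum y_i / n$ and the standard deviation is $S_y = \sqrt{S_y^2}$.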
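The variance formulas and the coefficient of variation can be checked numerically with a short Python sketch; the y values are borrowed from the straight-line example later in these slides. Both variance expressions agree, although the computational (second) form can lose precision when the values are large relative to their spread.

```python
import math


def sample_stats(ys):
    """Mean, sample variance (by both formulas), standard deviation and c.v. in percent."""
    n = len(ys)
    mean = sum(ys) / n
    var_def = sum((y - mean) ** 2 for y in ys) / (n - 1)                # definition
    var_short = (sum(y * y for y in ys) - sum(ys) ** 2 / n) / (n - 1)  # computational form
    sd = math.sqrt(var_def)
    return mean, var_def, var_short, sd, sd / mean * 100


mean, v1, v2, sd, cv = sample_stats([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
print(round(mean, 6), round(v1, 6), round(v2, 6), round(sd, 4), round(cv, 2))
```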
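Least-Squares Method
Linear Regression: Criteria for a “Best” Fit
• Minimize the sum of the residuals:
$$\min \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$$
This criterion is inadequate: positive and negative residuals cancel (e.g. $e_1 = -e_2$), so a poor line can still give a zero sum.
• Minimize the sum of the absolute values of the residuals:
$$\min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$
• Minimax criterion: minimize the largest absolute residual:
$$\min \max_{1 \le i \le n} |e_i| = \min \max_{1 \le i \le n} |y_i - a_0 - a_1 x_i|$$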
Linear Curve Fitting (Straight Line)
Given a set of data points $(x_i, f(x_i))$, find a curve that best
captures the general trend,
• where $g(x)$ is the approximating function
(Figure: trying to fit a straight line through the data)
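Linear Regression: Least-Squares Fit
The residual is the difference between the measured and model values:
$$e_i = y_{i,\text{measured}} - y_{i,\text{model}} = y_i - a_0 - a_1 x_i$$
The least-squares criterion minimizes the sum of the squares of the residuals:
$$\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
This yields a unique line for a given set of data.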
Linear Regression: Least-Squares Fit
$$\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
The coefficients $a_0$ and $a_1$ that minimize $S_r$ must
satisfy the following conditions:
$$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0$$
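Linear Regression: Determination of $a_0$ and $a_1$
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i)\, x_i = 0$$
Hence
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$
Since $\sum a_0 = n a_0$, these become the normal equations:
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$
These are 2 equations with 2 unknowns, which can be solved simultaneously.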
Linear Regression: Determination of $a_0$ and $a_1$
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$
$$a_0 = \bar{y} - a_1 \bar{x}$$
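The closed-form solution of the normal equations can be written as a short Python function; the name `fit_line` is illustrative. Applied to the data of the two worked examples that follow, it reproduces the hand-computed coefficients.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x from the closed-form solution
    of the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n  # a0 = y_bar - a1 * x_bar
    return a0, a1


# Temperature T (deg C) and resistance R (ohm)
a0, a1 = fit_line([10, 20, 30, 40, 50, 60], [20.1, 20.2, 20.4, 20.6, 20.8, 21.0])
print(round(a0, 3), round(a1, 5))  # 19.867 0.01857

# Seven-point straight-line example
b0, b1 = fit_line([1, 2, 3, 4, 5, 6, 7], [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
print(round(b0, 8), round(b1, 7))
```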
Error Quantification of Linear Regression
• The sum of the squares of the residuals around
the regression line is $S_r$:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
• The total sum of the squares around the mean
for the dependent variable, y, is $S_t$:
$$S_t = \sum (y_i - \bar{y})^2$$
Example
• The table below gives the temperature T (°C) and the resistance R (Ω) of a circuit.
If $R = a_0 + a_1 T$, find the values of $a_0$ and $a_1$.
T (°C)   10     20     30     40     50     60
R (Ω)    20.1   20.2   20.4   20.6   20.8   21.0
Solution
T = x_i   R = y_i   x_i^2 = T^2   x_i y_i = TR   g(x_i)
10        20.1      100           201            20.05
20        20.2      400           404            20.24
30        20.4      900           612            20.42
40        20.6      1600          824            20.61
50        20.8      2500          1040           20.80
60        21.0      3600          1260           20.98
Σx_i = 210   Σy_i = 123.1   Σx_i^2 = 9100   Σx_i y_i = 4341
Solution
• $6a_0 + 210a_1 = 123.1$
• $210a_0 + 9100a_1 = 4341$
• $a_0 = 19.867$, $a_1 = 0.01857$
• $g(x) = 19.867 + 0.01857\,T$
Least Squares Fit of a Straight Line: Example
• Fit a straight line to the x and y values in the following table:
x_i   y_i   x_i y_i   x_i^2
1     0.5   0.5       1
2     2.5   5         4
3     2.0   6         9
4     4.0   16        16
5     3.5   17.5      25
6     6.0   36        36
7     5.5   38.5      49
Σ 28  24.0  119.5     140
$$\sum x_i = 28, \quad \sum y_i = 24.0, \quad \sum x_i^2 = 140, \quad \sum x_i y_i = 119.5$$
$$\bar{x} = \frac{28}{7} = 4, \qquad \bar{y} = \frac{24}{7} = 3.428571$$
Least Squares Fit of a Straight Line: Example
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7 \times 119.5 - 28 \times 24}{7 \times 140 - 28^2} = 0.8392857$$
$$a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857 \times 4 = 0.07142857$$
$$y = 0.07142857 + 0.8392857\,x$$
Least Squares Fit of a Straight Line: Example (Error Analysis)
$$S_t = \sum (y_i - \bar{y})^2 = 22.7143$$
$$S_r = \sum e_i^2 = 2.9911$$
$$r^2 = \frac{S_t - S_r}{S_t} = 0.868$$
$$r = \sqrt{r^2} = \sqrt{0.868} = 0.932$$
Least Squares Fit of a Straight Line: Example (Error Analysis)
• The standard deviation (quantifies the spread around the mean):
$$s_y = \sqrt{\frac{S_t}{n - 1}} = \sqrt{\frac{22.7143}{7 - 1}} = 1.9457$$
• The standard error of the estimate (quantifies the spread around the regression line):
$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}} = \sqrt{\frac{2.9911}{7 - 2}} = 0.7734$$
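The error-analysis quantities for the seven-point example can be reproduced with the following Python sketch; the function name and dictionary keys are illustrative. The coefficient r is returned without a sign here; in practice it takes the sign of the slope a1.

```python
import math


def regression_errors(xs, ys, a0, a1):
    """S_t, S_r, r^2, r, s_y and s_{y/x} for the line y = a0 + a1*x."""
    n = len(ys)
    y_bar = sum(ys) / n
    s_t = sum((y - y_bar) ** 2 for y in ys)                      # spread around the mean
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))    # spread around the line
    r2 = (s_t - s_r) / s_t
    return {
        "S_t": s_t,
        "S_r": s_r,
        "r2": r2,
        "r": math.sqrt(r2),
        "s_y": math.sqrt(s_t / (n - 1)),
        "s_y/x": math.sqrt(s_r / (n - 2)),
    }


xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
stats = regression_errors(xs, ys, 0.07142857, 0.8392857)
print({k: round(v, 4) for k, v in stats.items()})
```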
Linearization of Nonlinear Relationships
• Linear regression presumes that the relationship between the dependent
and independent variables is linear.
• However, a few types of nonlinear functions
can be transformed into linear regression
problems:
  ◦ The exponential equation.
  ◦ The power equation.
  ◦ The saturation-growth-rate equation.
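1. The exponential equation, $y = a_1 e^{b_1 x}$:
$$\ln y = \ln a_1 + b_1 x$$
In regression notation this is the straight line $y^* = a_0 + a_1 x$ with $y^* = \ln y$, whose fitted intercept gives $\ln a_1$ and whose fitted slope gives $b_1$.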
2. The power equation, $y = a_2 x^{b_2}$:
$$\log y = \log a_2 + b_2 \log x$$
which has the form $y^* = a_0 + a_1 x^*$ with $y^* = \log y$, $x^* = \log x$, $a_0 = \log a_2$ and $a_1 = b_2$.
3. The saturation-growth-rate equation, $y = a_3 \dfrac{x}{b_3 + x}$:
$$\frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3} \cdot \frac{1}{x}$$
which has the form $y^* = a_0 + a_1 x^*$ with $y^* = 1/y$, $x^* = 1/x$, $a_0 = 1/a_3$ and $a_1 = b_3/a_3$.
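The saturation-growth-rate linearization can be sketched in Python by regressing 1/y on 1/x and back-transforming the intercept and slope. The data are hypothetical, generated exactly from y = 5x/(2 + x), so the fit should recover a3 = 5 and b3 = 2; every x and y must be nonzero for the reciprocals to exist.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope of y = a0 + a1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return sy / n - a1 * sx / n, a1


def fit_saturation_growth(xs, ys):
    """Fit y = a3 * x / (b3 + x) by regressing y* = 1/y on x* = 1/x."""
    a0, a1 = fit_line([1 / x for x in xs], [1 / y for y in ys])
    a3 = 1 / a0   # intercept a0 = 1/a3
    b3 = a1 * a3  # slope a1 = b3/a3
    return a3, b3


xs = [0.5, 1, 2, 4, 8]
ys = [5 * x / (2 + x) for x in xs]  # hypothetical data from a3 = 5, b3 = 2
a3, b3 = fit_saturation_growth(xs, ys)
print(a3, b3)  # approximately 5.0 and 2.0
```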
Example
Fit the equation
$$y = a_2 x^{b_2}$$
to the data in the following table:
x_i   y_i
1     0.5
2     1.7
3     3.4
4     5.7
5     8.4
Taking logarithms:
$$\log y = \log\left(a_2 x^{b_2}\right) = \log a_2 + b_2 \log x$$
Let $Y^* = \log y$, $X^* = \log x$, $a_0 = \log a_2$ and $a_1 = b_2$; then
$$Y^* = a_0 + a_1 X^*$$
Example
x_i   y_i     X*_i = log x_i   Y*_i = log y_i   X*Y*     X*^2
1     0.5     0.0000           -0.3010          0.0000   0.0000
2     1.7     0.3010           0.2304           0.0694   0.0906
3     3.4     0.4771           0.5315           0.2536   0.2276
4     5.7     0.6021           0.7559           0.4551   0.3625
5     8.4     0.6990           0.9243           0.6460   0.4886
Σ 15  19.7    2.079            2.141            1.424    1.169
$$a_1 = \frac{n \sum X_i^* Y_i^* - \sum X_i^* \sum Y_i^*}{n \sum X_i^{*2} - \left(\sum X_i^*\right)^2} = \frac{5 \times 1.424 - 2.079 \times 2.141}{5 \times 1.169 - 2.079^2} = 1.75$$
$$a_0 = \bar{Y}^* - a_1 \bar{X}^* = 0.4282 - 1.75 \times 0.41584 = -0.300$$
Linearization of Nonlinear Functions: Example
$$\log y = -0.300 + 1.75 \log x$$
$$a_2 = 10^{-0.300} \approx 0.50, \qquad y \approx 0.50\,x^{1.75}$$
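The log-log fit of the power equation can be sketched in Python as follows; the function name `fit_power` is illustrative. For the tabulated data it gives a2 close to 0.50 and b2 close to 1.75. Because changing the base of the logarithms rescales X* and Y* by the same factor, natural logarithms give the same a2 and b2, so the same function also applies to the power-function example (x = 2 to 4) later in these slides.

```python
import math


def fit_power(xs, ys):
    """Fit y = a2 * x**b2 by least squares on (log10 x, log10 y); requires x, y > 0."""
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    sxy = sum(u * v for u, v in zip(X, Y))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope = b2
    a0 = sy / n - a1 * sx / n                       # intercept = log10(a2)
    return 10 ** a0, a1


a2, b2 = fit_power([1, 2, 3, 4, 5], [0.5, 1.7, 3.4, 5.7, 8.4])
print(round(a2, 3), round(b2, 3))
```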
Polynomial Regression
• Some engineering data is poorly represented by a straight line
• For these cases a curve is better suited to fit the data
• The least squares method can readily be extended to fit
the data to higher order polynomials
Polynomial Regression
(cont’d)
A parabola is preferable
Polynomial Regression (cont’d)
• A 2nd-order polynomial (quadratic) is defined by:
$$y = a_0 + a_1 x + a_2 x^2 + e$$
• The residuals between the model and the data:
$$e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2$$
• The sum of squares of the residuals:
$$S_r = \sum e_i^2 = \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
Polynomial Regression (cont’d)
• A system of 3×3 equations needs to be solved to
determine the coefficients of the polynomial:
$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$$
• The standard error and the coefficient of determination:
$$s_{y/x} = \sqrt{\frac{S_r}{n - 3}}, \qquad r^2 = \frac{S_t - S_r}{S_t}$$
Polynomial Regression (cont’d)
General: the mth-order polynomial:
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$$
• A system of (m+1)×(m+1) linear equations must be solved for
determining the coefficients of the mth-order polynomial.
• The standard error:
$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}$$
• The coefficient of determination:
$$r^2 = \frac{S_t - S_r}{S_t}$$
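The (m+1)×(m+1) normal equations can be built and solved in a short pure-Python sketch; the function names are illustrative, and Gaussian elimination with partial pivoting is one reasonable choice of solver. On the data of the following example it gives a0 ≈ 2.47857, a1 ≈ 2.35929, a2 ≈ 1.86071, with s_{y/x} ≈ 1.12 and r² ≈ 0.99851.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def polyfit(xs, ys, m):
    """Coefficients [a0, ..., am] of the least-squares mth-order polynomial:
    row j of the normal equations is sum_k a_k * sum(x^(j+k)) = sum(x^j * y)."""
    A = [[sum(x ** (j + k) for x in xs) for k in range(m + 1)] for j in range(m + 1)]
    b = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m + 1)]
    return solve(A, b)


def fit_quality(xs, ys, coeffs):
    """Standard error s_{y/x} = sqrt(S_r / (n - (m+1))) and r^2 = (S_t - S_r) / S_t."""
    n, m = len(xs), len(coeffs) - 1
    y_bar = sum(ys) / n
    s_r = sum((y - sum(a * x ** k for k, a in enumerate(coeffs))) ** 2 for x, y in zip(xs, ys))
    s_t = sum((y - y_bar) ** 2 for y in ys)
    return (s_r / (n - (m + 1))) ** 0.5, (s_t - s_r) / s_t


xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
coeffs = polyfit(xs, ys, 2)
print([round(a, 5) for a in coeffs])  # [2.47857, 2.35929, 1.86071]
print(fit_quality(xs, ys, coeffs))
```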
Polynomial Regression: Example
Fit a second-order polynomial to the data:
x_i   y_i    x_i^2   x_i^3   x_i^4   x_i y_i   x_i^2 y_i
0     2.1    0       0       0       0         0
1     7.7    1       1       1       7.7       7.7
2     13.6   4       8       16      27.2      54.4
3     27.2   9       27      81      81.6      244.8
4     40.9   16      64      256     163.6     654.4
5     61.1   25      125     625     305.5     1527.5
Σ 15  152.6  55      225     979     585.6     2488.8
$$n = 6, \quad \sum x_i = 15, \quad \sum y_i = 152.6, \quad \sum x_i^2 = 55, \quad \sum x_i^3 = 225, \quad \sum x_i^4 = 979$$
$$\sum x_i y_i = 585.6, \quad \sum x_i^2 y_i = 2488.8, \quad \bar{x} = \frac{15}{6} = 2.5, \quad \bar{y} = \frac{152.6}{6} = 25.433$$
2nd-Order Polynomial Example
$$y = a_0 + a_1 x + a_2 x^2$$
x_i   f_i   x_i^2   x_i^3   x_i^4   f_i x_i   f_i x_i^2   g(x)
1     4     1       1       1       4         4           4.495
2     11    4       8       16      22        44          10.14
4     19    16      64      256     76        304         19.42
6     26    36      216     1296    156       936         26.02
8     30    64      512     4096    240       1920        29.93
Σ 21  90    121     801     5665    498       3208
2nd-Order Polynomial Example
$$5a_0 + 21a_1 + 121a_2 = 90$$
$$21a_0 + 121a_1 + 801a_2 = 498$$
$$121a_0 + 801a_1 + 5665a_2 = 3208$$
$a_0 = -1.82$, $a_1 = 6.65$, $a_2 = -0.335$
So the required equation is $g(x) = -1.82 + 6.65x - 0.335x^2$.
Exponential Function
x   1     2     3    4     5
y   1.5   4.5   6    8.5   11
Solution
$$y = a e^{bx}$$
$$\ln y = \ln\left(a e^{bx}\right) = \ln a + bx$$
$$Y = a_0 + a_1 X$$
where $Y = \ln y$, $a_0 = \ln a$, $a_1 = b$ and $X = x$.
X = x_i   y_i    Y = ln y_i   x_i^2   x_i Y_i   g(x)
1         1.5    0.4055       1       0.4055    2.06
2         4.5    1.5041       4       3.0082    3.27
3         6      1.7918       9       5.3754    5.20
4         8.5    2.1401       16      8.5604    8.25
5         11     2.3979       25      11.9895   13.09
Σ 15             8.2394       55      29.3390
Solution
• $5a_0 + 15a_1 = 8.2394$
• $15a_0 + 55a_1 = 29.3390$
• $a_0 = 0.2616$; $a_1 = 0.4621$
• $a = e^{0.2616} = 1.299$, $b = 0.4621$
• Required equation: $g(x) = 1.299\,e^{0.4621x}$
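The exponential fit can be sketched in Python by regressing ln y on x and back-transforming the intercept; the function name `fit_exponential` is illustrative. Note that this minimizes the squared error in ln y rather than in y itself, which is the usual trade-off of linearization.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by least squares on (x, ln y); requires every y > 0."""
    Y = [math.log(y) for y in ys]
    n = len(xs)
    sx, sy = sum(xs), sum(Y)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * v for x, v in zip(xs, Y))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope = b
    a0 = sy / n - a1 * sx / n                       # intercept = ln a
    return math.exp(a0), a1


a, b = fit_exponential([1, 2, 3, 4, 5], [1.5, 4.5, 6, 8.5, 11])
print(round(a, 3), round(b, 4))  # 1.299 0.4621
```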
Example
• Power function:
x   2   2.5   3    3.5     4
y   7   8.5   11   12.75   15
Solution:
• $y = a x^b$
• $\ln y = \ln a + b \ln x$
• $Y = a_0 + a_1 X$
• where $Y = \ln y$, $a_0 = \ln a$, $X = \ln x$ and $a_1 = b$
Solution
x     y       ln x = X   ln y = Y   X^2      XY       g(x)
2     7       0.6931     1.946      0.480    1.3487   6.868
2.5   8.5     0.9163     2.140      0.8396   1.9608   8.813
3     11      1.098      2.397      1.2056   2.6319   10.806
3.5   12.75   1.252      2.545      1.5675   3.1863   12.838
4     15      1.386      2.708      1.9209   3.7532   14.904
Σ             5.3454     11.736     6.0136   12.8809
Solution
• $5a_0 + 5.3454a_1 = 11.736$
• $5.3454a_0 + 6.0136a_1 = 12.8809$
• $a_0 = 1.1521$; $a_1 = 1.1178$
• $a = e^{a_0} = e^{1.1521} = 3.1648$, $b = a_1 = 1.1178$
Required equation: $g(x) = 3.1648\,x^{1.1178}$
Polynomial Regression: Example (cont’d)
x_i   y_i     y_model   e_i^2     (y_i − ȳ)^2
0     2.1     2.4786    0.14332   544.42889
1     7.7     6.6986    1.00286   314.45929
2     13.6    14.64     1.08158   140.01989
3     27.2    26.303    0.80491   3.12229
4     40.9    41.687    0.61951   239.22809
5     61.1    60.793    0.09439   1272.13489
Σ 15  152.6             3.74657   2513.39333
• The standard error of the estimate:
$$s_{y/x} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12$$
• The coefficient of determination:
$$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851, \qquad r = \sqrt{r^2} = 0.99925$$
More Related Content

PPTX
Editing, coding and tabulation of data
PPTX
7.pptx
PPTX
Data analysis copy
PDF
7 Processing And Analysis Of Data
PPTX
Research methodology-Research Report
PPTX
Research Methodology-Data Processing
PPTX
Data processing and presentation
PPTX
1. Data Process.pptx
Editing, coding and tabulation of data
7.pptx
Data analysis copy
7 Processing And Analysis Of Data
Research methodology-Research Report
Research Methodology-Data Processing
Data processing and presentation
1. Data Process.pptx

Similar to Data Analysis technique, data collection, data analysis (20)

PDF
RM CHAPTER SEVEN AND EIGHT.pKJUYTTRRRRRRRRdf
PDF
Data processing in research methodology
PPTX
Research methodology
PPTX
Research methodology
PPTX
research methods CHAPTER VI data processing, analysis and interpretation .pptx
PPTX
Lecture 1- data preparation.pptx
PPTX
MOdule IV- Data Processing.pptx
PPTX
Data analysis.pptx
PPTX
Coding, editing, Tabulation and validation.pptx
PDF
Brm unit iv - cheet sheet
PPTX
8. data analysis in research practice.pptx
PPT
Data analysis & interpretation
PPT
a data editing, coding and tabulation.ppt
PPTX
dataanalysisandinterpretation-231025045220-81d52e02.pptx
PPTX
Analysis of data.pptx
PPTX
Collecting, analyzing and interpreting data
PPTX
Data Analysis technique, data collection, data analysis

  • 1. Data Analysis • The data, after collection, has to be processed and analysed in accordance with the outline laid down for the purpose at the time of developing the research plan • This is essential for a scientific study and for ensuring that we have all relevant data for making contemplated comparisons and analysis • Processing implies editing, coding, classification and tabulation of collected data so that they are amenable to analysis • The term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data-groups
  • 2. Processing Operations • Editing • Coding • Classification • Tabulation
  • 3. Editing • The process of examining the collected raw data to detect errors and omissions and to correct them where possible • It involves careful scrutiny of the completed questionnaires and/or schedules • It ensures that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible and well arranged to facilitate coding and tabulation • With regard to the points or stages at which editing should be done, one can distinguish field editing and central editing • Field editing consists of the investigator reviewing the reporting forms to complete (translate or rewrite) what was written in abbreviated and/or illegible form at the time of recording the respondents’ responses. This type of editing is necessary because individual writing styles can often be difficult for others to decipher
  • 4. • Central editing should take place when all forms or schedules have been completed and returned to the office • Editing at this stage should be thorough, done by a single editor or, in a large inquiry, by a team of editors • Editor(s) may correct obvious errors • In the case of inappropriate or missing replies, the editor can sometimes determine the proper answer by reviewing the other information in the schedule; alternatively, the respondent can be contacted for clarification
  • 5. Editors must keep several points in view while performing their work: • They should be familiar with the instructions given to the interviewers and coders as well as with the editing instructions supplied • When crossing out an original entry for one reason or another, they should draw just a single line through it so that it remains legible • They must make any entries on the form in a distinctive colour and in a standardised form • They should initial all answers which they change or supply • The editor’s initials and the date of editing should be placed on each completed form or schedule
  • 6. Coding • Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes appropriate to the research problem • The classes must be exhaustive (there must be a class for every data item) and mutually exclusive, which means that a specific answer can be placed in one and only one class • Another rule to be observed is unidimensionality, meaning that every class is defined in terms of only one concept • Through coding, the many replies may be reduced to a small number of classes which contain the critical information required for analysis
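The three coding rules can be checked mechanically. The sketch below assumes a hypothetical "age of respondent" question with four classes chosen only for illustration (`CODING_FRAME` and `code_response` are not from the slides): a response matching no class violates exhaustiveness, a response matching more than one violates mutual exclusivity, and every class is defined on the single concept of age.

```python
# Hypothetical coding frame: code -> (label, rule deciding class membership).
# Every class is defined on one concept only (age), i.e. the frame is unidimensional.
CODING_FRAME = {
    1: ("Under 20", lambda age: age < 20),
    2: ("20-39", lambda age: 20 <= age < 40),
    3: ("40-59", lambda age: 40 <= age < 60),
    4: ("60 and over", lambda age: age >= 60),
}


def code_response(age):
    """Return the single numeric code for a response, enforcing the coding rules."""
    matches = [code for code, (_, rule) in CODING_FRAME.items() if rule(age)]
    if not matches:
        raise ValueError(f"frame is not exhaustive: no class for {age!r}")
    if len(matches) > 1:
        raise ValueError(f"classes are not mutually exclusive: {age!r} fits {matches}")
    return matches[0]


responses = [17, 25, 40, 59, 63, 38]
print([code_response(a) for a in responses])  # [1, 2, 3, 3, 4, 2]
```

Raising an error instead of silently picking a class makes a badly designed frame fail loudly during coding rather than distort the later tabulation.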
  • 7. Classification • Most research studies result in a large volume of raw data which must be reduced into homogeneous groups if we are to get meaningful relationships • This necessitates classification of data, which is the process of arranging data in groups or classes on the basis of common characteristics • Data having a common characteristic are placed in one class, and in this way the entire data set is divided into a number of groups or classes • Classification can be of one of the following two types, depending upon the nature of the phenomenon involved: ◦ According to attributes ◦ According to class intervals
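Both types of classification amount to counting items per class. A minimal sketch, assuming hypothetical literacy and marks data and equal-width class intervals built with the exclusive method (a value v falls in [start, start + width)); the function names are illustrative only.

```python
from collections import Counter


def classify_by_attribute(values):
    """Classification according to attributes: one class per distinct attribute value."""
    return dict(Counter(values))


def classify_by_intervals(values, lower, width):
    """Classification according to class intervals of equal width (exclusive method).

    Only intervals that actually contain values are returned.
    """
    counts = Counter(int((v - lower) // width) for v in values)
    return {
        (lower + k * width, lower + (k + 1) * width): counts[k]
        for k in sorted(counts)
    }


literacy = ["literate", "illiterate", "literate", "literate"]
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38]
print(classify_by_attribute(literacy))      # {'literate': 3, 'illiterate': 1}
print(classify_by_intervals(marks, 10, 10))
# {(10, 20): 2, (20, 30): 3, (30, 40): 3, (40, 50): 2}
```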
  • 8. Tabulation • When a mass of data has been assembled, it becomes necessary for the researcher to arrange it in some kind of concise and logical order • This procedure is referred to as tabulation; thus, tabulation is the process of summarizing raw data and displaying them in compact form (i.e., in the form of statistical tables) for further analysis • In a broader sense, tabulation is an orderly arrangement of data in columns and rows • Tabulation is essential for the following reasons: ◦ It conserves space and reduces explanatory and descriptive statements to a minimum ◦ It facilitates the process of comparison ◦ It facilitates the summation of items and the detection of errors and omissions ◦ It provides a basis for various statistical computations
  • 9. Generally Accepted Principles of Tabulation • Every table should have a clear, concise and adequate title so as to make the table intelligible without reference to the text, and this title should always be placed just above the body of the table • Every table should be given a distinct number to facilitate easy reference • The column headings (captions) and the row headings (stubs) of the table should be clear and brief • The units of measurement under each heading or sub-heading must always be indicated • Explanatory footnotes, if any, concerning the table should be placed directly beneath the table, along with the reference symbols used in the table • The source or sources from which the data in the table have been obtained must be indicated just below the table • Usually the columns are separated from one another by lines, which make the table more readable and attractive
  • 10. • Lines are always drawn at the top and bottom of the table and below the captions • There should be thick lines to separate the data under one class from the data under another class and the lines separating the sub-divisions of the classes should be comparatively thin lines • The columns may be numbered to facilitate reference • Those columns whose data are to be compared should be kept side by side • Similarly, percentages and/or averages must also be kept close to the data • It is generally considered better to approximate figures before tabulation as the same would reduce unnecessary details in the table itself • In order to emphasise the relative significance of certain categories, different kinds of type, spacing and indentations may be used
  • 11. • It is important that all column figures be properly aligned. Decimal points and (+) or (–) signs should be in perfect alignment • Abbreviations should be avoided to the extent possible, and ditto marks should not be used in the table • Miscellaneous and exceptional items, if any, should usually be placed in the last row of the table • The table should be made as logical, clear, accurate and simple as possible. If the data are very large, they should not be crowded into a single table, for that would make the table unwieldy and inconvenient • Row totals should normally be placed in the extreme right column and column totals at the bottom • The arrangement of the categories in a table may be chronological, geographical, alphabetical or according to magnitude to facilitate comparison
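Several of these principles (row totals in the extreme right column, column totals at the bottom, figures aligned in columns) can be illustrated with a small cross-tabulation. This is a sketch only; the survey pairs, the "Gender" stub label and the helper names are hypothetical.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Two-way table of counts with row totals (right) and column totals (bottom)."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        cells = {c: counts[(r, c)] for c in cols}
        cells["Total"] = sum(cells.values())  # row total in the extreme right column
        table[r] = cells
    columns = cols + ["Total"]
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in columns}  # bottom row
    return table, columns


def render(table, columns, stub="Group"):
    """Plain-text table: stubs left-aligned, figures right-aligned so digits line up."""
    width = max(len(str(label)) for label in [stub, *columns, *table]) + 2
    lines = [f"{stub:<{width}}" + "".join(f"{c:>{width}}" for c in columns)]
    for row, cells in table.items():
        lines.append(f"{row:<{width}}" + "".join(f"{cells[c]:>{width}}" for c in columns))
    return "\n".join(lines)


survey = [("Female", "Yes"), ("Male", "No"), ("Female", "Yes"),
          ("Male", "Yes"), ("Female", "No")]
table, columns = cross_tabulate(survey)
print(render(table, columns, stub="Gender"))
```

Because the totals are computed from the same cells that are displayed, the summation also serves as a check for errors and omissions.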
  • 12. Elements/ Types of Analysis • By analysis we mean the computation of certain indices or measures along with searching for patterns of relationship that exist among the data groups • It involves estimating the values of unknown parameters and testing hypotheses to draw inferences • Analysis may, therefore, be categorized as descriptive analysis and inferential analysis (inferential analysis is often known as statistical analysis) • Descriptive analysis is largely the study of distributions of variables; it may be in respect of one variable (described as unidimensional analysis), two variables (described as bivariate analysis) or more than two variables (described as multivariate analysis) • We may also speak of correlation analysis and causal analysis
  • 13. Elements/ Types of Analysis • Correlation analysis studies the joint variation of two or more variables to determine the amount of correlation between them • Causal analysis is concerned with the study of how one or more variables affect changes in another variable • It is thus a study of the functional relationships existing between two or more variables • This analysis can be termed regression analysis • Causal analysis is considered relatively more important in experimental research • In modern times, with the availability of computer facilities, there has been rapid development of multivariate analysis, which may be defined as “all statistical methods which simultaneously analyse more than two variables”
  • 14. Multivariate analysis • Multiple regression analysis: this analysis is adopted when the researcher has one dependent variable which is presumed to be a function of two or more independent variables • Its objective is to make a prediction about the dependent variable based on its covariance with all the independent variables concerned • Multiple discriminant analysis: this analysis is appropriate when the researcher has a single dependent variable that cannot be measured but can be classified into two or more groups on the basis of some attribute • Its object is to predict an entity’s likelihood of belonging to a particular group based on several predictor variables • Multivariate analysis of variance (multi-ANOVA): an extension of two-way ANOVA in which the ratio of among-group variance to within-group variance is worked out on a set of variables
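Multiple regression can be fitted by least squares through the normal equations $(X^\top X)\,b = X^\top y$. The pure-Python sketch below assumes a small hypothetical data set generated exactly from $y = 1 + 2x_1 + 3x_2$, so the fit should recover those coefficients; the solver is ordinary Gaussian elimination with partial pivoting, not a library routine.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting; returns x such that a x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    xs is a list of k predictor columns; the normal equations (X^T X) b = X^T y are solved.
    """
    rows = [[1.0, *values] for values in zip(*xs)]  # design matrix with intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2, so the fit is exact.
x1 = [0, 1, 2, 0, 1, 3]
x2 = [0, 0, 1, 2, 3, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print([round(c, 6) for c in multiple_regression([x1, x2], y)])  # [1.0, 2.0, 3.0]
```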
  • 15. • Canonical analysis: This analysis can be used in case of both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables • Inferential analysis is concerned with the various tests of significance for testing hypotheses in order to determine with what validity data can be said to indicate some conclusion or conclusions
  • 16. Statistics in Research • The role of statistics in research is to function as a tool in designing research, analysing its data and drawing conclusions therefrom • Clearly the science of statistics cannot be ignored by any research worker, even though he may not have occasion to use statistical methods in all their details and ramifications • The important statistical measures are: ◦ Measures of central tendency or statistical averages ◦ Measures of dispersion ◦ Measures of asymmetry (skewness) ◦ Measures of relationship ◦ Other measures
  • 17. • Measures of central tendency (or statistical averages) tell us the point about which items have a tendency to cluster • Mean, median and mode are the most popular averages
  • 18. Median • Arrange your numbers in numerical order • Count how many numbers you have • If you have an odd number, divide by 2 and round up to get the position of the median number • If you have an even number, divide by 2, go to the number in that position and average it with the number in the next higher position to get the median. Mode • To find the mode, or modal value, it is best to put the numbers in order and then count how many of each number there are. The number that appears most often is the mode
  • 19. Find the mean, median and mode for the following list of values: 13, 18, 13, 14, 13, 16, 14, 21, 13 → Mean = 135/9 = 15, Median = 14, Mode = 13. For 1, 2, 4, 7: Mean = 3.5, Median = (2 + 4)/2 = 3, Mode: none (each value occurs only once). G.M. & H.M.
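The median and mode procedures above translate directly into code. This sketch follows the slide's steps for the median and treats "no value repeats" as having no mode, which matches the second worked example; the function names are illustrative.

```python
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)        # 1. arrange the numbers in numerical order
    n = len(ordered)                # 2. count how many there are
    if n % 2 == 1:
        position = (n + 1) // 2     # odd: n/2 rounded up (1-based position)
        return ordered[position - 1]
    position = n // 2               # even: n/2, averaged with the next higher position
    return (ordered[position - 1] + ordered[position]) / 2


def modes(values):
    """Values that occur most often; empty list when every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


for data in ([13, 18, 13, 14, 13, 16, 14, 21, 13], [1, 2, 4, 7]):
    print(mean(data), median(data), modes(data))
# 15.0 14 [13]
# 3.5 3.0 []
```

Python's standard `statistics` module offers `mean`, `median` and `multimode`, but `multimode` returns every value when all occur once, so the explicit check above is kept to match the slide's answer.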
  • 20. Measures of Dispersion • An average can represent a series only as well as a single figure can • It fails to give any idea of the scatter of the values in the series around the average • To measure this scatter, statistical devices called measures of dispersion are calculated. Important measures of dispersion are: ◦ Range ◦ Mean deviation ◦ Standard deviation https://geographyfieldwork.com/DataPresentationScatterGraphs.htm#
  • 21. Range • It is the simplest possible measure of dispersion and is defined as the difference between the values of the extreme items of a series • Range = highest value of an item in a series − lowest value of an item in the series • It gives an idea of the variability very quickly, but its drawback is that range is affected very greatly by sampling fluctuations • Its value is never stable, being based on only two values of the variable • As such, range is mostly used as a rough measure of variability and is not considered an appropriate measure in serious research studies
  • 22. Mean deviation • It is the average of the absolute differences between the values of items and some average of the series • In calculating mean deviation we ignore the minus signs of the deviations when taking their total. Standard deviation • It is the most widely used measure of dispersion of a series and is commonly denoted by the symbol σ (sigma) • Standard deviation is defined as the square root of the average of the squares of the deviations, when such deviations for the values of individual items in a series are obtained from the arithmetic average: $\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$
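A short sketch of the three measures of dispersion, applied to the list from the mean/median/mode example. It assumes the mean deviation is taken about the arithmetic mean (the slide allows "some average") and divides by n in the standard deviation, as in the definition above.

```python
import math


def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average absolute deviation, taken here about the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)


def standard_deviation(values):
    """sigma = sqrt(sum((X - mean)^2) / n), dividing by n as in the definition."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(value_range(data))                    # 8
print(round(mean_deviation(data), 4))       # 2.2222
print(round(standard_deviation(data), 4))   # 2.6667
```

Here the absolute deviations from the mean 15 sum to 20 and the squared deviations to 64, giving 20/9 and √(64/9) = 8/3.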
  • 25. Measures of Asymmetry • When the distribution of items in a series is perfectly symmetrical, we have the following type of curve for the distribution: [Figure: symmetrical, bell-shaped curve]
  • 26. • Such a curve is called a normal curve, and the related distribution a normal distribution • It is a perfectly bell-shaped curve in which the mean (X̄), median (M) and mode (Z) have the same value and skewness is altogether absent • If the curve is distorted (whether on the right side or on the left side), the distribution is asymmetrical, which indicates that there is skewness • If the curve is distorted on the right side, we have positive skewness; when it is distorted towards the left, we have negative skewness
  • 27. Skewness is, thus, a measure of asymmetry and shows the manner in which the items are clustered around the average
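The slides do not give a formula for skewness. One common measure, assumed here, is Karl Pearson's coefficient of skewness, (mean − mode) / σ, which is zero when mean and mode coincide and positive when the tail is on the right. A sketch applied to the earlier example list, assuming a single mode:

```python
import math
from collections import Counter


def pearson_skewness(values):
    """Karl Pearson's coefficient of skewness: (mean - mode) / standard deviation.

    Assumes a single mode; sigma is the population standard deviation (divide by n).
    """
    n = len(values)
    m = sum(values) / n
    mode = Counter(values).most_common(1)[0][0]
    sigma = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return (m - mode) / sigma


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(round(pearson_skewness(data), 4))   # 0.75 (positive: longer tail to the right)
```

With mean 15, mode 13 and σ = 8/3, the coefficient is 2 / (8/3) = 0.75, consistent with the long right tail created by the value 21.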
  • 28. Measures of Relationship • The statistical measures discussed so far relate to a univariate population, i.e., the measurement of only one variable • If for every measurement of a variable X there is a corresponding value of a second variable Y, the resulting pairs of values are called a bivariate population • Similarly, the data can be multivariate • There are several methods of determining the relationship between variables, but no method can tell us for certain that a correlation indicates a causal relationship
  • 29. Measures of Relationship • Two types of questions arise in bivariate or multivariate populations: ◦ Does there exist association or correlation between the two (or more) variables? If yes, of what degree? ◦ Is there any cause-and-effect relationship between the two variables? If yes, of what degree and in which direction? • The first question is answered by the technique of correlation and the second by the technique of regression
  • 30. Measures of Relationship • There are several methods of applying the two techniques, but the important ones are as follows: ◦ In the case of a bivariate population, correlation can be studied through cross tabulation, Charles Spearman’s coefficient of correlation and Karl Pearson’s coefficient of correlation, whereas the cause-and-effect relationship can be studied through simple regression equations ◦ In the case of a multivariate population, correlation can be studied through the coefficient of multiple correlation and the coefficient of partial correlation, whereas the cause-and-effect relationship can be studied through multiple regression equations
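A sketch of the two bivariate correlation coefficients named above. It computes Spearman's coefficient as Pearson's r on the ranks (tied values get the average of their positions), which agrees with the textbook 1 − 6Σd²/(n(n² − 1)) formula when there are no ties. The data are hypothetical: y = x² is monotonic but not linear, so the rank correlation is perfect while Pearson's r is slightly below 1.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation, computed as Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(round(pearson_r(x, y), 4), round(spearman_rho(x, y), 4))  # 0.9811 1.0
```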
  • 31. Simple Regression Analysis • Regression is the determination of a statistical relationship between two or more variables • In simple regression we have only two variables: one variable (defined as independent) is the cause of the behaviour of the other (defined as the dependent variable) • Regression can only interpret what exists physically, i.e., there must be a physical way in which the independent variable X can affect the dependent variable Y • The basic relationship between X and Y is given by $\hat{Y} = a + bX$, where $\hat{Y}$ denotes the estimated value of Y for a given value of X
  • 32. Least-Square Method • The method generally used to find the ‘best’ fit that a straight line of this kind can give is the least-square method
  • 33. Least Square Curve Fitting Method (S. S. Sastry, Introductory Methods of Numerical Analysis, 2012, PHI Learning, New Delhi)
  • 46. A sigmoid function is a mathematical function having a characteristic "S"-shaped curve, or sigmoid curve. A common example of a sigmoid function is the logistic function shown in the first figure, defined by the formula $S(x) = \frac{1}{1 + e^{-x}}$
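A small sketch of the logistic function above. The naive form `1 / (1 + exp(-x))` overflows for large negative x, so this version switches to the algebraically equivalent `e^x / (1 + e^x)` when x < 0; the function name is illustrative.

```python
import math


def logistic(x):
    """Logistic sigmoid S(x) = 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for x < 0, e^x is small, so no overflow
    return z / (1.0 + z)


for v in (-6, -2, 0, 2, 6):
    print(v, round(logistic(v), 4))
```

The output traces the S shape: values near 0 for large negative inputs, exactly 0.5 at x = 0, and values near 1 for large positive inputs.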
  • 47. Definition • Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints • It is a statistical technique used to derive coefficient values for equations that express the value of one (dependent) variable as a function of another (independent) variable https://www2.slideshare.net/shopnohinami/curve-fitting-53775511?from_action=save
  • 48. What is curve fitting? Curve fitting is the process of constructing a curve, or mathematical function, that most closely approximates a series of data. Through curve fitting we can mathematically construct the functional relationship between observed facts and parameter values. It is highly effective in the mathematical modelling of some natural processes. https://www2.slideshare.net/shopnohinami/curve-fitting-53775511?from_action=save
  • 49. Interpolation & Curve fitting • In many application areas, one is faced with the task of describing data, often measured, with an analytic function. There are two approaches to this problem: • 1. In interpolation, the data are assumed to be correct, and what is desired is some way to describe what happens between the data points • 2. In the other approach, called curve fitting or regression, one looks for a smooth curve that “best fits” the data but does not necessarily pass through any data points
  • 50. Curve fitting • There are two general approaches to curve fitting: • Least-squares regression: the data exhibit a significant degree of scatter. The strategy is to derive a single curve that represents the general trend of the data • Interpolation: the data are very precise. The strategy is to pass a curve or a series of curves through each of the points
  • 51. General approach for curve fitting
  • 52. Engineering Applications of Curve Fitting Techniques • In engineering, two types of applications are encountered: • Trend analysis: predicting values of the dependent variable, which may include extrapolation beyond data points or interpolation between data points • Hypothesis testing: comparing an existing mathematical model with measured data
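  • 54. Mathematical Background • Mean: $\bar{y} = \frac{\sum y_i}{n}$ • Standard deviation: $S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$ • Variance: representation of spread by the square of the standard deviation, $S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$ or, equivalently, $S_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$ • Coefficient of variation: quantifies the spread of data relative to the mean, $\text{c.v.} = \frac{S_y}{\bar{y}} \times 100\%$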
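A sketch that computes these quantities and confirms that the two variance formulas agree. As an assumption for illustration it uses the y values of the straight-line example that appears later in these slides (0.5, 2.5, 2, 4, 3.5, 6, 5.5); the function name is hypothetical.

```python
import math


def spread_measures(y):
    """Mean, sample variance (two equivalent forms), standard deviation and c.v. (%)."""
    n = len(y)
    y_bar = sum(y) / n
    s_t = sum((v - y_bar) ** 2 for v in y)                  # total sum of squares
    var_definition = s_t / (n - 1)
    var_shortcut = (sum(v * v for v in y) - sum(y) ** 2 / n) / (n - 1)
    s_y = math.sqrt(var_definition)
    cv = s_y / y_bar * 100
    return y_bar, var_definition, var_shortcut, s_y, cv


print([round(v, 4) for v in spread_measures([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])])
# [3.4286, 3.7857, 3.7857, 1.9457, 56.7493]
```

The second form avoids a separate pass to compute the mean, but it subtracts two large numbers and can lose precision on large data sets, which is why the definition form is usually preferred in code.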
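  • 56. Linear Regression: Criteria for a “Best” Fit • Criterion 1: minimize the sum of the residuals, $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$ • This criterion is inadequate because positive and negative residuals cancel (e.g., $e_1 = -e_2$)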
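  • 57. Linear Regression: Criteria for a “Best” Fit • Criterion 2: minimize the sum of the absolute values of the residuals, $\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$ • This criterion does not always yield a unique best-fit line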
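  • 58. Linear Regression: Criteria for a “Best” Fit • Criterion 3 (minimax): minimize the maximum residual, $\min_{a_0, a_1} \max_{1 \le i \le n} |e_i| = \min_{a_0, a_1} \max_{1 \le i \le n} |y_i - a_0 - a_1 x_i|$ • This criterion gives undue influence to an outlier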
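To see why criterion 1 fails, compare two lines on the same data: a horizontal line through the mean and the least-squares line from the later worked example (coefficients taken from that example). Both have residuals summing to about zero, yet their sums of squared residuals differ greatly. The data and helper name are borrowed or hypothetical, for illustration only.

```python
def criteria(x, y, a0, a1):
    """Evaluate the candidate 'best fit' criteria for the line y = a0 + a1*x."""
    e = [yi - a0 - a1 * xi for xi, yi in zip(x, y)]  # residuals
    return {
        "sum_e": sum(e),                       # criterion 1
        "sum_abs_e": sum(abs(v) for v in e),   # criterion 2
        "max_abs_e": max(abs(v) for v in e),   # criterion 3 (minimax)
        "sum_sq_e": sum(v * v for v in e),     # least squares, S_r
    }


x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
y_bar = sum(y) / len(y)

flat = criteria(x, y, y_bar, 0.0)               # horizontal line through the mean
fitted = criteria(x, y, 0.07142857, 0.8392857)  # least-squares line from the example
for name, c in (("flat", flat), ("fitted", fitted)):
    print(name, {k: round(v, 4) for k, v in c.items()})
```

Both lines score about zero on criterion 1, so it cannot tell them apart, while the sum of squares is roughly 22.71 for the flat line against 2.99 for the fitted one.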
  • 59. Linear curve fitting (straight line) • Given a set of data points $(x_i, f(x_i))$, find a curve that best captures the general trend, where $g(x)$ is the approximation function • Try to fit a straight line through the data
  • 60. Linear Regression: Least Squares Fit • Minimize the sum of the squares of the residuals: $S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$ • This criterion yields a unique line for a given set of data.
  • 61. Linear Regression: Least Squares Fit • $\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$ • The coefficients $a_0$ and $a_1$ that minimize $S_r$ must satisfy the following conditions: $\frac{\partial S_r}{\partial a_0} = 0$ and $\frac{\partial S_r}{\partial a_1} = 0$
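  • 62. Linear Regression: Determination of $a_0$ and $a_1$ • $\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$ • $\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i)\, x_i = 0$ • Hence $0 = \sum y_i - \sum a_0 - \sum a_1 x_i$ and $0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$ • Using $\sum a_0 = n a_0$, these become $n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$ and $\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$ • These are 2 equations in 2 unknowns, which can be solved simultaneously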
  • 63. Linear Regression: Determination of $a_0$ and $a_1$ • $a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$ • $a_0 = \bar{y} - a_1 \bar{x}$
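The closed-form expressions for $a_1$ and $a_0$ can be coded directly. This sketch applies them to the temperature-resistance example that follows (slide 66). For that data it gives $a_0 \approx 19.8667$ and $a_1 \approx 0.018571$, which reproduce the g(xᵢ) column of the solution table to two decimals. The function name is illustrative.

```python
def linear_least_squares(x, y):
    """Return (a0, a1) for y = a0 + a1*x from the closed-form least-squares solution."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n  # a0 = y_bar - a1 * x_bar
    return a0, a1


# Temperature-resistance example (slide 66): T in degrees C, R in ohms
T = [10, 20, 30, 40, 50, 60]
R = [20.1, 20.2, 20.4, 20.6, 20.8, 21.0]
a0, a1 = linear_least_squares(T, R)
print(round(a0, 4), round(a1, 6))  # 19.8667 0.018571
```

The same function applied to the straight-line example on slide 69 returns $a_0 \approx 0.0714286$ and $a_1 \approx 0.8392857$.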
  • 65. Error Quantification of Linear Regression • The sum of the squares of the residuals around the regression line is $S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$ • The total sum of the squares around the mean for the dependent variable y is $S_t = \sum (y_i - \bar{y})^2$
  • 66. Example • The table below gives the temperature T (in °C) and the resistance R (in Ω) of a circuit. If $R = a_0 + a_1 T$, find the values of $a_0$ and $a_1$.
T (°C) | 10 | 20 | 30 | 40 | 50 | 60
R (Ω) | 20.1 | 20.2 | 20.4 | 20.6 | 20.8 | 21.0
  • 67. Solution
T = xᵢ | R = yᵢ | xᵢ² = T² | xᵢyᵢ = TR | g(xᵢ) = Y
10 | 20.1 | 100 | 201 | 20.05
20 | 20.2 | 400 | 404 | 20.24
30 | 20.4 | 900 | 612 | 20.42
40 | 20.6 | 1600 | 824 | 20.61
50 | 20.8 | 2500 | 1040 | 20.80
60 | 21 | 3600 | 1260 | 20.98
Σxᵢ = 210, Σyᵢ = 123.1, Σxᵢ² = 9100, Σxᵢyᵢ = 4341
  • 69. Least Squares Fit of a Straight Line: Example • Fit a straight line to the x and y values in the following table:
xᵢ | yᵢ | xᵢyᵢ | xᵢ²
1 | 0.5 | 0.5 | 1
2 | 2.5 | 5 | 4
3 | 2 | 6 | 9
4 | 4 | 16 | 16
5 | 3.5 | 17.5 | 25
6 | 6 | 36 | 36
7 | 5.5 | 38.5 | 49
Σ: 28 | 24 | 119.5 | 140
• $\sum x_i = 28$, $\sum y_i = 24.0$, $\sum x_i^2 = 140$, $\sum x_i y_i = 119.5$, $\bar{x} = 28/7 = 4$, $\bar{y} = 24/7 = 3.428571$
  • 70. Least Squares Fit of a Straight Line: Example • $a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7(119.5) - 28(24)}{7(140) - 28^2} = 0.8392857$ • $a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857(4) = 0.07142857$ • $y = 0.07142857 + 0.8392857x$
  • 71. Least Squares Fit of a Straight Line: Example (Error Analysis) • $S_r = \sum e_i^2 = 2.9911$ • $S_t = \sum (y_i - \bar{y})^2 = 22.7143$ • $r^2 = \frac{S_t - S_r}{S_t} = 0.868$ • $r = \sqrt{0.868} = 0.932$
  • 72. Least Squares Fit of a Straight Line: Example (Error Analysis) • The standard deviation (quantifies the spread around the mean): $s_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.9457$ • The standard error of the estimate (quantifies the spread around the regression line): $s_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.9911}{7-2}} = 0.7734$
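The error measures of slides 71 and 72 can be computed together. A sketch using the example's data and fitted coefficients. `r` is taken as the positive square root of r², which matches the sign of the correlation here only because the slope is positive.

```python
import math


def regression_errors(x, y, a0, a1):
    """Goodness-of-fit measures for the fitted line y = a0 + a1*x."""
    n = len(y)
    y_bar = sum(y) / n
    s_t = sum((yi - y_bar) ** 2 for yi in y)                     # spread around the mean
    s_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # spread around the line
    r2 = (s_t - s_r) / s_t
    return {
        "S_t": s_t,
        "S_r": s_r,
        "r2": r2,
        "r": math.sqrt(r2),                 # positive root; valid here because a1 > 0
        "s_y": math.sqrt(s_t / (n - 1)),    # standard deviation
        "s_y/x": math.sqrt(s_r / (n - 2)),  # standard error of the estimate
    }


x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
for name, value in regression_errors(x, y, 0.07142857, 0.8392857).items():
    print(f"{name} = {value:.4f}")
```

Since $s_{y/x} < s_y$, the regression line describes the data better than the mean alone, which is what the coefficient of determination also expresses.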
  • 73. Linearization of Nonlinear Relationships • Linear regression assumes that the relationship between the dependent and independent variables is linear • However, a few types of nonlinear functions can be transformed into linear regression problems: ◦ The exponential equation ◦ The power equation ◦ The saturation-growth-rate equation
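  • 75. Linearization of Nonlinear Relationships • 1. The exponential equation $y = \alpha_1 e^{\beta_1 x}$: taking natural logarithms gives $\ln y = \ln \alpha_1 + \beta_1 x$, i.e., $y^* = a_0 + a_1 x$ with $y^* = \ln y$, $a_0 = \ln \alpha_1$, $a_1 = \beta_1$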
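  • 76. Linearization of Nonlinear Relationships • 2. The power equation $y = \alpha_2 x^{\beta_2}$: taking base-10 logarithms gives $\log y = \log \alpha_2 + \beta_2 \log x$, i.e., $y^* = a_0 + a_1 x^*$ with $y^* = \log y$, $x^* = \log x$, $a_0 = \log \alpha_2$, $a_1 = \beta_2$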
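  • 77. Linearization of Nonlinear Relationships • 3. The saturation-growth-rate equation $y = \alpha_3 \frac{x}{\beta_3 + x}$: inverting gives $\frac{1}{y} = \frac{1}{\alpha_3} + \frac{\beta_3}{\alpha_3} \cdot \frac{1}{x}$, i.e., $y^* = a_0 + a_1 x^*$ with $y^* = 1/y$, $a_0 = 1/\alpha_3$, $a_1 = \beta_3/\alpha_3$, $x^* = 1/x$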
  • 78. Example • Fit the equation $y = \alpha_2 x^{\beta_2}$ to the data in the following table:
xᵢ | 1 | 2 | 3 | 4 | 5
yᵢ | 0.5 | 1.7 | 3.4 | 5.7 | 8.4
• $\log y = \log(\alpha_2 x^{\beta_2}) = \log \alpha_2 + \beta_2 \log x$ • Let $Y^* = \log y$, $X^* = \log x$, $a_0 = \log \alpha_2$, $a_1 = \beta_2$; then $Y^* = a_0 + a_1 X^*$
  • 79. Example
xᵢ | yᵢ | X*ᵢ = log xᵢ | Y*ᵢ = log yᵢ | X*Y* | X*²
1 | 0.5 | 0.0000 | −0.3010 | 0.0000 | 0.0000
2 | 1.7 | 0.3010 | 0.2304 | 0.0694 | 0.0906
3 | 3.4 | 0.4771 | 0.5315 | 0.2536 | 0.2276
4 | 5.7 | 0.6021 | 0.7559 | 0.4551 | 0.3625
5 | 8.4 | 0.6990 | 0.9243 | 0.6460 | 0.4886
Σ: 15 | 19.700 | 2.079 | 2.141 | 1.424 | 1.169
• $a_1 = \frac{n \sum X^*_i Y^*_i - \sum X^*_i \sum Y^*_i}{n \sum X^{*2}_i - \left(\sum X^*_i\right)^2} = \frac{5(1.424) - 2.079(2.141)}{5(1.169) - 2.079^2} = 1.75$ • $a_0 = \bar{Y}^* - a_1 \bar{X}^* = 0.4282 - 1.75(0.41584) = -0.300$
  • 80. Linearization of Nonlinear Functions: Example • $\log y = -0.300 + 1.75 \log x$ • $y = 10^{-0.300} x^{1.75} \approx 0.5\, x^{1.75}$
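The log-log linearization can be automated. This sketch fits $y = \alpha_2 x^{\beta_2}$ by ordinary least squares on $(\log_{10} x, \log_{10} y)$ using the example's data. With full-precision logarithms it gives $a_0 \approx -0.300$, $\beta_2 \approx 1.75$ and $\alpha_2 \approx 0.50$. The function name and return layout are illustrative.

```python
import math


def fit_power_law(x, y):
    """Fit y = alpha * x**beta by least squares on (log10 x, log10 y).

    Returns (alpha, beta, a0), where a0 = log10(alpha) is the intercept of the log-log line.
    """
    X = [math.log10(v) for v in x]
    Y = [math.log10(v) for v in y]
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xx = sum(v * v for v in X)
    sum_xy = sum(xi * yi for xi, yi in zip(X, Y))
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = sum_y / n - beta * sum_x / n
    return 10 ** a0, beta, a0


alpha, beta, a0 = fit_power_law([1, 2, 3, 4, 5], [0.5, 1.7, 3.4, 5.7, 8.4])
print(round(a0, 3), round(beta, 2), round(alpha, 2))  # -0.3 1.75 0.5
```

Note that this minimizes squared errors in log y rather than in y itself, so the result is the best fit of the transformed model, not necessarily of the original curve.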
  • 81. Polynomial Regression • Some engineering data is poorly represented by a straight line • For these cases a curve is better suited to fit the data • The least squares method can readily be extended to fit the data to higher order polynomials
  • 83. Polynomial Regression (cont’d) • A 2nd-order polynomial (quadratic) is defined by $y = a_0 + a_1 x + a_2 x^2 + e$ • The residuals between the model and the data: $e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2$ • The sum of squares of the residuals: $S_r = \sum e_i^2 = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$
  • 84. Polynomial Regression (cont’d) • A system of 3×3 equations needs to be solved to determine the coefficients of the polynomial:
$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$$
• The standard error and the coefficient of determination: $s_{y/x} = \sqrt{\frac{S_r}{n-3}}$, $r^2 = \frac{S_t - S_r}{S_t}$
  • 85. Polynomial Regression (cont’d) • General: the mth-order polynomial $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m + e$ • A system of (m+1)×(m+1) linear equations must be solved to determine the coefficients of the mth-order polynomial • The standard error: $s_{y/x} = \sqrt{\frac{S_r}{n-(m+1)}}$ • The coefficient of determination: $r^2 = \frac{S_t - S_r}{S_t}$
  • 86. Polynomial Regression: Example • Fit a second-order polynomial to the data:
xᵢ | yᵢ | xᵢ² | xᵢ³ | xᵢ⁴ | xᵢyᵢ | xᵢ²yᵢ
0 | 2.1 | 0 | 0 | 0 | 0 | 0
1 | 7.7 | 1 | 1 | 1 | 7.7 | 7.7
2 | 13.6 | 4 | 8 | 16 | 27.2 | 54.4
3 | 27.2 | 9 | 27 | 81 | 81.6 | 244.8
4 | 40.9 | 16 | 64 | 256 | 163.6 | 654.4
5 | 61.1 | 25 | 125 | 625 | 305.5 | 1527.5
Σ: 15 | 152.6 | 55 | 225 | 979 | 585.6 | 2488.8
• $n = 6$, $\bar{x} = 15/6 = 2.5$, $\bar{y} = 152.6/6 = 25.433$
  • 87. 2nd-order polynomial example: $y = a_0 + a_1 x + a_2 x^2$
xᵢ | fᵢ | xᵢ² | xᵢ³ | xᵢ⁴ | fᵢxᵢ | fᵢxᵢ² | g(x)
1 | 4 | 1 | 1 | 1 | 4 | 4 | 4.505
2 | 11 | 4 | 8 | 16 | 22 | 44 | 10.15
4 | 19 | 16 | 64 | 256 | 76 | 304 | 19.43
6 | 26 | 36 | 216 | 1296 | 156 | 936 | 26.03
8 | 30 | 64 | 512 | 4096 | 240 | 1920 | 29.95
Σ: 21 | 90 | 121 | 801 | 5665 | 498 | 3208
  • 88. 2nd-order polynomial example • $5a_0 + 21a_1 + 121a_2 = 90$ • $21a_0 + 121a_1 + 801a_2 = 498$ • $121a_0 + 801a_1 + 5665a_2 = 3208$ • Solving: $a_0 = -1.81$, $a_1 = 6.65$, $a_2 = -0.335$ • So the required equation is $g(x) = -1.81 + 6.65x - 0.335x^2$
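  • 89. Exponential function • Fit $y = a e^{bx}$ to the data:
x | 1 | 2 | 3 | 4 | 5
y | 1.5 | 4.5 | 6 | 8.5 | 11
Solution: $\ln y = \ln(a e^{bx}) = \ln a + bx$, i.e., $Y = a_0 + a_1 X$, where $Y = \ln y$, $a_0 = \ln a$, $a_1 = b$, $X = x$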
  • 90. Solution
X = xᵢ | yᵢ | Y = ln yᵢ | xᵢ² | xᵢYᵢ | g(x)
1 | 1.5 | 0.405 | 1 | 0.405 | 2.06
2 | 4.5 | 1.504 | 4 | 3.008 | 3.27
3 | 6 | 1.791 | 9 | 5.373 | 5.186
4 | 8.5 | 2.14 | 16 | 8.56 | 8.22
5 | 11 | 2.39 | 25 | 11.95 | 13.03
Σxᵢ = 15, ΣYᵢ = 8.23, Σxᵢ² = 55, ΣxᵢYᵢ = 29.296
  • 91. Solution • $5a_0 + 15a_1 = 8.23$ • $15a_0 + 55a_1 = 29.296$ • $a_0 = 0.2642$, $a_1 = 0.4606$ • $a = e^{0.2642} = 1.3024$, $b = 0.4606$ • Required equation: $g(x) = 1.3024\, e^{0.4606x}$
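The same transformation fits the exponential model. This sketch regresses ln y on x for the example data. Because it uses full-precision logarithms, it gives a ≈ 1.299 and b ≈ 0.4621, slightly different from the hand solution, whose table rounds the logarithms (for example, ln 11 ≈ 2.398 is entered as 2.39). The function name is illustrative.

```python
import math


def fit_exponential(x, y):
    """Fit y = a * exp(b*x) by least squares on (x, ln y); returns (a, b)."""
    Y = [math.log(v) for v in y]
    n = len(x)
    sum_x, sum_y = sum(x), sum(Y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, Y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = sum_y / n - b * sum_x / n  # a0 = ln a
    return math.exp(a0), b


a, b = fit_exponential([1, 2, 3, 4, 5], [1.5, 4.5, 6, 8.5, 11])
print(round(a, 3), round(b, 4))  # 1.299 0.4621
```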
  • 92. Example • Power function: fit $y = a x^b$ to the data:
x | 2 | 2.5 | 3 | 3.5 | 4
y | 7 | 8.5 | 11 | 12.75 | 15
Solution: $\ln y = \ln a + b \ln x$, i.e., $Y = a_0 + a_1 X$, where $Y = \ln y$, $a_0 = \ln a$, $X = \ln x$, $a_1 = b$
  • 93. Solution
x | y | X = ln x | Y = ln y | X² | XY | g(x)
2 | 7 | 0.6931 | 1.946 | 0.480 | 1.3487 | 6.868
2.5 | 8.5 | 0.9163 | 2.140 | 0.8396 | 1.9608 | 8.813
3 | 11 | 1.098 | 2.397 | 1.2056 | 2.6319 | 10.806
3.5 | 12.75 | 1.252 | 2.545 | 1.5675 | 3.1863 | 12.838
4 | 15 | 1.386 | 2.708 | 1.9209 | 3.7532 | 14.904
ΣXᵢ = 5.3454, ΣYᵢ = 11.736, ΣXᵢ² = 6.0136, ΣXᵢYᵢ = 12.8809
  • 94. Solution • $5a_0 + 5.3454a_1 = 11.736$ • $5.3454a_0 + 6.0136a_1 = 12.8809$ • $a_0 = 1.1521$, $a_1 = 1.1178$ • $a = e^{a_0} = e^{1.1521} = 3.1648$, $b = a_1 = 1.1178$ • Required equation: $y = 3.1648\, x^{1.1178}$
  • 95. Polynomial Regression: Example (cont’d)
xᵢ | yᵢ | y_model | eᵢ² | (yᵢ − ȳ)²
0 | 2.1 | 2.4786 | 0.14332 | 544.42889
1 | 7.7 | 6.6986 | 1.00286 | 314.45929
2 | 13.6 | 14.64 | 1.08158 | 140.01989
3 | 27.2 | 26.303 | 0.80491 | 3.12229
4 | 40.9 | 41.687 | 0.61951 | 239.22809
5 | 61.1 | 60.793 | 0.09439 | 1272.13489
Σ: 15 | 152.6 | | 3.74657 | 2513.39333
• The standard error of the estimate: $s_{y/x} = \sqrt{\frac{S_r}{n-3}} = \sqrt{\frac{3.74657}{6-3}} = 1.12$ • The coefficient of determination: $r^2 = \frac{S_t - S_r}{S_t} = \frac{2513.39 - 3.74657}{2513.39} = 0.99851$, so $r = 0.99925$
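A sketch of mth-order polynomial regression that builds the (m+1)×(m+1) normal equations from power sums and solves them by Gaussian elimination with partial pivoting (a hand-written solver, not a library call). For the slide 86 data with m = 2 it gives coefficients ≈ 2.47857, 2.35929 and 1.86071, consistent with the y_model column. For the slide 87 data, a full-precision solve gives $a_0 \approx -1.820$, $a_1 \approx 6.649$ and $a_2 \approx -0.335$; the slide's $a_0 = -1.81$ is a rounded value.

```python
import math


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting; returns x such that a x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polynomial_regression(x, y, m):
    """Least-squares polynomial of order m; returns coefficients and error measures."""
    # Normal equations: entry (j, k) is sum(x^(j+k)); right-hand side j is sum(x^j * y).
    a = [[sum(xi ** (j + k) for xi in x) for k in range(m + 1)] for j in range(m + 1)]
    rhs = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m + 1)]
    coeffs = solve_linear_system(a, rhs)
    n = len(y)
    y_bar = sum(y) / n
    model = [sum(c * xi ** j for j, c in enumerate(coeffs)) for xi in x]
    s_r = sum((yi - fi) ** 2 for yi, fi in zip(y, model))
    s_t = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "coeffs": coeffs,
        "S_r": s_r,
        "r2": (s_t - s_r) / s_t,
        "s_y/x": math.sqrt(s_r / (n - (m + 1))),
    }


result = polynomial_regression([0, 1, 2, 3, 4, 5], [2.1, 7.7, 13.6, 27.2, 40.9, 61.1], 2)
print([round(c, 5) for c in result["coeffs"]])  # [2.47857, 2.35929, 1.86071]
print(round(result["S_r"], 5), round(result["r2"], 5), round(result["s_y/x"], 2))
# 3.74657 0.99851 1.12
```

Forming the normal equations directly is fine for low orders like these; for high-order polynomials the matrix becomes ill-conditioned, and orthogonal-decomposition methods are usually preferred.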