Biostatistics: Introduction and Basic Terminology of Statistics
1. Basic terminology of Statistics (Introduction)
Meanings of the word Statistics: The word Statistics is used in the
following three senses.
Firstly, the word Statistics is used in the plural sense to mean a
collection of numerical data in aggregate form relating to any field of study
or inquiry, e.g. statistics of births, deaths, prices, roadside accidents
etc.
Secondly, the word Statistics is used in the singular sense to mean
the methods and procedures used in the collection of data, its
presentation in various forms such as graphs, tables and diagrams, and its
interpretation.
Thirdly, the word Statistics is used in the technical sense as the plural of
the word ‘statistic’. By a statistic we mean a numerical quantity such as the
mean, median, mode or standard deviation (S.D.) computed from sample data, e.g. if we
select a sample of 10 students from a class of 50 students and
find their average height, then this average is called a statistic.
2. Characteristics of Statistics: The fundamental characteristics of Statistics are
described as follows.
(i) Statistics deals with aggregates of data. A single figure such as one sale, accident or
birth does not form Statistics, e.g. the height of a single person is not Statistics, but the
collection of heights of, say, 100 persons does form Statistics.
(ii) Statistics deals with quantitative data. Qualitative expressions like good, poor,
young, old etc. do not form Statistics, e.g. if we say that the average yield of wheat per
acre during a year was good, it is not Statistics. On the other hand, if we say that the
average yield of wheat per acre during the year 2005 was 65 maunds, this is
a statistical statement.
(iii) Statistics are affected by many factors, e.g. the production of wheat in a certain area
depends on the quality of seed, soil fertility, temperature, climatic conditions etc. Similarly,
the prices of goods and services depend on supply and demand conditions.
(iv) Statistics are capable of being placed in relation to each other, so that they are
comparable and homogeneous, e.g. the ages of husbands and wives at the time of marriage are
comparable and can be placed in relation to each other. The heights of persons and
their monthly incomes have no relation with each other and therefore cannot be
compared.
(v) Statistics must be collected in a systematic manner so that they give a
reasonable standard of accuracy. If the figures are not collected in a systematic manner,
the results obtained from such data would not be accurate.
(vi) Statistics must be collected with a definite object (purpose) in view. Data collected
without any purpose would be of no use.
3. Descriptive and Inferential Statistics:
Statistics as a subject may be divided into two
branches, namely Descriptive and Inferential Statistics.
Descriptive Statistics:
This branch of Statistics deals with the methods and
procedures used in the collection of data, its presentation in various forms
such as tables, graphs and diagrams, and the computation of averages and other
measures which describe the data.
The purpose of descriptive Statistics is to present
information in such a way that anyone can easily understand the data and
make decisions about it.
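As a minimal sketch of the descriptive measures mentioned above, the following Python snippet summarizes a small set of heights. The data values are invented purely for illustration; the functions used come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical heights (in inches) of 10 students; illustrative values only.
heights = [62, 64, 65, 65, 66, 67, 67, 67, 69, 70]

mean_height = statistics.mean(heights)      # arithmetic mean
median_height = statistics.median(heights)  # middle value of the sorted data
mode_height = statistics.mode(heights)      # most frequent value
sd_height = statistics.stdev(heights)       # sample standard deviation

print("Mean:", mean_height)
print("Median:", median_height)
print("Mode:", mode_height)
print("S.D.:", round(sd_height, 2))
```

Each printed figure is a single number that describes the whole set of 10 heights, which is the sense in which these measures "describe the data".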
Inferential Statistics:
This branch of Statistics deals with the methods and
procedures used in making inferences (results, decisions or conclusions)
about a population parameter on the basis of sample data. This branch of
Statistics includes the estimation of parameters and the testing of hypotheses.
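To illustrate the idea of estimation, the sketch below computes a point estimate of a population mean from a sample, together with an approximate 95% interval. The sample values are invented, and the interval uses the normal approximation as a simplifying assumption; for a sample this small a t-based interval would normally be preferred.

```python
import math
import statistics

# Hypothetical sample of 10 heights (inches) drawn from a larger population.
sample = [62, 64, 65, 65, 66, 67, 67, 67, 69, 70]

n = len(sample)
sample_mean = statistics.mean(sample)      # point estimate of the population mean
sample_sd = statistics.stdev(sample)       # sample standard deviation
standard_error = sample_sd / math.sqrt(n)  # estimated S.D. of the sample mean

# 97.5th percentile of the standard normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"Point estimate: {sample_mean:.2f}")
print(f"Approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

The point estimate and the interval are both computed from sample data only, yet they are statements about the unknown population mean.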
4. Variable:
A measurable quantity which can vary from time to time, from person to
person, from place to place or from sample to sample is called a variable, e.g. height,
weight, age, price etc.
Constant:
Any quantity which can assume only one value is called a constant.
Examples of constants are π ≈ 3.14159 (often approximated by 22/7) and e ≈ 2.71828.
A constant is usually denoted by the first letters of the English alphabet, e.g. a, b, c etc.
Discrete Variable:
A variable which can take a finite or countable number of values is
called a discrete variable, e.g. the number of children in a family, the number of defective
items in a consignment, the number of Statistics teachers in a department, the number of
rooms in a house etc. A discrete variable takes its values by jumps or breaks, e.g. the
number of children in a family can be 0, 1, 2, 3… but cannot be 2.5 or 3.84 etc.
Continuous Variable:
A variable which can assume every possible value in a given
range (interval) is called a continuous variable. Examples of continuous variables
are the height and weight of individuals, the height of mercury in a thermometer,
the speed of a car etc. The values of a continuous variable vary without any gaps or
jumps, e.g. the height of individuals can be 62”, 62.3”, 62.7” etc.
5. Qualitative Variable:
A variable which cannot be expressed numerically is
called a qualitative variable, or in other words a variable which cannot possess
any unit of measurement is called a qualitative variable, e.g. eye colour, marital
status, intelligence, poverty etc.
Quantitative Variable:
A variable which can be expressed numerically is called a
quantitative variable, or in other words a variable which can possess a unit
of measurement is called a quantitative variable, e.g. height can be expressed in
inches, cm, metres etc. and weight can be expressed in kg, g etc. Hence
height and weight are quantitative variables.
Observation:
Any sort of numerical recording of information is called an
observation, e.g. it may be a physical measurement such as height, weight or age,
or an answer to a question such as yes or no.
Population:
The aggregate or totality of all the elements of interest is
called the population, e.g. all the students in a department form a
population, all the trees in a forest form a population, all the fish in
a river form a population etc.
6. Sample:
Any small part of the population selected for the purpose of a certain
study is called a sample, e.g. if a class contains 50 students and we select, say, 10
students to find their average height, then the 50 students constitute the population
while the 10 selected students are called the sample.
Statistic and Parameter:
The numerical quantities such as the mean, median,
mode, S.D. etc. computed from sample data are called statistics,
while the numerical quantities such as the mean,
median, mode, S.D. etc. computed from population data are called
parameters, e.g. if we select 10 students from a class consisting of 50 students
and find their average height, then this average is called a statistic, while the
average height of all 50 students is called a parameter.
It is important to note that a parameter is a fixed
quantity, while the value of the corresponding statistic varies from sample to
sample and hence is a random variable.
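The difference between a fixed parameter and a varying statistic can be sketched in Python as follows. The population of 50 heights is generated artificially (an assumption made only for illustration), and the seed is fixed simply so that the run is repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: heights (inches) of a class of 50 students.
population = [round(random.uniform(60, 72), 1) for _ in range(50)]

# Parameter: computed from the whole population, so it is a single fixed value.
population_mean = statistics.mean(population)

# Statistic: computed from a sample of 10, so it changes from sample to sample.
sample_means = []
for _ in range(5):
    sample = random.sample(population, 10)  # 10 students chosen without replacement
    sample_means.append(statistics.mean(sample))

print("Parameter (population mean):", round(population_mean, 2))
print("Statistics (sample means):", [round(m, 2) for m in sample_means])
```

The population mean is printed once and does not change, whereas the five sample means generally differ from one another and from the parameter.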
7. Importance of Statistics:
i. Statistics is perhaps a subject that is used by everybody. The
following functions and uses of Statistics in the most diverse fields serve to indicate its
importance.
ii. Statistics assists in summarizing large sets of data in a form that is easily
understandable.
iii. Statistics assists in the efficient design of laboratory and field experiments as well as
surveys.
iv. Statistics assists in sound and effective planning in any field of inquiry.
v. Statistics assists in drawing general conclusions and in making predictions of how much of
a thing will happen under given conditions.
vi. Statistical techniques, being powerful tools for analyzing numerical data, are used in almost
every branch of learning. In the biological and physical sciences, Genetics, Agronomy,
Anthropometry, Astronomy, Physics, Geology etc. are the main areas where statistical
techniques have been developed and are increasingly used.
vii. A businessman, an industrialist and a research worker all apply statistical methods in
their work. Banks, insurance companies and governments all have their Statistics departments.
viii. A modern administrator, whether in the public or private sector, leans on statistical data to
provide a factual basis for decisions.
ix. A politician uses Statistics advantageously to lend support and credence to his arguments
while elucidating the problems he handles.
x. A social scientist uses statistical methods in various areas of the socio-economic life of a nation.
It is sometimes said that “a social scientist without an adequate understanding of statistics
is often like the blind man groping in a dark room for a black cat that is not there”.
8. Measurement Scales:
By measurement, we usually mean the assigning of numbers to
observations or objects, and scaling is a process of measuring. The four scales of
measurement are described below.
Nominal Scale:
The classification or grouping of observations into mutually exclusive
qualitative categories or classes is said to constitute a nominal scale, e.g. students
are classified as male and female. The numbers 1 and 2 may also be used to identify
these two categories. Similarly, rainfall may be classified as heavy, moderate and
light, and we may use the numbers 1, 2 and 3 to denote the three classes of rainfall. The
numbers, when they are used only to identify the categories of the given scale,
carry no numerical significance and there is no particular order for the grouping.
Ordinal or Ranking Scale:
It includes the characteristics of a nominal scale and in addition
has the property of ordering or ranking of measurements, e.g. the performance of
students or players is rated as excellent, good, fair or poor etc. The numbers 1, 2, 3, 4
etc. are also used to indicate ranks. The only relation that holds between any pair of
categories is that of “greater than” (or “more preferred”).
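A short Python sketch may help to contrast the two scales. The category codes below are hypothetical codings chosen for illustration: for nominal data only counting (for example, finding the most frequent category) is meaningful, whereas for ordinal data the order of the codes can also be used, for example to find a median rank.

```python
from collections import Counter
import statistics

# Nominal scale: the codes are labels only (hypothetical coding, no order implied).
eye_colour_codes = {1: "brown", 2: "blue", 3: "green"}
eye_colours = [1, 1, 2, 3, 1, 2]
most_common_code, _ = Counter(eye_colours).most_common(1)[0]
print("Most frequent eye colour:", eye_colour_codes[most_common_code])

# Ordinal scale: the codes also carry an order (here 1 = poor ... 4 = excellent).
rating_labels = {1: "poor", 2: "fair", 3: "good", 4: "excellent"}
ratings = [3, 4, 2, 3, 1, 3, 4]
median_rating = statistics.median(ratings)  # meaningful because ranks are ordered
print("Median rating:", rating_labels[median_rating])
```

Taking the median of the eye colour codes would produce a number, but it would have no meaning, because those codes have no order.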
9. Interval Scale:
A measurement scale possessing a constant interval size (distance) but
not a true zero point is called an interval scale. Temperature measured on either the Celsius or
the Fahrenheit scale is an outstanding example of an interval scale, because the same
difference exists between 20°C (68°F) and 30°C (86°F) as between 5°C (41°F) and
15°C (59°F). It cannot be said that a temperature of 40 degrees is twice as hot as a
temperature of 20 degrees, i.e. the ratio 40/20 has no meaning. The arithmetic
operations of addition, subtraction etc. are meaningful.
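The temperature example can be checked with a small Python sketch. The helper function name `celsius_to_fahrenheit` is introduced here only for illustration; it applies the standard conversion F = C × 9/5 + 32.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Equal differences in Celsius remain equal differences in Fahrenheit.
diff_a = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)
diff_b = celsius_to_fahrenheit(15) - celsius_to_fahrenheit(5)
print(diff_a, diff_b)  # 18.0 18.0

# Ratios are not preserved, because neither scale has a true zero point.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
print(ratio_c, round(ratio_f, 3))  # 2.0 1.529
```

The same pair of temperatures gives a ratio of 2 on one scale and about 1.53 on the other, which is why ratios on an interval scale carry no meaning.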
Ratio Scale:
It is a special kind of interval scale in which the scale of measurement has a
true zero point as its origin. The ratio scale is used to measure weight, volume,
length, distance, money etc. The key to differentiating the interval and ratio scales is that
the zero point is meaningful for the ratio scale.
Examples of measurement scales:
Nominal level data        Ordinal level data                                Interval level data   Ratio level data
Gender (male, female)     Grades (A, B, C, D, E)                            Temperature           Age
Eye colour                Position (1st, 2nd, 3rd etc.)                     IQ score              Weight
Religion                  Ranking of cricket players                                              Height
Specialization            Rating (poor, good, excellent)                                          Time
Nationality               Socio-economic status (poor, middle class, rich)                        Salary
                                                                                                  Distance
10. Survey:
The process of gathering or collecting information is called a survey.
There are two types of survey: (i) Census survey (ii) Sample survey
i. Census Survey:
The process of collecting or gathering the desired information
by studying each and every element of the population is called a census
survey. The data obtained by recording the relevant information on each
and every element of the population are called census data or population
data.
ii. Sample Survey:
The process of collecting or gathering information by
studying only a small part of the population (a sample) is called a sample survey,
and the data obtained from such a survey are called sample data.
11. Data: Data is the plural of the Latin word datum, which means something given, i.e.
information in raw form such as facts and figures.
Types of Data: There are two types of data: (i) Primary data (ii) Secondary
data.
i. Primary Data:
Data that have been collected by someone for the purpose of a certain
study and have not undergone any sort of statistical treatment (processing)
are called primary data,
e.g. the data obtained in a census study are called primary data. Similarly,
if we want to know the effect of a fertilizer on the yield of wheat, the observations
taken on each plot are called primary data. Or, if we want to know the effect of a
certain drug on patients, the observations taken on each patient are called primary data.
ii. Secondary Data:
Data that have already been collected by someone and have undergone
some sort of statistical treatment (processing) at least once are called secondary
data.
When statistical methods are applied to primary data, the data
lose their original shape and become secondary data, e.g. if the data from
different census years are used again to measure changes in population
growth, sex ratio, mortality rate etc., they are secondary data.
12. Methods of collecting primary data:
The methods involved in the collection of primary data
are described as follows.
1. Direct Personal Investigation:
In this method, the investigator collects
the required information directly from the source concerned (i.e. the
respondent). The investigator asks the respondent some direct questions
and notes down the replies.
Advantages: The information collected by this method is highly
accurate, reliable and useful.
The accuracy of the result also depends on the proper training of the
investigator, i.e. the investigator should be tactful, a keen observer and
courteous in behavior.
Disadvantages: This method is very slow, expensive, tedious and time
consuming. It is practically suitable for small-scale and secret inquiries, but
not for extensive inquiries. The personal prejudices of the investigator may
also affect the inquiry.
13. 2. Indirect Personal Investigation:
Sometimes the informants hesitate or refuse to
give direct answers. The information is then collected either by
putting some suitable indirect questions to the informants or by interviewing
several third persons or witnesses who know the informants well and are
expected to give correct information, e.g. when businessmen are
reluctant to disclose their income to the income tax authorities, the
income tax authorities can get the desired data by interviewing persons
who are directly involved in the business, such as clerks, salesmen etc.
Advantages: The main advantage of this method is that the information
collected in this way is likely to be complete and more correct, and some
additional information can be obtained.
Disadvantages: Sometimes the time taken by the third persons in replying
may be very long. The result is affected if the third persons do not have full
knowledge of the problem.
14. 3. Investigation Through Correspondents:
In this method the
investigator or agency appoints trained agents or correspondents with
directives to collect the required information using their own
methods and judgement and then to communicate this information to the
investigator or agency, usually at regular intervals of time, for further
processing, e.g. newspapers, magazines etc. use this method to get
information from their correspondents in different fields such as
strikes, sports, politics etc.
This method covers a vast area, has a wider scope and is cheap and
relatively accurate, but its originality, uniformity and accuracy can always be
questioned.
15. 4. Investigation Through Questionnaire:
In this method, a standard list of questions relating to a particular
problem is prepared. This list of questions is called a questionnaire.
This method is used when the survey is spread over a vast area and the
informants are educated. A questionnaire is sent by mail to each and
every informant, who is requested to fill in the questionnaire and send it back.
Advantages: This method is less costly and less time consuming, and the information can be
collected from a wide area. A reasonable standard of accuracy is expected from this
method.
Disadvantages: Most of the informants do not care to fill in the questionnaire, so the
rate of return is very low. Some of the informants return the questionnaire incomplete
and full of errors, which certainly affects the result.
5. Investigation Through Questionnaire in Charge of Enumerators:
In this method the information is collected by appointing trained enumerators who
go to the informants with a questionnaire and help them in recording or filling in the
relevant columns of the questionnaire.
Advantages: This method is very useful for extensive inquiries, e.g. a census survey.
The reliability of the investigation depends upon the proper training of the enumerators.
Disadvantages: This method is very expensive and perhaps only a government agency
can afford to use it.
16. Methods of collecting Secondary Data:
Secondary data can be collected from the following sources.
1. Official Sources: e.g. publications of the Statistical Division, reports of the
ministries of finance, food and agriculture, planning and development etc.
2. Semi-Official Sources: e.g. publications of the State Bank, WAPDA,
PIA, local bodies etc.
3. Private Sources: e.g. publications of trade associations, chambers of
commerce and industry, private commercial and financial institutions etc.
4. Research Organizations: e.g. publications of research organizations
such as universities, institutes of education and research etc.