Statistical Methods

Why Statistics? Statistics takes the analysis of data one stage beyond what can be achieved with maps and diagrams. You can gain a rough insight into patterns at a glance, but mathematical manipulation usually gives greater precision. This allows us to discover things which might otherwise go unnoticed.

The need for justification. Statistics is an aid to analysis and no more, so any mathematical manipulation must be justified. Too often students make statistical calculations in geographical projects without adequate justification. Before using statistics, it is essential to ask yourself two questions.

Question 1. Why am I using this technique? In the exam, be absolutely clear about what a statistical test can show and how it does so.

Question 2. Is the data appropriate to this particular technique? Each technique requires data to be arranged in a particular form. If the data is not in that form, the technique cannot be used. If your data is not good in the first place, a complex statistical technique will not help you: “Rubbish in, rubbish out.”

Mean, Mode, Median. These are used when faced with a large amount of data, for example the average temperature of a place every day for two years. Things are far easier when we can summarise the data. This is relatively easy to do, and there are three common methods.

1- The Mean. What most people call the average is the mean. You find it by adding all the numbers together and then dividing by the total number of data values. The mean is shown by the symbol $\bar{x}$ (x-bar). The mean is distorted by even one extreme value, which can be a problem. However, it is the most commonly used as it can be used for further mathematical processing.

Find the mean of these data values: 3, 4, 4, 4, 6, 6, 9. The sum is 36 and there are 7 values, so $\bar{x} = 36 / 7 = 5.1$ (to one decimal place).

2- The Mode. The mode is simply the most frequently occurring event. If we are using simple numbers then the mode is the most frequently occurring number. If we are looking at data on the nominal scale (grouped into categories) the mode is the most common category. The mode is very quick to calculate, but it cannot be used for further mathematical processing. It is not affected by extreme values.

Find the mode of this data set: 3, 4, 4, 4, 6, 9. Mode (most frequently occurring number) = 4.

Find the mode of this nominal data.
Land Use      Hectares
Clover        10
Rye           12
Vegetables    15
Fruit         3
Wheat         29
Barley        18
Pasture       17
Mode (most frequently occurring category) = Wheat.
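The nominal example can be checked with a short Python sketch. The dictionary name `land_use` is only an illustration, and the pairing of hectares with categories is assumed from the slide's table.

```python
# Land-use areas from the slide (hectares per category).
# The name `land_use` is illustrative, not part of the original slides.
land_use = {
    "Clover": 10,
    "Rye": 12,
    "Vegetables": 15,
    "Fruit": 3,
    "Wheat": 29,
    "Barley": 18,
    "Pasture": 17,
}

# The modal category is the one with the largest value.
modal_category = max(land_use, key=land_use.get)
print(modal_category)  # Wheat
```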

3- The Median. The median is the central value in a series of ranked values. If there is an even number of values, the median is the midpoint between the two centrally placed values. The median is not affected by extreme values but it cannot be used for further mathematical processing.

Find the median of this data set: 3, 4, 4, 4, 6, 9. There are six values, so the median is the midpoint between the third and fourth values (4 and 4). Median = 4.

Now find the median of this data set: 3, 4, 4, 6, 6, 9. The two centrally placed values are 4 and 6, so the median is the midpoint between them. Median = 5.
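The worked examples for the mean, mode and median can be reproduced with Python's standard `statistics` module. This is only a quick check of the arithmetic; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

# Data sets taken from the worked examples on the slides.
mean_data = [3, 4, 4, 4, 6, 6, 9]   # used for the mean
set_a = [3, 4, 4, 4, 6, 9]          # used for the mode and first median
set_b = [3, 4, 4, 6, 6, 9]          # used for the second median

print(round(mean(mean_data), 1))  # 5.1
print(mode(set_a))                # 4
print(median(set_a))              # 4.0 (midpoint of 4 and 4)
print(median(set_b))              # 5.0 (midpoint of 4 and 6)
```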

Spread around the median and mean. The median, mean and mode all give us a summary value for a set of data. On their own, however, they give us no idea of the spread of data around the summary value, which can be misleading. For example…

I collected the following rainfall data.
Year    Rainfall (mm)
1990    0
1991    0
1992    3
1993    0
1994    97
The mean for this data is 20mm (100mm over 5 years), but that gives a misleading picture of what really happened: four of the five years had almost no rain. There is a great “deviation about the mean”. Deviation can be measured statistically as follows.
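To see how far each year sits from the mean, the deviations can be listed directly. This is a minimal sketch using the rainfall figures from the slide; the names `rainfall_mm` and `deviations` are illustrative.

```python
# Rainfall by year, taken from the slide.
rainfall_mm = {1990: 0, 1991: 0, 1992: 3, 1993: 0, 1994: 97}

# Mean rainfall over the five years.
mean_rain = sum(rainfall_mm.values()) / len(rainfall_mm)

# Deviation of each year's rainfall from the mean.
deviations = {year: mm - mean_rain for year, mm in rainfall_mm.items()}

print(mean_rain)
for year, dev in deviations.items():
    print(year, dev)
```

Every year is at least 17mm away from the mean, which is why the mean alone describes none of the years well.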

Spread around the median: the interquartile range. The interquartile range is a measure of the spread of the values around their median. The greater the spread, the higher the interquartile range.

Method.
Stage 1- Place the values in rank order, smallest to largest.
Stage 2- Find the upper quartile. This is found by taking the 25% highest values and finding the midpoint between the lowest of these and the next lowest number.
Stage 3- Find the lower quartile. This is obtained by taking the 25% lowest values and finding the midpoint between the highest of these and the next highest value.
Stage 4- Find the difference between the upper and lower quartiles. This is the interquartile range, a crude index of the spread of the values around the median. The higher the range, the greater the spread.

Over to you. Copy out the data on the next slide. Then find the interquartile range, remembering to follow all four stages.

Month        Average temperature
January      4
February     5
March        7
April        9
May          12
June         15
July         17
August       17
September    15
October      11
November     7
December     5

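Answer. Ranked, the data looks like this: 4, 5, 5, 7, 7, 9, 11, 12, 15, 15, 17, 17. Lower quartile = 6 (midway between 5 and 7). Median = 10 (midway between 9 and 11). Upper quartile = 15 (midway between 15 and 15). Interquartile range = 15 - 6 = 9.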
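The four stages can be sketched in Python as follows. The function name `interquartile_range` is hypothetical, and the sketch assumes the number of values divides evenly by four, as it does for the twelve monthly temperatures.

```python
def interquartile_range(values):
    """Follow the four-stage slide method.

    Assumes len(values) is a multiple of 4 (an assumption of this sketch).
    Returns (lower quartile, upper quartile, interquartile range).
    """
    ranked = sorted(values)          # Stage 1: rank smallest to largest
    k = len(ranked) // 4             # number of values in each 25% group
    # Stage 2: midpoint between the lowest of the top 25% and the next lowest
    upper = (ranked[-k] + ranked[-k - 1]) / 2
    # Stage 3: midpoint between the highest of the bottom 25% and the next highest
    lower = (ranked[k - 1] + ranked[k]) / 2
    # Stage 4: difference between the quartiles
    return lower, upper, upper - lower


# Monthly average temperatures, January to December, from the slide.
temps = [4, 5, 7, 9, 12, 15, 17, 17, 15, 11, 7, 5]
print(interquartile_range(temps))  # (6.0, 15.0, 9.0)
```

Library routines for quartiles use a variety of interpolation rules, so they may not return exactly the same quartiles as this hand method.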

Spread about the mean: Standard deviation. If we want to obtain some measure of the spread of our data about its mean we calculate its standard deviation. Two sets of figures can have the same mean but very different standard deviations.

Method.
Stage 1- Tabulate the values ($x$) and their squares ($x^2$). Add these values ($\sum x$ and $\sum x^2$).
Stage 2- Find the mean of all the values of $x$ ($\bar{x}$) and square it ($\bar{x}^2$).
Stage 3- Calculate the formula $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$

$\sigma$ = standard deviation. $\sqrt{\ }$ = the square root of. $\sum$ = the sum of. $n$ = the number of values. $\bar{x}$ = the mean of the values.
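It may help to see why this formula measures spread about the mean. The short derivation below assumes the slide intends the standard deviation of the whole data set (dividing by $n$), and shows that the formula is the average squared distance of the values from their mean.

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum (x - \bar{x})^2 \\
         &= \frac{1}{n}\sum x^2 \;-\; 2\bar{x}\cdot\frac{1}{n}\sum x \;+\; \bar{x}^2 \\
         &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
         &= \frac{\sum x^2}{n} - \bar{x}^2
\end{aligned}
```

Taking the square root of both sides gives the standard deviation $\sigma$.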

Over to you. Number of vehicles passing a traffic count point. Calculate the standard deviation of the following data.

Day    Number of vehicles
1      50
2      75
3      80
4      92
5      60
6      70
7      63
8      42
9      75
10     82

Answer.
x     x²
50    2 500
75    5 625
80    6 400
92    8 464
60    3 600
70    4 900
63    3 969
42    1 764
75    5 625
82    6 724

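Answer. $\sum x = 689$ and $\sum x^2 = 49\,571$. $\bar{x} = 689 / 10 = 68.9$, so $\bar{x}^2 = (68.9)^2 = 4747.2$. Then $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{49\,571}{10} - 4747.2} = \sqrt{209.9} = 14.5$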
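The same calculation can be checked in Python. This sketch follows the slide's stages step by step and then compares the result with `statistics.pstdev`, which computes the standard deviation of a whole data set; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev

# Vehicles counted on days 1 to 10, from the slide.
vehicles = [50, 75, 80, 92, 60, 70, 63, 42, 75, 82]

n = len(vehicles)
sum_x = sum(vehicles)                    # Stage 1: sum of x
sum_x2 = sum(x * x for x in vehicles)    # Stage 1: sum of x squared
mean_x = sum_x / n                       # Stage 2: mean of x
sigma = sqrt(sum_x2 / n - mean_x ** 2)   # Stage 3: the formula

print(sum_x, sum_x2)               # 689 49571
print(round(sigma, 1))             # 14.5
print(round(pstdev(vehicles), 1))  # 14.5 (library cross-check)
```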

Phew!!!!!! The higher the standard deviation, the greater the spread of data around the mean. The standard deviation is the best of the measures of spread as it takes into account all of the values under consideration.

Homework. Research the following tests of significance to find out their meaning: the Mann-Whitney U test and the Chi-Squared ($\chi^2$) test.
